Gumbel - Pareto distribution and it’s applications in modeling COVID data

doi:10.15406/bbij.2021.10.00338

eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Research Article Volume 10 Issue 3


Jeena Joseph,¹ KK Jose²


¹Department of Statistics, St. Thomas’ College, Thrissur, India
²School of Mathematics and Statistics and Data Analytics, Mahatma Gandhi University, India

Correspondence: KK Jose, School of Mathematics and Statistics and Data Analytics, Mahatma Gandhi University, Kottayam, Kerala, India

Received: September 15, 2021 | Published: September 30, 2021

Citation: Joseph J, Jose KK. Gumbel - Pareto distribution and it’s applications in modeling COVID data. Biom Biostat Int J. 2021;10(3):125-128. DOI: 10.15406/bbij.2021.10.00338


Abstract

A new distribution, the Gumbel-Pareto distribution, is introduced as a member of the Gumbel-X family.¹ Some of its properties, including moments and order statistics, are studied. A reliability measure for stress-strength analysis is derived. The method of maximum likelihood is proposed for estimating the distribution parameters. The flexibility of the new model is illustrated using two examples, including COVID-19 data.

Keywords: Gumbel distribution, Gumbel-X family, Gumbel-Pareto, order statistics, Pareto distribution, stress-strength reliability, T-X family

Introduction

Statistical distributions play an important role in parametric inference and are commonly applied to model real-life data. In practice, the existing standard distributions do not provide a good fit to all types of real data sets. Statisticians are therefore developing new distributions that are more flexible than the standard ones. New distributions are developed either by combining two or more existing distributions or by adding extra parameters to an existing distribution.

The beta-generated family of distributions and the Kumaraswamy-generated family of distributions use distributions with support between 0 and 1 as the generator. As an extension, Alzaatreh et al.² proposed a general method that replaces the beta pdf with the pdf of any continuous random variable $T$ with support $[a, b]$ as the generator, together with a function $U (F (x))$ of the cdf $F (x)$ of a random variable $X$ that satisfies the following conditions:

$U (F (x)) \in [a, b]$
$U (F (x))$ is differentiable and monotonically non-decreasing.
$U (F (x)) \to a$ as $x \to - \infty$ and $U (F (x)) \to b$ as $x \to \infty$

The new class of distributions is defined by

$G (x) = \int_{a}^{U (F (x))} r (t) d t = R [U (F (x))]$ (1.1)

where $R (t)$ is the cdf and $r (t)$ is the pdf of the random variable $T$. Here, the cdf in (1.1) is the composite function $(R \circ U \circ F) (x)$. The corresponding pdf is

$g (x) = \left\{ \frac{d}{d x} U (F (x)) \right\} r \{ U (F (x)) \}$ (1.2)

The pdf $r (t)$ in (1.2) is "transformed" into a new pdf $g (x)$ through the function $U (F (x))$, which acts as a "transformer". That is, a random variable $X$, "the transformer", is used to transform another random variable $T$, "the transformed". The resulting family is known as the "Transformed-Transformer" or "$T$-$X$" family of distributions. A large number of distributions, continuous and discrete, can be generated by applying this method to any two existing univariate distributions. Alzaatreh et al.² gave several choices of $U (F (x))$ depending upon the support of the random variable $T$.

When the support of $T$ is bounded: without loss of generality, the support can be taken to be $[0, 1]$. In this case $U (F (x))$ can be taken as $F (x)$ or $F^{α} (x)$. This leads to the beta-generated family of distributions.

When the support of $T$ is $[a, \infty)$, $a \geq 0$: without loss of generality, we assume $a = 0$. Then $U (F (x))$ can be defined as $- \log (1 - F (x))$, $- \log (1 - F^{α} (x))$ or $F^{α} (x) / (1 - F^{α} (x))$, where $α > 0$.

When the support of $T$ is $(- \infty, \infty)$: then $U (F (x))$ can be taken as $\log [- \log (1 - F (x))]$, $\log [F (x) / (1 - F (x))]$, $\log [- \log (1 - F^{α} (x))]$ or $\log [F^{α} (x) / (1 - F^{α} (x))]$.

In this paper, we consider the third case, in which the support of $T$ is $(- \infty, \infty)$. We take $T$ to follow the extreme value Type I distribution, known as the Gumbel distribution, which is one of the most important extreme value distributions. It has many applications, including the description of extreme wind speeds, sea wave heights, floods, rainfall during droughts, the electrical strength of materials, air pollution problems, geological problems and naval engineering. Recently, the Gumbel distribution has also been used for modelling COVID-19 data.⁴,⁵

Al-Aqtash¹ proposed the Gumbel-X family by taking $T$ to be a Gumbel random variable with location parameter $μ$ and scale parameter $σ$. Its cdf is

$G (x) = e^{- e^{\frac{μ}{σ}} {(\frac{F (x)}{\bar{F} (x)})}^{- 1 / σ}}$ (1.3)

By setting $λ = e^{μ / σ}$ the cdf reduces to

$G (x) = e^{- λ {(\frac{F (x)}{\bar{F} (x)})}^{- 1 / σ}}$ (1.4)

and the pdf is

$g (x) = \frac{λ}{σ} f (x) \frac{{(F (x))}^{- \frac{1}{σ} - 1}}{{(\bar{F} (x))}^{- \frac{1}{σ} + 1}} e^{- λ {(\frac{F (x)}{\bar{F} (x)})}^{- 1 / σ}}$ (1.5)

The random variable with pdf (1.5) has the same support as the baseline density $f (.)$.
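The step from the T-X construction (1.1) to (1.3) can be sketched as follows. The sketch rests on two assumptions that are not stated explicitly above: the Gumbel cdf is parameterised as $R (t) = \exp \{ - e^{- (t - μ) / σ} \}$, and the link function is $U (F (x)) = \log [F (x) / \bar{F} (x)]$ with $\bar{F} (x) = 1 - F (x)$.

$G (x) = R [U (F (x))] = \exp \{ - \exp [- \frac{1}{σ} (\log \frac{F (x)}{\bar{F} (x)} - μ)] \} = \exp \{ - e^{μ / σ} {(\frac{F (x)}{\bar{F} (x)})}^{- 1 / σ} \}$

Differentiating (1.4) and using $\frac{d}{d x} \frac{F (x)}{\bar{F} (x)} = \frac{f (x)}{{\bar{F} (x)}^{2}}$ gives

$g (x) = \frac{λ}{σ} {(\frac{F (x)}{\bar{F} (x)})}^{- 1 / σ - 1} \frac{f (x)}{{\bar{F} (x)}^{2}} e^{- λ {(\frac{F (x)}{\bar{F} (x)})}^{- 1 / σ}}$

which is (1.5), because the powers of $\bar{F} (x)$ combine to ${\bar{F} (x)}^{1 / σ + 1 - 2} = {\bar{F} (x)}^{1 / σ - 1}$.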

The paper is organised as follows. In section 2, we define the Gumbel-Pareto distribution. Some structural properties, including moments, the quantile function and order statistics, are discussed in section 3. The maximum likelihood estimation of the model parameters is discussed in section 4. The application of this distribution to two real data sets is presented in section 5. In section 6, stress-strength analysis is discussed. Finally, section 7 offers some concluding remarks.

Gumbel- Pareto distribution

The Pareto distribution is well known for its capability in modelling heavy-tailed data sets, especially income and wealth data. Kochanczyk and Lipniack⁶ have conducted a Pareto-based evaluation of national responses to COVID-19.

If the parent distribution is Pareto with parameters k and $θ$ , with pdf

$f (x) = \frac{k}{θ} {(\frac{x}{θ})}^{- k - 1}, x > θ$ (2.1)

then the cdf of the four parameter Gumbel - Pareto distribution, denoted by $G u P (x; λ, σ, k, θ)$ is given by

$G_{G u P} (x; λ, σ, k, θ) = e^{- λ {[{(\frac{x}{θ})}^{k} - 1]}^{- 1 / σ}}, x > θ .$ (2.2)

The corresponding pdf is given by

$g_{G u P} (x; λ, σ, k, θ) = \frac{λ k}{σ θ} e^{- λ {[{(\frac{x}{θ})}^{k} - 1]}^{- 1 / σ}} {[{(\frac{x}{θ})}^{k} - 1]}^{- 1 / σ - 1} {(\frac{x}{θ})}^{k - 1}, x > θ .$ (2.3)

The hazard function (hf) is obtained as

$h (x; λ, σ, k, θ) = \frac{\frac{λ k}{σ θ} e^{- λ {[{(\frac{x}{θ})}^{k} - 1]}^{- 1 / σ}} {[{(\frac{x}{θ})}^{k} - 1]}^{- 1 / σ - 1} {(\frac{x}{θ})}^{k - 1}}{1 - e^{- λ {[{(\frac{x}{θ})}^{k} - 1]}^{- 1 / σ}}}, x > θ .$ (2.4)
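As an illustration, the cdf (2.2), pdf (2.3) and hazard function (2.4) can be evaluated with a few lines of Python. This is a minimal sketch using only the standard library; the function names are our own choices for illustration and do not come from any package, and no care is taken over numerical overflow for $x$ extremely close to $θ$ or far in the upper tail.

```python
import math


def gup_cdf(x, lam, sigma, k, theta):
    """cdf (2.2): exp(-lam * [(x/theta)^k - 1]^(-1/sigma)) for x > theta."""
    if x <= theta:
        return 0.0
    w = (x / theta) ** k - 1.0  # F(x) / (1 - F(x)) for the Pareto baseline
    return math.exp(-lam * w ** (-1.0 / sigma))


def gup_pdf(x, lam, sigma, k, theta):
    """pdf (2.3), the derivative of the cdf above."""
    if x <= theta:
        return 0.0
    w = (x / theta) ** k - 1.0
    return (
        lam * k / (sigma * theta)
        * math.exp(-lam * w ** (-1.0 / sigma))
        * w ** (-1.0 / sigma - 1.0)
        * (x / theta) ** (k - 1.0)
    )


def gup_hazard(x, lam, sigma, k, theta):
    """Hazard function (2.4): pdf divided by the survival function."""
    return gup_pdf(x, lam, sigma, k, theta) / (1.0 - gup_cdf(x, lam, sigma, k, theta))
```

A quick consistency check is to compare `gup_pdf` with a central finite difference of `gup_cdf` at a few points; the two should agree to several decimal places.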


Some structural properties

Transformation

Lemma 3.1: If $Y \sim G u (μ, σ)$, then $X = θ {(e^{Y} + 1)}^{1 / k}$ follows the GuP distribution with parameters $λ = e^{μ / σ}$, $σ$, $k$ and $θ$.

The proof is done by using transformation technique.
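The transformation argument can be sketched as follows, assuming the Gumbel cdf is parameterised as $P (Y \leq y) = \exp \{ - e^{- (y - μ) / σ} \}$ and writing $λ = e^{μ / σ}$. For $x > θ$,

$P (X \leq x) = P (e^{Y} \leq {(\frac{x}{θ})}^{k} - 1) = P (Y \leq \log [{(\frac{x}{θ})}^{k} - 1])$

$= \exp \{ - \exp [- \frac{1}{σ} (\log [{(\frac{x}{θ})}^{k} - 1] - μ)] \} = \exp \{ - λ {[{(\frac{x}{θ})}^{k} - 1]}^{- 1 / σ} \}$

which is the cdf (2.2).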

Quantile function and simulation

The quantile function of the Gumbel-Pareto distribution is obtained by inverting (2.2) as $x = Q (u) = θ {[1 + {(- \frac{1}{λ} \log u)}^{- σ}]}^{1 / k}$, $0 < u < 1$.

If $u \sim U (0, 1)$, then $X = Q (u)$ has pdf $g (x)$.
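The inverse-transform simulation described here can be sketched in Python as follows. The function names and the `seed` argument are illustrative assumptions of ours; the cdf is repeated only so that the round trip $G (Q (u)) = u$ can be checked.

```python
import math
import random


def gup_cdf(x, lam, sigma, k, theta):
    """cdf (2.2) of the Gumbel-Pareto distribution."""
    if x <= theta:
        return 0.0
    w = (x / theta) ** k - 1.0
    return math.exp(-lam * w ** (-1.0 / sigma))


def gup_quantile(u, lam, sigma, k, theta):
    """Quantile function Q(u), valid for 0 < u < 1."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    return theta * (1.0 + (-math.log(u) / lam) ** (-sigma)) ** (1.0 / k)


def gup_sample(n, lam, sigma, k, theta, seed=None):
    """Draw n values by applying Q to independent U(0, 1) draws."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()  # lies in [0, 1); skip the rare exact zero
        if u > 0.0:
            draws.append(gup_quantile(u, lam, sigma, k, theta))
    return draws
```

For example, `gup_sample(1000, 2.0, 1.5, 1.2, 3.0, seed=1)` returns 1000 simulated values, all of which are at least $θ = 3$.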

By using $Q (u)$, one can obtain the Galton skewness $S$ and Moors' kurtosis $K$, which are defined as $S = \frac{Q (6 / 8) - 2 Q (4 / 8) + Q (2 / 8)}{Q (6 / 8) - Q (2 / 8)}$ and $K = \frac{Q (7 / 8) - Q (5 / 8) + Q (3 / 8) - Q (1 / 8)}{Q (6 / 8) - Q (2 / 8)}$.
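Both measures depend only on the octiles of the distribution, so they can be computed from any quantile function. The sketch below is one possible implementation; the helper names are hypothetical and the parameter values in the usage note are arbitrary.

```python
import math


def gup_quantile(u, lam, sigma, k, theta):
    """Quantile function Q(u) of the Gumbel-Pareto distribution, 0 < u < 1."""
    return theta * (1.0 + (-math.log(u) / lam) ** (-sigma)) ** (1.0 / k)


def galton_skewness(q):
    """Galton skewness S for a quantile function q(u)."""
    return (q(6 / 8) - 2.0 * q(4 / 8) + q(2 / 8)) / (q(6 / 8) - q(2 / 8))


def moors_kurtosis(q):
    """Moors' kurtosis K for a quantile function q(u)."""
    return (q(7 / 8) - q(5 / 8) + q(3 / 8) - q(1 / 8)) / (q(6 / 8) - q(2 / 8))
```

To use it, fix the parameters in a small wrapper, for instance `q = lambda u: gup_quantile(u, 2.0, 1.5, 1.2, 3.0)`, and call `galton_skewness(q)` and `moors_kurtosis(q)`. For the uniform quantile function `lambda u: u` the two functions return 0 and 1 respectively, which is a convenient sanity check.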

Moments

Theorem 3.1 The $r^{t h}$ raw moment of the Gumbel-Pareto distribution is $μ_{r}^{'} = θ^{r} \sum_{i = 0}^{\infty} λ^{i σ} \binom{r / k}{i} Γ (1 - i σ)$

where $Γ (a) = \int_{0}^{\infty} t^{a - 1} e^{- t} d t$ is the gamma function.
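A formal sketch of where this series comes from, based on the quantile representation $X = Q (u)$: put $V = - \log u$, which is a standard exponential random variable when $u \sim U (0, 1)$, and $Z = λ^{σ} V^{- σ}$, so that $X = θ {(1 + Z)}^{1 / k}$. Expanding with the generalised binomial series and taking expectations term by term gives

$E (X^{r}) = θ^{r} E [{(1 + Z)}^{r / k}] = θ^{r} \sum_{i = 0}^{\infty} \binom{r / k}{i} E (Z^{i})$

$E (Z^{i}) = λ^{i σ} \int_{0}^{\infty} v^{- i σ} e^{- v} d v = λ^{i σ} Γ (1 - i σ)$

The last integral is finite only when $i σ < 1$, and the interchange of sum and expectation is purely formal here; convergence of the series is not examined in this sketch.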

The skewness and kurtosis can also be calculated from ordinary moments using well-known relationships.

Order Statistics

Order statistics deal with the properties and applications of ordered random samples and their functions. Let $X_{1}, X_{2}, \ldots, X_{n}$ be a random sample from the Gumbel-Pareto distribution, and let $X_{r : n}$ denote the $r^{t h}$ order statistic. Then the pdf of $X_{r : n}$ can be expressed as

$g_{r : n} (x) = \frac{n!}{(r - 1)! (n - r)!} \sum_{j = 0}^{n - r} {(- 1)}^{j} \binom{n - r}{j} g (x) {G (x)}^{j + r - 1}$ (3.1)

Inserting $g (x)$ and $G (x)$ in (3.1), and after some algebra, we get

$g_{r : n} (x) = \sum_{j = 0}^{n - r} \frac{{(- 1)}^{j} n!}{(r - 1)! (n - r)!} \binom{n - r}{j} \frac{λ k}{σ θ} {[{(\frac{x}{θ})}^{k} - 1]}^{- 1 / σ - 1} {(\frac{x}{θ})}^{k - 1} \exp \{ - (r + j) λ {[{(\frac{x}{θ})}^{k} - 1]}^{- 1 / σ} \}$

$= \sum_{j = 0}^{n - r} ξ_{j} g (x; (r + j) λ, σ, k, θ)$ (3.2)

where $ξ_{j} = \frac{{(- 1)}^{j} n!}{(r - 1)! (n - r)! (r + j)} \binom{n - r}{j}$ and $g (x; (r + j) λ, σ, k, θ)$ is the Gumbel-Pareto density function (2.3) with parameters $(r + j) λ$, $σ$, $k$ and $θ$. The factor $1 / (r + j)$ in $ξ_{j}$ compensates for the factor $(r + j) λ$ in the normalising constant of that density.

This shows that the pdf of a Gumbel-Pareto order statistic is a finite linear combination of Gumbel-Pareto densities.

Maximum likelihood estimation

The maximum likelihood method is applied for estimating the parameters of the Gumbel-Pareto distribution. Let $X_{1}, X_{2}, \ldots, X_{n}$ be a random sample from the Gumbel-Pareto (GuP) distribution with observed values $x_{1}, x_{2}, \ldots, x_{n}$, and let $Θ = (λ, σ, k, θ)$. The likelihood function for the GuP distribution is given by $L (Θ) = {(\frac{λ k}{σ θ})}^{n} \exp \{ - λ \sum_{i = 1}^{n} {[{(\frac{x_{i}}{θ})}^{k} - 1]}^{- 1 / σ} \} \prod_{i = 1}^{n} {[{(\frac{x_{i}}{θ})}^{k} - 1]}^{- 1 / σ - 1} \prod_{i = 1}^{n} {(\frac{x_{i}}{θ})}^{k - 1}$

The components of the score vector $U (Θ)$ are given by

$U_{λ} = \frac{n}{λ} - \sum_{i = 1}^{n} {[{(\frac{x_{i}}{θ})}^{k} - 1]}^{- 1 / σ}$

$U_{σ} = - \frac{n}{σ} - \frac{λ}{σ^{2}} \sum_{i = 1}^{n} {[{(\frac{x_{i}}{θ})}^{k} - 1]}^{- 1 / σ} \log [{(\frac{x_{i}}{θ})}^{k} - 1] + \frac{1}{σ^{2}} \sum_{i = 1}^{n} \log [{(\frac{x_{i}}{θ})}^{k} - 1]$

$U_{k} = \frac{n}{k} + \frac{λ}{σ} \sum_{i = 1}^{n} {[{(\frac{x_{i}}{θ})}^{k} - 1]}^{- 1 / σ - 1} {(\frac{x_{i}}{θ})}^{k} \log (\frac{x_{i}}{θ}) - (\frac{1}{σ} + 1) \sum_{i = 1}^{n} \frac{{(\frac{x_{i}}{θ})}^{k} \log (\frac{x_{i}}{θ})}{{(\frac{x_{i}}{θ})}^{k} - 1} + \sum_{i = 1}^{n} \log (\frac{x_{i}}{θ})$

$U_{θ} = - \frac{n k}{θ} - \frac{λ k}{σ θ} \sum_{i = 1}^{n} {{[{(\frac{x_{i}}{θ})}^{k} - 1]}^{- 1 / σ - 1} {(\frac{x_{i}}{θ})}^{k}} + \frac{k}{θ} (\frac{1}{σ} + 1) \sum_{i = 1}^{n} \frac{{(\frac{x_{i}}{θ})}^{k}}{[{(\frac{x_{i}}{θ})}^{k} - 1]}$

The parameter estimates are obtained by setting these score components equal to zero and solving the resulting nonlinear equations numerically, for example by minimising $- \log L (Θ)$ with the nlm function in R.
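For readers who do not work in R, the objective function can be sketched in Python as below. This is only an illustration under our own naming: it codes $- \log L (Θ)$ for the GuP likelihood and the closed-form root of $U_{λ} = 0$ for fixed $(σ, k, θ)$, namely $λ = n / \sum_{i} {[{(x_{i} / θ)}^{k} - 1]}^{- 1 / σ}$. The remaining parameters would still have to be found with a general-purpose numerical optimiser, which is not shown.

```python
import math


def gup_neg_log_likelihood(params, data):
    """Return -log L(lam, sigma, k, theta) for the GuP model.

    Returns infinity outside the parameter space (any parameter <= 0,
    or theta not strictly below the smallest observation).
    """
    lam, sigma, k, theta = params
    if min(lam, sigma, k, theta) <= 0.0 or min(data) <= theta:
        return math.inf
    log_lik = len(data) * math.log(lam * k / (sigma * theta))
    for x in data:
        w = (x / theta) ** k - 1.0
        log_lik += (
            -lam * w ** (-1.0 / sigma)
            - (1.0 / sigma + 1.0) * math.log(w)
            + (k - 1.0) * math.log(x / theta)
        )
    return -log_lik


def lambda_given_rest(sigma, k, theta, data):
    """Root of the score component U_lambda for fixed sigma, k and theta."""
    total = sum(((x / theta) ** k - 1.0) ** (-1.0 / sigma) for x in data)
    return len(data) / total
```

Because $\log L$ is concave in $λ$ for fixed values of the other parameters, substituting `lambda_given_rest` into the objective reduces the numerical search to three dimensions.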

Data analysis

In this section, we illustrate the effectiveness of the Gumbel-Pareto distribution and compare the results with other existing models. To compare the distributions, we consider standard goodness-of-fit measures: $- \log L (Θ)$, AIC (Akaike information criterion), CAIC (consistent Akaike information criterion), BIC (Bayesian information criterion) and HQIC (Hannan-Quinn information criterion). The smaller these values, the better the fit.

Data set I: Number of deaths due to COVID-19 in China. These data, reported at https://www.worldometers.info/coronavirus/country/china/, are the daily deaths due to COVID-19 in China from 23 January to 28 March.
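The formulas behind these criteria are not written out in the text. The sketch below uses forms that are common in this literature and should be read as an assumption: AIC $= - 2 \log L + 2 p$, CAIC $= - 2 \log L + 2 p n / (n - p - 1)$, BIC $= - 2 \log L + p \log n$ and HQIC $= - 2 \log L + 2 p \log (\log n)$, where $p$ is the number of parameters and $n$ the sample size. The function name is hypothetical.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Return AIC, CAIC, BIC and HQIC for a fitted model.

    log_lik  : maximised log-likelihood (not its negative)
    n_params : number of estimated parameters p
    n_obs    : sample size n
    The CAIC form used here is an assumption, not taken from the text.
    """
    p, n = n_params, n_obs
    minus_two_ll = -2.0 * log_lik
    return {
        "AIC": minus_two_ll + 2 * p,
        "CAIC": minus_two_ll + 2 * p * n / (n - p - 1),
        "BIC": minus_two_ll + p * math.log(n),
        "HQIC": minus_two_ll + 2 * p * math.log(math.log(n)),
    }
```

With $\log L = - 329.158$, $p = 3$ and $n = 66$, these forms give values that agree, to three decimals, with the ETGT-II row of Table 1, which suggests they are the definitions used for that row.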

The data are: 8, 16, 15, 24, 26, 26, 38, 43, 46, 45, 57, 64, 65, 73, 73, 86, 89, 97, 108, 97, 146, 121, 143, 142, 105, 98, 136, 114, 118, 109, 97, 150, 71, 52, 29, 44, 47, 35, 42, 31, 38, 31, 30, 28, 27, 22, 17, 22, 11, 7, 13, 10, 14, 13, 11, 8, 3, 7, 6, 9, 7, 4, 6, 5, 3, 5.

Here we compare the new model with the Exponentiated transform of Gumbel type-II (ETGT-II) model, the Additive Gumbel type-II (AGT-II) model and the Gumbel type-II (GT-II) model. The values of the statistics are given in Table 1.

Figure 1 The graph of the pdf and hazard rate of Gumbel - Pareto distribution for various parameter values.


Distribution	Parameter	MLE	$- \log L$	AIC	CAIC	BIC	HQIC
GuP	$λ$	2.527	222.428	452.856	444.856	453.512	444.856
	$σ$	2.968
	$k$	0.994
	$θ$	2.879
ETGT-II	$γ$	1.086	329.158	664.316	664.703	670.885	666.912
	$δ$	10.688
	$ψ$	2.431
AGT-II	$β$	7.479	331.081	670.162	670.818	678.921	673.623
	$λ$	13.432
	$δ$	4.486
	$α$	0.9137
GT-II	$β$	0.916	331.102	666.203	666.397	670.583	667.934
	$α$	13.532

Table 1 The MLEs and the goodness-of-fit statistics $- \log L$, AIC, CAIC, BIC and HQIC for data set I

From Table 1, the GuP model has the smallest value of every criterion among the four fitted models, so it provides the best fit to this data set.

Data set II: This real data set consists of the times between successive failures of the air conditioning system of each member of a fleet of 13 Boeing 720 jet airplanes. The pooled data, with 214 observations, were considered by Proschan⁷, Kus⁸ and many others. Here we compare the new model with the existing Weibull-Pareto (WP) model.

From Table 2, we can see that the newly developed Gumbel-Pareto distribution fits the given data better than the existing Weibull-Pareto distribution,² since it has smaller values of all the criteria.


Distribution	Parameter	MLE	SE	$- \log L$	AIC	CAIC	BIC	HQIC
GuP	$λ$	9.6166	1.213	1005.81	2017.62	2017.74	2027.72	2021.71
	$σ$	10.1333	2.281
	$k$	7.233	1.678
	$θ$	0.9981	0.0026
WP	$α$	9.8626	0.008	1459.62	2925.24	2925.35	2935.34	2929.32
	$θ$	0.9283	0.004
	$b$	0.1267	0.00001

Table 2 The MLEs, their standard errors (SE) and the goodness-of-fit statistics $- \log L$, AIC, CAIC, BIC and HQIC for data set II

Stress - strength analysis

Reliability is the probability of not failing. It is denoted by $R$ and defined here as $R = P (Y < X)$, where $X$ represents the strength and $Y$ the stress of a component. For the evaluation of $R$, we assume that the two random variables are independent and follow distributions belonging to the same family. There are a number of applications in the literature, including the stress-strength model and the breakdown of a system having two components. If $X$ and $Y$ are two independent random variables with cdfs $F_{1} (x)$ and $F_{2} (y)$ and pdfs $f_{1} (x)$ and $f_{2} (y)$ respectively, then

$R = P (Y < X) = \int_{- \infty}^{\infty} F_{2} (t) f_{1} (t) d t .$ (6.1)

Lemma 6.1 If $X$ and $Y$ are two independent random variables following Gumbel-X family distributions with a common baseline cdf $F$ and parameters $(λ_{1}, σ_{1})$ and $(λ_{2}, σ_{2})$ respectively, then

$R = \sum_{j = 0}^{\infty} \frac{{(- 1)}^{j} λ_{2}^{j}}{j! λ_{1}^{\frac{j σ_{1}}{σ_{2}}}} Γ (j \frac{σ_{1}}{σ_{2}} + 1)$ (6.2)
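A sketch of how (6.2) follows from the integral in (6.1), assuming that both variables share the same baseline cdf $F$, so that $F_{i} (t) = \exp \{ - λ_{i} {W (t)}^{- 1 / σ_{i}} \}$ with $W (t) = F (t) / \bar{F} (t)$, $i = 1, 2$. Substituting $v = λ_{1} {W (t)}^{- 1 / σ_{1}}$ gives $F_{1} (t) = e^{- v}$, $f_{1} (t) d t = - e^{- v} d v$ and ${W (t)}^{- 1 / σ_{2}} = {(v / λ_{1})}^{σ_{1} / σ_{2}}$, with $v$ running from $\infty$ down to $0$ as $t$ increases. Hence

$\int_{- \infty}^{\infty} F_{2} (t) f_{1} (t) d t = \int_{0}^{\infty} \exp \{ - λ_{2} {(\frac{v}{λ_{1}})}^{σ_{1} / σ_{2}} \} e^{- v} d v$

$= \sum_{j = 0}^{\infty} \frac{{(- 1)}^{j} λ_{2}^{j}}{j! λ_{1}^{j σ_{1} / σ_{2}}} \int_{0}^{\infty} v^{j σ_{1} / σ_{2}} e^{- v} d v = \sum_{j = 0}^{\infty} \frac{{(- 1)}^{j} λ_{2}^{j}}{j! λ_{1}^{j σ_{1} / σ_{2}}} Γ (j \frac{σ_{1}}{σ_{2}} + 1)$

where the exponential has been expanded as a power series and integrated term by term. As a check, when $σ_{1} = σ_{2}$ the integral equals $λ_{1} / (λ_{1} + λ_{2})$, while the series reduces to the geometric series $\sum_{j} {(- λ_{2} / λ_{1})}^{j}$, which converges to the same value only when $λ_{2} < λ_{1}$; the series form should therefore be used with some care.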

A reliability test plan has been developed for the case in which the lifetimes of the items follow the Gumbel-Pareto distribution; see Jeena and Jose⁹ for more details.¹⁰⁻¹⁴

Conclusion

In this paper, we proposed the new Gumbel-Pareto distribution. We studied some of its structural properties, including moments, the quantile function and order statistics. The estimation of the model parameters was addressed by the maximum likelihood method. We fitted the new model to two real data sets to demonstrate its usefulness in practice. We conclude that the GuP distribution provides a better fit than the competing models for both data sets considered. We hope that the proposed model will attract wider applications in areas such as engineering, survival and lifetime data analysis, hydrology, economics and biostatistical data on cancer and COVID-19.