eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Research Article Volume 10 Issue 3

Gumbel - Pareto distribution and it’s applications in modeling COVID data

Jeena Joseph,1 KK Jose2

1Department of Statistics, St. Thomas’ College, Thrissur, India
2School of Mathematics and Statistics and Data Analytics, Mahatma Gandhi University, India

Correspondence: KK Jose, School of Mathematics and Statistics and Data Analytics, Mahatma Gandhi University, Kottayam, Kerala, India

Received: September 15, 2021 | Published: September 30, 2021

Citation: Joseph J, Jose KK. Gumbel - Pareto distribution and it’s applications in modeling COVID data. Biom Biostat Int J. 2021;10(3):125-128. DOI: 10.15406/bbij.2021.10.00338


Abstract

A new distribution, the Gumbel-Pareto distribution, derived from the Gumbel-X family,1 is introduced. Some of its properties, including moments and order statistics, are studied. A reliability measure for stress-strength analysis is derived. The method of maximum likelihood is proposed for estimating the distribution parameters. The flexibility of the new model is illustrated using two examples, including COVID-19 data.

Keywords: Gumbel distribution, Gumbel-X family, Gumbel-Pareto, order statistics, Pareto distribution, stress-strength reliability, T-X family

Introduction

Statistical distributions play an important role in parametric inference and are commonly applied to model real-life data. In practice, the existing standard distributions do not provide a good fit to every type of real data set. Hence, statisticians keep developing new distributions that are more flexible than the standard ones for the analysis of real data. New distributions are developed either by combining two or more existing distributions or by adding extra parameters to existing distributions.

The beta-generated and Kumaraswamy-generated families of distributions use distributions with support on $[0,1]$ as the generator. As an extension, Alzaatreh et al.2 proposed a general method that replaces the beta pdf with the pdf $r(t)$ of any continuous random variable $T$ with support $[a,b]$ as the generator, together with a function $U(F(x))$ of a baseline cdf $F$ that satisfies the following conditions:

  1. $U(F(x))\in[a,b]$;
  2. $U(F(x))$ is differentiable and monotonically non-decreasing;
  3. $U(F(x))\to a$ as $x\to-\infty$ and $U(F(x))\to b$ as $x\to\infty$.

 The new class of distributions is defined by

$$G(x)=\int_a^{U(F(x))} r(t)\,dt=R[U(F(x))] \tag{1.1}$$

where $R(t)$ is the cdf and $r(t)$ is the pdf of the random variable $T$. Here, the cdf in (1.1) is the composite function $(R\circ U\circ F)(x)$. The corresponding pdf is

$$g(x)=\left\{\frac{d}{dx}U(F(x))\right\}r\{U(F(x))\} \tag{1.2}$$

The pdf $r(t)$ in (1.2) is "transformed" into a new pdf $g(x)$ through the function $U(F(x))$, which acts as a "transformer". That is, a random variable $X$, "the transformer", is used to transform another random variable $T$, "the transformed". The resulting family is known as the "Transformed-Transformer" or "T-X" family of distributions. Using this method, a large number of continuous and discrete distributions can be generated from any two existing univariate distributions. Alzaatreh et al.2 gave several choices of $U(F(x))$ depending on the support of the random variable $T$.

When the support of $T$ is bounded, e.g. $[0,1]$: $U(F(x))$ can be taken as $F(x)$ or $F^{\alpha}(x)$. This leads to the beta-generated family of distributions.

When the support of $T$ is $[a,\infty)$, $a\ge 0$: without loss of generality, we assume $a=0$. Then $U(F(x))$ can be defined as $-\log(1-F(x))$, $-\log(1-F^{\alpha}(x))$ and $F^{\alpha}(x)/(1-F^{\alpha}(x))$, where $\alpha>0$.

When the support of $T$ is $(-\infty,\infty)$: $U(F(x))$ can be taken as $\log[-\log(1-F(x))]$, $\log[F(x)/(1-F(x))]$, $\log[-\log(1-F^{\alpha}(x))]$ and $\log[F^{\alpha}(x)/(1-F^{\alpha}(x))]$.
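To make the composition $G=R\circ U\circ F$ in (1.1) concrete, here is a minimal Python sketch (standard library only). The helper names (`tx_cdf`, `u_bounded`, `u_half_line`, `u_logit`) and the Exp(1) choices for both $T$ and the baseline $F$ are illustrative assumptions, not part of the original method. With $R(t)=1-e^{-t}$ and $U(F)=-\log(1-F^{\alpha})$ the composition collapses to $G(x)=F^{\alpha}(x)$, the exponentiated-$F$ family, which makes the output easy to check.

```python
import math

def tx_cdf(R, U, F):
    """Compose the T-X cdf G(x) = R(U(F(x))) of (1.1)."""
    return lambda x: R(U(F(x)))

# Example transformers U(p), with p = F(x), for the three support cases (alpha > 0).
def u_bounded(p, alpha=1.0):      # T supported on [0, 1]
    return p ** alpha

def u_half_line(p, alpha=1.0):    # T supported on [0, inf)
    return -math.log(1.0 - p ** alpha)

def u_logit(p):                   # T supported on (-inf, inf)
    return math.log(p / (1.0 - p))

# Illustrative (assumed) choice: T ~ Exp(1) and an Exp(1) baseline F.
R_exp = lambda t: 1.0 - math.exp(-t)
F_base = lambda x: 1.0 - math.exp(-x)

alpha = 2.5
G = tx_cdf(R_exp, lambda p: u_half_line(p, alpha), F_base)

for x in (0.5, 1.0, 3.0):
    # R(-log(1 - F^alpha)) = F^alpha, so the two columns agree.
    print(x, G(x), F_base(x) ** alpha)
```

Swapping in a different generator only requires another `R`, and a different support case only another `U`; the Gumbel-X family used below corresponds to a Gumbel `R` with `u_logit`.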

In this paper, we consider the third case, in which the support of $T$ is $(-\infty,\infty)$. For $T$ we take the most important extreme value type I distribution, the Gumbel distribution. This distribution has many applications, including describing extreme wind speeds, sea wave heights, floods, rainfall during droughts, the electrical strength of materials, air pollution problems, geological problems and naval engineering. Recently, the Gumbel distribution has also been used to model COVID-19 data.4,5

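Al-Aqtash1 proposed the Gumbel-X family by taking $T$ as a Gumbel random variable with location $\mu$ and scale $\sigma$, that is, $R(t)=\exp\{-e^{-(t-\mu)/\sigma}\}$, and $U(F(x))=\log[F(x)/\bar F(x)]$, where $\bar F(x)=1-F(x)$. The resulting cdf is

$$G(x)=\exp\left\{-e^{\mu/\sigma}\left(\frac{F(x)}{\bar F(x)}\right)^{-1/\sigma}\right\} \tag{1.3}$$

By setting $\lambda=e^{\mu/\sigma}$, the cdf reduces to

$$G(x)=\exp\left\{-\lambda\left(\frac{F(x)}{\bar F(x)}\right)^{-1/\sigma}\right\} \tag{1.4}$$

and the pdf is

$$g(x)=\frac{\lambda}{\sigma}\,f(x)\,(F(x))^{-\frac{1}{\sigma}-1}(\bar F(x))^{\frac{1}{\sigma}-1}\exp\left\{-\lambda\left(\frac{F(x)}{\bar F(x)}\right)^{-1/\sigma}\right\} \tag{1.5}$$

The random variable with pdf (1.5) has the same support as $f(\cdot)$.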
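The two forms (1.3) and (1.4) and the density (1.5) can be cross-checked numerically. The sketch below assumes a hypothetical Exp(0.7) baseline and illustrative values $\mu=0.4$, $\sigma=1.3$; it composes the Gumbel cdf with the logit transformer, compares the result with (1.4) under $\lambda=e^{\mu/\sigma}$, and compares (1.5) with a central finite difference of the cdf.

```python
import math

def gumbel_cdf(t, mu, sigma):
    """Gumbel (maximum) cdf R(t) = exp(-exp(-(t - mu)/sigma))."""
    return math.exp(-math.exp(-(t - mu) / sigma))

def gumbel_x_cdf(F, lam, sigma):
    """Closed form (1.4), written in terms of the baseline cdf value F = F(x)."""
    return math.exp(-lam * (F / (1.0 - F)) ** (-1.0 / sigma))

def gumbel_x_pdf(f, F, lam, sigma):
    """Density (1.5), given the baseline pdf value f and cdf value F at x."""
    return ((lam / sigma) * f
            * F ** (-1.0 / sigma - 1.0)
            * (1.0 - F) ** (1.0 / sigma - 1.0)
            * gumbel_x_cdf(F, lam, sigma))

# Hypothetical baseline for the check: exponential with rate 0.7.
rate = 0.7
F_base = lambda x: 1.0 - math.exp(-rate * x)
f_base = lambda x: rate * math.exp(-rate * x)

mu, sigma = 0.4, 1.3
lam = math.exp(mu / sigma)          # lambda = e^{mu/sigma}
x = 1.8

via_composition = gumbel_cdf(math.log(F_base(x) / (1.0 - F_base(x))), mu, sigma)
via_closed_form = gumbel_x_cdf(F_base(x), lam, sigma)

h = 1e-5                            # step for the central difference
numeric_pdf = (gumbel_x_cdf(F_base(x + h), lam, sigma)
               - gumbel_x_cdf(F_base(x - h), lam, sigma)) / (2.0 * h)
analytic_pdf = gumbel_x_pdf(f_base(x), F_base(x), lam, sigma)
print(via_composition, via_closed_form, numeric_pdf, analytic_pdf)
```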

 

The paper is organized as follows. In Section 2, we define the Gumbel-Pareto distribution. Some structural properties, including moments, the quantile function and order statistics, are discussed in Section 3. Maximum likelihood estimation of the model parameters is discussed in Section 4. Applications of the distribution to two real data sets are presented in Section 5. In Section 6, stress-strength analysis is discussed. Finally, Section 7 offers some concluding remarks.

Gumbel-Pareto distribution

The Pareto distribution is well known for its capability to model heavy-tailed data sets, especially income and wealth data. Kochańczyk and Lipniacki6 conducted a Pareto-based evaluation of national responses to COVID-19.

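If the parent distribution is Pareto with parameters $k$ and $\theta$, with pdf

$$f(x)=\frac{k}{\theta}\left(\frac{x}{\theta}\right)^{-k-1},\quad x>\theta, \tag{2.1}$$

then, since $F(x)/\bar F(x)=(x/\theta)^{k}-1$, the cdf of the four-parameter Gumbel-Pareto distribution, denoted by GuP$(x;\lambda,\sigma,k,\theta)$, is given by

$$G_{GuP}(x;\lambda,\sigma,k,\theta)=\exp\left\{-\lambda\left[\left(\frac{x}{\theta}\right)^{k}-1\right]^{-1/\sigma}\right\},\quad x>\theta. \tag{2.2}$$

The corresponding pdf is given by

$$g_{GuP}(x;\lambda,\sigma,k,\theta)=\frac{\lambda k}{\sigma\theta}\exp\left\{-\lambda\left[\left(\frac{x}{\theta}\right)^{k}-1\right]^{-1/\sigma}\right\}\left[\left(\frac{x}{\theta}\right)^{k}-1\right]^{-\frac{1}{\sigma}-1}\left(\frac{x}{\theta}\right)^{k-1},\quad x>\theta. \tag{2.3}$$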

 

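The hazard function (hf) is obtained as

$$h(x;\lambda,\sigma,k,\theta)=\frac{g_{GuP}(x;\lambda,\sigma,k,\theta)}{1-G_{GuP}(x;\lambda,\sigma,k,\theta)}=\frac{\dfrac{\lambda k}{\sigma\theta}\exp\left\{-\lambda\left[\left(\frac{x}{\theta}\right)^{k}-1\right]^{-1/\sigma}\right\}\left[\left(\frac{x}{\theta}\right)^{k}-1\right]^{-\frac{1}{\sigma}-1}\left(\frac{x}{\theta}\right)^{k-1}}{1-\exp\left\{-\lambda\left[\left(\frac{x}{\theta}\right)^{k}-1\right]^{-1/\sigma}\right\}},\quad x>\theta. \tag{2.4}$$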
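A direct transcription of (2.2)-(2.4) is handy for plotting curves like those in Figure 1. The following is a minimal sketch in plain Python; the function names and the parameter values in the usage loop are illustrative assumptions.

```python
import math

def gup_cdf(x, lam, sigma, k, theta):
    """GuP cdf (2.2); zero for x <= theta."""
    if x <= theta:
        return 0.0
    u = (x / theta) ** k - 1.0
    return math.exp(-lam * u ** (-1.0 / sigma))

def gup_pdf(x, lam, sigma, k, theta):
    """GuP pdf (2.3); zero for x <= theta."""
    if x <= theta:
        return 0.0
    u = (x / theta) ** k - 1.0
    return (lam * k / (sigma * theta)
            * math.exp(-lam * u ** (-1.0 / sigma))
            * u ** (-1.0 / sigma - 1.0)
            * (x / theta) ** (k - 1.0))

def gup_hazard(x, lam, sigma, k, theta):
    """Hazard (2.4): h(x) = g(x) / (1 - G(x))."""
    return gup_pdf(x, lam, sigma, k, theta) / (1.0 - gup_cdf(x, lam, sigma, k, theta))

params = (2.0, 1.5, 2.0, 1.0)   # illustrative (lambda, sigma, k, theta)
for x in (1.2, 2.0, 4.0):
    print(x, gup_pdf(x, *params), gup_hazard(x, *params))
```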

 


Some structural properties

Transformation

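Lemma 3.1: If $Y\sim\text{Gu}(\mu,\sigma)$, then $X=\theta(e^{Y}+1)^{1/k}$ follows the GuP$(\lambda,\sigma,k,\theta)$ distribution with $\lambda=e^{\mu/\sigma}$.

The proof follows from the transformation technique: $P(X\le x)=P\left(Y\le\log\left[(x/\theta)^{k}-1\right]\right)$, which equals (2.2).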
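Lemma 3.1 can also be checked by simulation. The sketch below, under illustrative assumptions (arbitrary parameter values, seed and sample size), draws $Y$ from the Gumbel distribution by inversion, applies $X=\theta(e^{Y}+1)^{1/k}$ and compares the empirical cdf of $X$ with (2.2).

```python
import math
import random

def gumbel_sample(mu, sigma, rng):
    """Inverse-cdf draw from Gumbel(mu, sigma): t = mu - sigma*log(-log(U))."""
    return mu - sigma * math.log(-math.log(rng.random()))

def gup_cdf(x, lam, sigma, k, theta):
    """GuP cdf (2.2)."""
    if x <= theta:
        return 0.0
    return math.exp(-lam * ((x / theta) ** k - 1.0) ** (-1.0 / sigma))

mu, sigma, k, theta = 0.3, 1.2, 2.0, 1.5      # illustrative values
lam = math.exp(mu / sigma)                     # lambda = e^{mu/sigma}
rng = random.Random(2021)
xs = [theta * (math.exp(gumbel_sample(mu, sigma, rng)) + 1.0) ** (1.0 / k)
      for _ in range(20000)]

# Empirical cdf of the transformed draws versus the GuP cdf (2.2).
for q in (2.0, 3.0, 5.0):
    ecdf = sum(x <= q for x in xs) / len(xs)
    print(q, round(ecdf, 3), round(gup_cdf(q, lam, sigma, k, theta), 3))
```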

Quantile function and simulation

The quantile function of the Gumbel-Pareto distribution is obtained by inverting (2.2) as

$$x=Q(u)=\theta\left[1+\left(-\frac{1}{\lambda}\log u\right)^{-\sigma}\right]^{1/k},\quad 0<u<1.$$

If $u\sim U(0,1)$, then $X=Q(u)$ has pdf $g(x)$.
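The statement above is the inverse-transform method. A minimal sketch of a GuP sampler built on $Q(u)$ follows; the function names, the seed and the parameter values are illustrative assumptions. The value $u=0$ (where $\log u$ is undefined) is simply redrawn.

```python
import math
import random

def gup_quantile(u, lam, sigma, k, theta):
    """GuP quantile Q(u) = theta*[1 + (-log(u)/lam)^(-sigma)]^(1/k), 0 < u < 1."""
    return theta * (1.0 + (-math.log(u) / lam) ** (-sigma)) ** (1.0 / k)

def gup_cdf(x, lam, sigma, k, theta):
    """GuP cdf (2.2)."""
    if x <= theta:
        return 0.0
    return math.exp(-lam * ((x / theta) ** k - 1.0) ** (-1.0 / sigma))

def gup_sample(n, lam, sigma, k, theta, seed=None):
    """Inverse-transform sampling: X = Q(U) with U ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()            # in [0, 1); the value 0 is skipped
        if u > 0.0:
            draws.append(gup_quantile(u, lam, sigma, k, theta))
    return draws

params = (2.0, 1.5, 2.0, 1.0)       # illustrative (lambda, sigma, k, theta)
sample = gup_sample(5, *params, seed=1)
print([round(x, 3) for x in sample])
print(gup_cdf(gup_quantile(0.9, *params), *params))   # approximately 0.9
```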

Using $Q(u)$, one can obtain the Galton skewness and Moors' kurtosis, defined as

$$S=\frac{Q(6/8)-2Q(4/8)+Q(2/8)}{Q(6/8)-Q(2/8)},\qquad K=\frac{Q(7/8)-Q(5/8)+Q(3/8)-Q(1/8)}{Q(6/8)-Q(2/8)}.$$
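Because $S$ and $K$ need only quantiles, they can be computed for any distribution with a quantile function. The sketch below, with illustrative GuP parameters, first checks the two measures on the standard normal (for which $S=0$ and $K\approx1.233$) using `statistics.NormalDist`, then applies them to the GuP quantile function.

```python
import math
from statistics import NormalDist

def galton_skewness(Q):
    """Galton (quartile) skewness from a quantile function Q."""
    return (Q(6/8) - 2 * Q(4/8) + Q(2/8)) / (Q(6/8) - Q(2/8))

def moors_kurtosis(Q):
    """Moors' octile-based kurtosis from a quantile function Q."""
    return (Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)) / (Q(6/8) - Q(2/8))

def gup_quantile(u, lam, sigma, k, theta):
    """GuP quantile function Q(u)."""
    return theta * (1.0 + (-math.log(u) / lam) ** (-sigma)) ** (1.0 / k)

# Sanity check on the standard normal: S = 0 and K is about 1.233.
normal_Q = NormalDist().inv_cdf
print(galton_skewness(normal_Q), moors_kurtosis(normal_Q))

# GuP with illustrative parameters (lambda, sigma, k, theta) = (2, 1.5, 2, 1).
gup_Q = lambda u: gup_quantile(u, 2.0, 1.5, 2.0, 1.0)
print(galton_skewness(gup_Q), moors_kurtosis(gup_Q))
```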

Moments

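Theorem 3.1: The $r$th raw moment of the Gumbel-Pareto distribution is

$$\mu'_r=\theta^{r}\sum_{i=0}^{\infty}\lambda^{i\sigma}\binom{r/k}{i}\Gamma(1-i\sigma),$$

where $\Gamma(a)=\int_0^{\infty}t^{a-1}e^{-t}\,dt$ is the gamma function. Note that $\mu'_r$ is finite only for $r<k/\sigma$, because $1-G_{GuP}(x)\approx\lambda(x/\theta)^{-k/\sigma}$ as $x\to\infty$.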

The skewness and kurtosis can also be calculated from ordinary moments using well-known relationships.
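As a numerical cross-check of the moments, one can integrate directly using Lemma 3.1: $E[X^r]=E[\theta^r(1+e^{Y})^{r/k}]$ with $Y\sim\text{Gu}(\mu,\sigma)$ and $\mu=\sigma\log\lambda$. The sketch below applies the trapezoidal rule in the standardized Gumbel variable $z$; the truncation limits and step are our own assumptions. When $r=k$ and $\sigma<1$, the series in Theorem 3.1 has only two non-zero terms, $\mu'_r=\theta^{r}[1+\lambda^{\sigma}\Gamma(1-\sigma)]$, which gives an exact value to compare against.

```python
import math

def gup_raw_moment(r, lam, sigma, k, theta, step=1e-3):
    """E[X^r] for GuP(lam, sigma, k, theta) by the trapezoidal rule.

    Uses X = theta*(e^Y + 1)^(1/k) with Y = mu + sigma*z, mu = sigma*log(lam),
    where z has the standard Gumbel density exp(-z - exp(-z)).
    """
    decay = 1.0 - sigma * r / k       # right-tail decay rate of the integrand
    if decay <= 0.0:
        raise ValueError("E[X^r] is infinite unless r < k/sigma")
    mu = sigma * math.log(lam)
    lo, hi = -6.0, 40.0 / decay       # integrand is negligible outside [lo, hi]
    n = int(math.ceil((hi - lo) / step))
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        z = lo + i * h
        w = mu + sigma * z            # the Gumbel value Y
        # log(1 + e^w) computed without overflow
        log1pexp = w + math.log1p(math.exp(-w)) if w > 0 else math.log1p(math.exp(w))
        log_f = r * math.log(theta) + (r / k) * log1pexp - z - math.exp(-z)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(log_f)
    return total * h

lam, sigma, k, theta = 1.5, 0.5, 2.0, 1.0     # illustrative values
closed = theta ** 2 * (1.0 + lam ** sigma * math.gamma(1.0 - sigma))
print(gup_raw_moment(2, lam, sigma, k, theta), closed)
```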

Order statistics

Order statistics deal with the properties and applications of ordered random samples and their functions. Suppose $X_1,X_2,\ldots,X_n$ is a random sample from the Gumbel-Pareto distribution, and let $X_{r:n}$ denote the $r$th order statistic. Then the pdf of $X_{r:n}$ can be expressed as

$$g_{r:n}(x)=\frac{n!}{(r-1)!\,(n-r)!}\sum_{j=0}^{n-r}(-1)^{j}\binom{n-r}{j}g(x)\,G(x)^{j+r-1} \tag{3.1}$$

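Inserting $g(x)$ and $G(x)$ from (2.2) and (2.3) into (3.1), after some algebra we get

$$g_{r:n}(x)=\sum_{j=0}^{n-r}\frac{(-1)^{j}\,n!}{(r-1)!\,(n-r)!}\binom{n-r}{j}\frac{\lambda k}{\sigma\theta}\left[\left(\frac{x}{\theta}\right)^{k}-1\right]^{-\frac{1}{\sigma}-1}\left(\frac{x}{\theta}\right)^{k-1}\exp\left\{-(r+j)\lambda\left[\left(\frac{x}{\theta}\right)^{k}-1\right]^{-1/\sigma}\right\}$$

$$=\sum_{j=0}^{n-r}\xi_j\,g(x;(r+j)\lambda,\sigma,k,\theta), \tag{3.2}$$

where $\xi_j=\dfrac{(-1)^{j}\,n!}{(r-1)!\,(n-r)!\,(r+j)}\dbinom{n-r}{j}$ and $g(x;(r+j)\lambda,\sigma,k,\theta)$ is the Gumbel-Pareto density function with parameters $(r+j)\lambda$, $\sigma$, $k$ and $\theta$.

This shows that the pdf of a Gumbel-Pareto order statistic is a mixture of Gumbel-Pareto densities, with weights $\xi_j$ that sum to one but alternate in sign.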
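The mixture representation (3.2) can be verified numerically against the textbook form $\frac{n!}{(r-1)!(n-r)!}G^{r-1}(1-G)^{n-r}g$. In the sketch below, the choice $r=3$, $n=7$ and the GuP parameters are illustrative assumptions; the weights are the $\xi_j$ defined above.

```python
import math

def gup_cdf(x, lam, sigma, k, theta):
    """GuP cdf (2.2)."""
    if x <= theta:
        return 0.0
    return math.exp(-lam * ((x / theta) ** k - 1.0) ** (-1.0 / sigma))

def gup_pdf(x, lam, sigma, k, theta):
    """GuP pdf (2.3)."""
    if x <= theta:
        return 0.0
    u = (x / theta) ** k - 1.0
    return (lam * k / (sigma * theta) * math.exp(-lam * u ** (-1.0 / sigma))
            * u ** (-1.0 / sigma - 1.0) * (x / theta) ** (k - 1.0))

def mixture_weights(r, n):
    """xi_j = (-1)^j n! / ((r-1)! (n-r)! (r+j)) * C(n-r, j), j = 0..n-r."""
    c = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    return [(-1) ** j * c * math.comb(n - r, j) / (r + j) for j in range(n - r + 1)]

def order_stat_pdf_direct(x, r, n, lam, sigma, k, theta):
    """Textbook form n!/((r-1)!(n-r)!) * G^(r-1) * (1-G)^(n-r) * g."""
    G = gup_cdf(x, lam, sigma, k, theta)
    c = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    return c * G ** (r - 1) * (1.0 - G) ** (n - r) * gup_pdf(x, lam, sigma, k, theta)

def order_stat_pdf_mixture(x, r, n, lam, sigma, k, theta):
    """Mixture form (3.2): sum_j xi_j * g(x; (r+j)*lam, sigma, k, theta)."""
    return sum(w * gup_pdf(x, (r + j) * lam, sigma, k, theta)
               for j, w in enumerate(mixture_weights(r, n)))

params = (2.0, 1.5, 2.0, 1.0)    # illustrative (lambda, sigma, k, theta)
for x in (1.5, 2.5, 4.0):
    print(x, order_stat_pdf_direct(x, 3, 7, *params),
          order_stat_pdf_mixture(x, 3, 7, *params))
```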

Maximum likelihood estimation

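The method of maximum likelihood is applied to estimate the parameters of the Gumbel-Pareto distribution. Let $x_1,x_2,\ldots,x_n$ be a random sample from the Gumbel-Pareto (GuP) distribution and let $\Theta=(\lambda,\sigma,k,\theta)$. The likelihood function for the GuP distribution is given by

$$L(\Theta)=\left(\frac{\lambda k}{\sigma\theta}\right)^{n}\exp\left\{-\lambda\sum_{i=1}^{n}\left[\left(\frac{x_i}{\theta}\right)^{k}-1\right]^{-1/\sigma}\right\}\prod_{i=1}^{n}\left[\left(\frac{x_i}{\theta}\right)^{k}-1\right]^{-\frac{1}{\sigma}-1}\prod_{i=1}^{n}\left(\frac{x_i}{\theta}\right)^{k-1},\quad \theta<\min_i x_i.$$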

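The components of the score vector $U(\Theta)$ are given by

$$U_\lambda=\frac{n}{\lambda}-\sum_{i=1}^{n}\left[\left(\frac{x_i}{\theta}\right)^{k}-1\right]^{-1/\sigma}$$

$$U_\sigma=-\frac{n}{\sigma}-\frac{\lambda}{\sigma^{2}}\sum_{i=1}^{n}\left\{\left[\left(\frac{x_i}{\theta}\right)^{k}-1\right]^{-1/\sigma}\log\left[\left(\frac{x_i}{\theta}\right)^{k}-1\right]\right\}+\frac{1}{\sigma^{2}}\sum_{i=1}^{n}\log\left[\left(\frac{x_i}{\theta}\right)^{k}-1\right]$$

$$U_k=\frac{n}{k}+\frac{\lambda}{\sigma}\sum_{i=1}^{n}\left\{\left[\left(\frac{x_i}{\theta}\right)^{k}-1\right]^{-\frac{1}{\sigma}-1}\left(\frac{x_i}{\theta}\right)^{k}\log\left(\frac{x_i}{\theta}\right)\right\}-\left(\frac{1}{\sigma}+1\right)\sum_{i=1}^{n}\frac{(x_i/\theta)^{k}\log(x_i/\theta)}{(x_i/\theta)^{k}-1}+\sum_{i=1}^{n}\log\left(\frac{x_i}{\theta}\right)$$

$$U_\theta=-\frac{nk}{\theta}-\frac{\lambda k}{\sigma\theta}\sum_{i=1}^{n}\left\{\left[\left(\frac{x_i}{\theta}\right)^{k}-1\right]^{-\frac{1}{\sigma}-1}\left(\frac{x_i}{\theta}\right)^{k}\right\}+\frac{k}{\theta}\left(\frac{1}{\sigma}+1\right)\sum_{i=1}^{n}\frac{(x_i/\theta)^{k}}{(x_i/\theta)^{k}-1}$$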

The MLEs are obtained by setting these nonlinear equations to zero and solving them numerically; here this is done with the nlm function in R, which maximizes the log-likelihood by minimizing its negative.
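Before handing the likelihood to an optimizer, it is worth confirming that the score equations match the log-likelihood. The sketch below codes $\log L(\Theta)$ and the four score components above and compares them with central finite differences. The data and parameter values are hypothetical, chosen only so that $\theta<\min_i x_i$; the negated `gup_loglik` could then be passed to a general-purpose minimizer such as R's nlm or `scipy.optimize.minimize`, keeping $\theta$ below the sample minimum.

```python
import math

def gup_loglik(params, data):
    """GuP log-likelihood log L(Theta); requires theta < min(data)."""
    lam, sigma, k, theta = params
    n = len(data)
    ll = n * (math.log(lam) + math.log(k) - math.log(sigma) - math.log(theta))
    for x in data:
        u = (x / theta) ** k - 1.0
        ll += (-lam * u ** (-1.0 / sigma)
               - (1.0 / sigma + 1.0) * math.log(u)
               + (k - 1.0) * math.log(x / theta))
    return ll

def gup_score(params, data):
    """Score vector (U_lambda, U_sigma, U_k, U_theta)."""
    lam, sigma, k, theta = params
    n = len(data)
    u_lam, u_sig, u_k, u_th = n / lam, -n / sigma, n / k, -n * k / theta
    for x in data:
        t = x / theta
        u = t ** k - 1.0
        a = u ** (-1.0 / sigma)          # [(x/theta)^k - 1]^(-1/sigma)
        log_u, log_t, tk = math.log(u), math.log(t), t ** k
        u_lam -= a
        u_sig += (log_u - lam * a * log_u) / sigma ** 2
        u_k += ((lam / sigma) * (a / u) * tk * log_t
                - (1.0 / sigma + 1.0) * tk * log_t / u + log_t)
        u_th += (-(lam * k / (sigma * theta)) * (a / u) * tk
                 + (k / theta) * (1.0 / sigma + 1.0) * tk / u)
    return (u_lam, u_sig, u_k, u_th)

# Hypothetical data and parameter values, used only to check the score.
data = [1.3, 1.7, 2.2, 2.9, 3.5, 4.8, 6.1]
params = (2.0, 1.5, 2.0, 1.0)
print(gup_loglik(params, data), gup_score(params, data))
```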

Data analysis

In this section, we illustrate the effectiveness of the Gumbel-Pareto distribution and compare it with other existing models. To compare the distributions, we use standard goodness-of-fit measures: $-\log L(\hat\Theta)$, AIC (Akaike information criterion), CAIC (consistent Akaike information criterion), BIC (Bayesian information criterion) and HQIC (Hannan-Quinn information criterion). The smaller these values, the better the fit.
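The text does not spell out the criteria, so the sketch below assumes the usual definitions for a model with $p$ parameters fitted to $n$ observations: $\text{AIC}=-2\log L+2p$, $\text{BIC}=-2\log L+p\log n$ and $\text{HQIC}=-2\log L+2p\log\log n$. Judging from the ETGT-II row of Table 1, the column labelled CAIC matches the small-sample corrected AIC, $\text{AIC}+2p(p+1)/(n-p-1)$, which is what the sketch computes under that name. With $n=66$ (the number of daily counts in data set I) and $p=3$, it reproduces that row to three decimals.

```python
import math

def info_criteria(neg_loglik, p, n):
    """AIC, CAIC, BIC and HQIC from -log L, p parameters and n observations."""
    aic = 2.0 * neg_loglik + 2.0 * p
    caic = aic + 2.0 * p * (p + 1) / (n - p - 1)   # small-sample corrected AIC
    bic = 2.0 * neg_loglik + p * math.log(n)
    hqic = 2.0 * neg_loglik + 2.0 * p * math.log(math.log(n))
    return {"AIC": aic, "CAIC": caic, "BIC": bic, "HQIC": hqic}

# ETGT-II row of Table 1: -log L = 329.158, p = 3 parameters, n = 66 days.
criteria = info_criteria(329.158, 3, 66)
print({name: round(value, 3) for name, value in criteria.items()})
```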

Data set I: Number of deaths due to COVID-19 in China. These data, reported at https://www.worldometers.info/coronavirus/country/china/, represent the daily deaths due to COVID-19 in China from 23 January to 28 March.

The data are: 8, 16, 15, 24, 26, 26, 38, 43, 46, 45, 57, 64, 65, 73, 73, 86, 89, 97, 108, 97, 146, 121, 143, 142, 105, 98, 136, 114, 118, 109, 97, 150, 71, 52, 29, 44, 47, 35, 42, 31, 38, 31, 30, 28, 27, 22, 17, 22, 11, 7, 13, 10, 14, 13, 11, 8, 3, 7, 6, 9, 7, 4, 6, 5, 3, 5.

Here we compare the new model with the exponentiated transformation of Gumbel type-II (ETGT-II) model, the additive Gumbel type-II (AGT-II) model and the Gumbel type-II (GT-II) model. The values of the statistics are given in Table 1.

Figure 1 Graphs of the pdf and hazard rate of the Gumbel-Pareto distribution for various parameter values.

 

 

 

 

 

 

 

| Distribution | MLEs | -log L | AIC | CAIC | BIC | HQIC |
|---|---|---|---|---|---|---|
| GuP | λ = 2.527, σ = 2.968, k = 0.994, θ = 2.879 | 222.428 | 452.856 | 444.856 | 453.512 | 444.856 |
| ETGT-II | γ = 1.086, δ = 10.688, ψ = 2.431 | 329.158 | 664.316 | 664.703 | 670.885 | 666.912 |
| AGT-II | β = 7.479, λ = 13.432, δ = 4.486, α = 0.9137 | 331.081 | 670.162 | 670.818 | 678.921 | 673.623 |
| GT-II | β = 0.916, α = 13.532 | 331.102 | 666.203 | 666.397 | 670.583 | 667.934 |

Table 1 The MLEs and the goodness-of-fit statistics (-log L, AIC, CAIC, BIC and HQIC) for data set I

From Table 1, the GuP model has the smallest values of all the criteria, so it provides the best fit among the models considered for this data set.

Data set II: These real data consist of the times between successive failures of the air-conditioning system of each member of a fleet of 13 Boeing 720 jet airplanes. The pooled data, with 214 observations, were considered by Proschan,7 Kus8 and many others. Here we compare the new model with the existing Weibull-Pareto (WP) model.

From Table 2, the newly developed Gumbel-Pareto distribution fits these data better than the existing Weibull-Pareto distribution,3 with smaller values of all the criteria.

 

 

 

 

 

 

 

 

| Distribution | MLEs (SE) | -log L | AIC | CAIC | BIC | HQIC |
|---|---|---|---|---|---|---|
| GuP | λ = 9.6166 (1.213), σ = 10.1333 (2.281), k = 7.233 (1.678), θ = 0.9981 (0.0026) | 1005.81 | 2017.62 | 2017.74 | 2027.72 | 2021.71 |
| WP | α = 9.8626 (0.008), θ = 0.9283 (0.004), b = 0.1267 (0.00001) | 1459.62 | 2925.24 | 2925.35 | 2935.34 | 2929.32 |

Table 2 The MLEs, their standard errors (SE) and the goodness-of-fit statistics (-log L, AIC, CAIC, BIC and HQIC) for data set II

Stress-strength analysis

Reliability is the probability of not failing. In stress-strength analysis it is denoted by $R$ and defined as $R=P(Y<X)$, where $X$ represents the strength and $Y$ the stress of a component. For the evaluation of $R$, we assume that the two random variables are independent and follow distributions belonging to the same family. Such models have a number of applications in the literature, including stress-strength models and the breakdown of a system having two components. If $X$ and $Y$ are independent random variables with cdfs $F_1(x)$ and $F_2(y)$ and pdfs $f_1(x)$ and $f_2(y)$, respectively, then

$$R=P(Y<X)=\int_{-\infty}^{\infty}F_2(t)\,f_1(t)\,dt. \tag{6.1}$$

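Lemma 6.1: If $X$ and $Y$ are independent random variables following the Gumbel-X family of distributions with the same baseline cdf $F$ and parameters $(\lambda_1,\sigma_1)$ and $(\lambda_2,\sigma_2)$, respectively, then

$$R=\sum_{j=0}^{\infty}\frac{(-1)^{j}\lambda_2^{\,j}}{j!\,\lambda_1^{\,j\sigma_1/\sigma_2}}\,\Gamma\!\left(\frac{j\sigma_1}{\sigma_2}+1\right). \tag{6.2}$$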
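Substituting $s=\lambda_1[F(t)/\bar F(t)]^{-1/\sigma_1}$ in (6.1) turns $\int F_2(t)f_1(t)\,dt$ into $\int_0^{\infty}\exp\{-\lambda_2(s/\lambda_1)^{\sigma_1/\sigma_2}\}e^{-s}\,ds$, which no longer depends on the baseline $F$; expanding the first exponential term by term gives (6.2). The sketch below evaluates both forms so they can be compared. Note that the series converges only when $\sigma_1<\sigma_2$, or when $\sigma_1=\sigma_2$ and $\lambda_2<\lambda_1$ (with equal scales the integral equals $\lambda_1/(\lambda_1+\lambda_2)$); the integral form works for any parameters. The function names, the number of series terms, the truncation point and the quadrature step are our own assumptions.

```python
import math

def reliability_series(lam1, sig1, lam2, sig2, terms=200):
    """Partial sum of (6.2); converges only if sig1 < sig2, or sig1 == sig2 and lam2 < lam1."""
    a = sig1 / sig2
    total = 0.0
    for j in range(terms):
        log_term = (j * math.log(lam2) - math.lgamma(j + 1)
                    - j * a * math.log(lam1) + math.lgamma(j * a + 1))
        total += (-1) ** j * math.exp(log_term)
    return total

def reliability_integral(lam1, sig1, lam2, sig2, upper=50.0, step=2e-4):
    """Midpoint rule for int_0^inf exp(-lam2*(s/lam1)**a - s) ds, a = sig1/sig2."""
    a = sig1 / sig2
    n = int(upper / step)
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * step
        total += math.exp(-lam2 * (s / lam1) ** a - s)
    return total * step

# Equal scales: both forms should be close to lam1 / (lam1 + lam2) = 2/3.
print(reliability_series(2.0, 1.5, 1.0, 1.5), reliability_integral(2.0, 1.5, 1.0, 1.5))
# sigma1 < sigma2: the two forms should agree.
print(reliability_series(2.0, 1.0, 1.0, 2.0), reliability_integral(2.0, 1.0, 1.0, 2.0))
```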

A reliability test plan for items whose lifetimes follow the Gumbel-Pareto distribution has also been developed; see Joseph and Jose9 for more details.10-14

Conclusion

In this paper, we proposed the new Gumbel-Pareto distribution and studied some of its structural properties, including moments, the quantile function and order statistics. The model parameters are estimated by the method of maximum likelihood. We fitted the new model to two real data sets to demonstrate its usefulness in practice, and the GuP distribution provided a better fit than the competing models for both data sets. We hope that the proposed model will attract wider applications in areas such as engineering, survival and lifetime data, hydrology, economics, and biostatistical data on cancer, COVID-19, etc.

References

  1. Al-Aqtash R. On generating a new family of distributions using the logit function. Ph.D. thesis, Central Michigan University, Mount Pleasant, Michigan; 2013.
  2. Alzaatreh A, Lee C, Famoye F. A new method for generating families of continuous distributions. Metron. 2013a;71(1):63‒79.
  3. Alzaatreh A, Lee C, Famoye F. Weibull-Pareto distribution and its applications. Communications in Statistics - Theory and Methods. 2013b;42(9):1673‒1691.
  4. Yoo K, Arashi M, Bekker A. Pitting the Gumbel and logistic growth models against one another to model COVID-19 spread. medRxiv. 2020.
  5. Sindhu TN, Shafiq A, Al-Mdallal QM. Exponentiated transformation of Gumbel Type-II distribution for modeling COVID-19 data. Alexandria Engineering Journal. 2021;60(1):671‒689.
  6. Kochańczyk M, Lipniacki T. Pareto-based evaluation of national responses to COVID-19 pandemic shows that saving lives and protecting economy are non-trade-off objectives. Scientific Reports. 2021;11(1):1‒9.
  7. Proschan F. Theoretical explanation of observed decreasing failure rate. Technometrics. 1963;5(3):375‒383.
  8. Kus C. A new lifetime distribution. Computational Statistics and Data Analysis. 2007;51(9):4497‒4509.
  9. Joseph J, Jose KK. Reliability test plan for Gumbel-Pareto life time model. International Journal of Statistics and Reliability Engineering. 2021;8(1):121‒131.
  10. Beare BK, Toda. On the emergence of a power law in the distribution of COVID-19 cases. Physica D: Nonlinear Phenomena. 2020;412:132649.
  11. Gumbel EJ. Statistical theory of extreme values and some practical applications. Applied Mathematics, vol. 33, 1st edn. U.S. Department of Commerce, National Bureau of Standards, Gaithersburg, Md, USA; 1954.
  12. Kotz S, Nadarajah S. Extreme value distributions: theory and applications. Imperial College Press, London; 2000.
  13. Nadarajah S. The exponentiated Gumbel distribution with climate application. Environmetrics. 2006;17(1):13‒23.
  14. Wong F, Collins JJ. Evidence that coronavirus superspreading is fat-tailed. Proceedings of the National Academy of Sciences. 2020;117(47):29416‒29418.
©2021 Joseph, et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.