Research Article Volume 10 Issue 3
Gumbel - Pareto distribution and it’s applications in
modeling COVID data
Jeena Joseph,1 KK Jose2
1Department of Statistics, St. Thomas’ College, Thrissur, India
2School of Mathematics and Statistics and Data Analytics, Mahatma Gandhi University, India
Correspondence: KK Jose, School of Mathematics and Statistics and Data Analytics, Mahatma Gandhi University, Kottayam, Kerala, India
Received: September 15, 2021 | Published: September 30, 2021
Citation: Joseph J, Jose KK. Gumbel - Pareto distribution and it’s applications in modeling COVID data. Biom Biostat Int J. 2021;10(3):125-128. DOI: 10.15406/bbij.2021.10.00338
Abstract
A new distribution, namely the Gumbel-Pareto, is introduced from the Gumbel-X family1. Some properties, including moments and order statistics, are studied. A reliability measure for stress-strength analysis is derived. The method of maximum likelihood is proposed for estimating the distribution parameters. The flexibility of the new model is illustrated using two examples, including COVID-19 data.
Keywords: Gumbel distribution, Gumbel-X family, Gumbel-Pareto, order statistics, Pareto distribution, stress-strength reliability, T-X family
Introduction
Statistical distributions play an important role in parametric inference and are commonly applied to model real-life data. In practice, existing standard distributions do not fit every type of real data set well. Hence, statisticians have developed many new distributions that are more flexible than the standard ones for the analysis of real data. New distributions are developed either by combining two or more existing distributions or by adding extra parameters to existing distributions.
The beta-generated and Kumaraswamy-generated families of distributions use distributions with support on [0, 1] as the generator. As an extension, Alzaatreh et al.2 proposed a general method that replaces the beta pdf with the pdf $r(t)$ of any continuous random variable $T \in [a, b]$, $-\infty \le a < b \le \infty$, as the generator, together with a function $W(F(x))$ of the cdf $F(x)$ of a random variable $X$ which satisfies the following conditions:
- $W(F(x)) \in [a, b]$;
- $W(F(x))$ is differentiable and monotonically non-decreasing;
- $W(F(x)) \to a$ as $x \to -\infty$ and $W(F(x)) \to b$ as $x \to \infty$.
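The new class of distributions is defined by
$$G(x) = \int_{a}^{W(F(x))} r(t)\,dt = R\{W(F(x))\}, \tag{1.1}$$
where $R(t)$ is the cdf and $r(t)$ is the pdf of the random variable $T$. Here, the cdf in (1.1) is the composite function $(R \circ W \circ F)(x)$. The corresponding pdf is
$$g(x) = \left\{\frac{d}{dx} W(F(x))\right\} r\{W(F(x))\}. \tag{1.2}$$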
The pdf $r(t)$ in (1.2) is “transformed” into a new pdf $g(x)$ through the function $W(F(x))$, which acts as a “transformer”. That is, a random variable $X$, “the transformer”, is used to transform another random variable $T$, “the transformed”. The resulting family is known as the “Transformed-Transformer” or $T$-$X$ family of distributions. A large number of distributions, continuous and discrete, can be generated from any two existing univariate distributions by this method. Alzaatreh et al.2 gave several choices of $W(F(x))$ depending on the support of the random variable $T$.
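When the support of $T$ is bounded, say $[0, 1]$: In this case $W(F(x))$ can be taken as $F(x)$ or $F^{\alpha}(x)$, $\alpha > 0$. This leads to the beta-generated family of distributions.
When the support of $T$ is $[a, \infty)$, $a \ge 0$: Without loss of generality, we assume $a = 0$. Then $W(F(x))$ can be defined as $-\log(1-F(x))$, $F(x)/(1-F(x))$, and $-\log(1-F^{\alpha}(x))$, where $\alpha > 0$.
When the support of $T$ is $(-\infty, \infty)$: Then $W(F(x))$ can be taken as $\log[-\log(1-F(x))]$, $\log[F(x)/(1-F(x))]$, $\log[-\log(1-F^{\alpha}(x))]$ and $\log[F^{\alpha}(x)/(1-F^{\alpha}(x))]$, where $\alpha > 0$.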
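As a small illustration of (1.1) and (1.2), the sketch below composes $R$, $W$ and $F$ directly. It assumes the second case above with $T \sim$ Exp(1) and $W(F(x)) = -\log(1-F(x))$, together with a standard logistic baseline; for this pair (1.1) gives $G(x) = 1 - (1 - F(x)) = F(x)$, so the composition must return the baseline itself, which makes a convenient check. The helper names are hypothetical.

```python
import math
from typing import Callable

Fn = Callable[[float], float]


def tx_cdf(R: Fn, W: Fn, F: Fn) -> Fn:
    """cdf (1.1) of a T-X member: G(x) = R(W(F(x)))."""
    return lambda x: R(W(F(x)))


def tx_pdf(r: Fn, W: Fn, dW: Fn, F: Fn, f: Fn) -> Fn:
    """pdf (1.2) by the chain rule: g(x) = W'(F(x)) * f(x) * r(W(F(x)))."""
    return lambda x: dW(F(x)) * f(x) * r(W(F(x)))


# Generator T ~ Exp(1): cdf R and pdf r on [0, inf).
def R_exp(t: float) -> float:
    return 1.0 - math.exp(-t)


def r_exp(t: float) -> float:
    return math.exp(-t)


# Link W(p) = -log(1 - p) maps [0, 1) onto [0, inf); dW is its derivative in p.
def W(p: float) -> float:
    return -math.log1p(-p)


def dW(p: float) -> float:
    return 1.0 / (1.0 - p)


# Baseline X: standard logistic distribution on the whole real line.
def F(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def f(x: float) -> float:
    return F(x) * (1.0 - F(x))


G = tx_cdf(R_exp, W, F)
g = tx_pdf(r_exp, W, dW, F, f)
```

Replacing `R_exp`, `r_exp` and `W` with the Gumbel cdf, the Gumbel pdf and the logit link gives the Gumbel-X family discussed next.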
In this paper, we consider the third case, that is, the support of $T$ is $(-\infty, \infty)$. For this purpose, we take $T$ to follow the most important extreme value distribution, the type I extreme value or Gumbel distribution. This distribution has many applications, including describing extreme wind speeds, sea wave heights, floods, rainfall during droughts, the electrical strength of materials, air pollution problems, geological problems and naval engineering. Recently, the Gumbel distribution has also been used for modelling COVID-19 data4,5.
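Al-Aqtash1 proposed the Gumbel-X family by taking $T$ to be a Gumbel random variable with pdf
$$r(t) = \frac{1}{\sigma}\exp\left\{-\frac{t-\mu}{\sigma}\right\}\exp\left\{-\exp\left(-\frac{t-\mu}{\sigma}\right)\right\}, \quad -\infty < t < \infty,\; \mu \in \mathbb{R},\; \sigma > 0. \tag{1.3}$$
By setting $W(F(x)) = \log[F(x)/(1-F(x))]$, the cdf reduces to
$$G(x) = \exp\left\{-e^{\mu/\sigma}\left(\frac{1-F(x)}{F(x)}\right)^{1/\sigma}\right\} \tag{1.4}$$
and the pdf is
$$g(x) = \frac{e^{\mu/\sigma}}{\sigma}\,\frac{f(x)}{F(x)\,[1-F(x)]}\left(\frac{1-F(x)}{F(x)}\right)^{1/\sigma}\exp\left\{-e^{\mu/\sigma}\left(\frac{1-F(x)}{F(x)}\right)^{1/\sigma}\right\}. \tag{1.5}$$
A random variable with pdf (1.5) has the same support as $f(\cdot)$.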
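The Gumbel-X cdf and pdf can be coded for an arbitrary baseline, assuming (1.4) has the form $G(x) = \exp\{-e^{\mu/\sigma}[(1-F(x))/F(x)]^{1/\sigma}\}$. A quick sanity check follows from the algebra: with the standard logistic baseline $F(x) = 1/(1+e^{-x})$, the ratio $(1-F)/F$ equals $e^{-x}$, so the Gumbel-X member collapses to the Gumbel$(\mu, \sigma)$ distribution itself. The function names below are hypothetical.

```python
import math
from typing import Callable


def gumbel_x_cdf(x: float, mu: float, sigma: float, F: Callable[[float], float]) -> float:
    """(1.4): G(x) = exp{-e^{mu/sigma} [(1 - F(x)) / F(x)]^{1/sigma}}."""
    p = F(x)
    return math.exp(-math.exp(mu / sigma) * ((1.0 - p) / p) ** (1.0 / sigma))


def gumbel_x_pdf(x: float, mu: float, sigma: float,
                 F: Callable[[float], float], f: Callable[[float], float]) -> float:
    """(1.5): g(x) = s e^{-s} f(x) / (sigma F(x) (1 - F(x))), s = e^{mu/sigma} [(1-F)/F]^{1/sigma}."""
    p = F(x)
    s = math.exp(mu / sigma) * ((1.0 - p) / p) ** (1.0 / sigma)
    return s * math.exp(-s) * f(x) / (sigma * p * (1.0 - p))


def logistic_cdf(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logistic_pdf(x: float) -> float:
    p = logistic_cdf(x)
    return p * (1.0 - p)
```

Any other baseline enters only through the pair `F`, `f`, which is how the Pareto baseline is used in the next section.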
The paper is organized as follows. In Section 2, we define the Gumbel-Pareto distribution. Some structural properties, including moments, the quantile function and order statistics, are discussed in Section 3. Maximum likelihood estimation of the model parameters is discussed in Section 4. Applications of this distribution to two real data sets are presented in Section 5. In Section 6, stress-strength analysis is discussed. Finally, Section 7 offers some concluding remarks.
Gumbel-Pareto distribution
The Pareto distribution is well known for its ability to model heavy-tailed data sets, especially income and wealth data. Kochańczyk and Lipniacki6 conducted a Pareto-based evaluation of national responses to COVID-19.
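If the parent distribution is Pareto with parameters $k$ and $\theta$, with pdf
$$f(x) = \frac{k\,\theta^{k}}{x^{k+1}}, \quad x \ge \theta,\; k > 0,\; \theta > 0, \tag{2.1}$$
and cdf $F(x) = 1 - (\theta/x)^{k}$, then the cdf of the four-parameter Gumbel-Pareto distribution, denoted by $\mathrm{GuP}(\mu, \sigma, k, \theta)$, is given by
$$G(x) = \exp\left\{-e^{\mu/\sigma}\left(\frac{\theta^{k}}{x^{k}-\theta^{k}}\right)^{1/\sigma}\right\}, \quad x > \theta. \tag{2.2}$$
The corresponding pdf is given by
$$g(x) = \frac{k\,e^{\mu/\sigma}\,x^{k-1}}{\sigma\,(x^{k}-\theta^{k})}\left(\frac{\theta^{k}}{x^{k}-\theta^{k}}\right)^{1/\sigma}\exp\left\{-e^{\mu/\sigma}\left(\frac{\theta^{k}}{x^{k}-\theta^{k}}\right)^{1/\sigma}\right\}, \quad x > \theta. \tag{2.3}$$
The hazard function (hf) is obtained as
$$h(x) = \frac{g(x)}{1-G(x)} = \frac{k\,e^{\mu/\sigma}\,x^{k-1}\left(\dfrac{\theta^{k}}{x^{k}-\theta^{k}}\right)^{1/\sigma}\exp\left\{-e^{\mu/\sigma}\left(\dfrac{\theta^{k}}{x^{k}-\theta^{k}}\right)^{1/\sigma}\right\}}{\sigma\,(x^{k}-\theta^{k})\left[1-\exp\left\{-e^{\mu/\sigma}\left(\dfrac{\theta^{k}}{x^{k}-\theta^{k}}\right)^{1/\sigma}\right\}\right]}, \quad x > \theta. \tag{2.4}$$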
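A vectorised NumPy sketch of the cdf, pdf and hazard function, assuming the parametrization $G(x) = \exp\{-e^{\mu/\sigma}[\theta^k/(x^k-\theta^k)]^{1/\sigma}\}$ for $x > \theta$. Writing $\theta^k/(x^k-\theta^k) = r/(1-r)$ with $r = (\theta/x)^k$ keeps the computation stable near $x = \theta$, and $1 - G(x)$ is computed with `expm1` so the hazard keeps its precision in the upper tail. The names `gup_cdf`, `gup_pdf` and `gup_hazard` are hypothetical.

```python
import numpy as np


def _s_and_r(x, mu, sigma, k, theta):
    """s = e^{mu/sigma} z^{1/sigma}, z = theta^k/(x^k - theta^k) = r/(1 - r), r = (theta/x)^k."""
    r = (theta / x) ** k
    s = np.exp((mu + np.log(r) - np.log1p(-r)) / sigma)
    return s, r


def _on_support(x, theta, fn):
    """Evaluate fn on x > theta and return 0 elsewhere."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    m = x > theta
    out[m] = fn(x[m])
    return out


def gup_cdf(x, mu, sigma, k, theta):
    """(2.2): G(x) = exp(-s)."""
    return _on_support(x, theta, lambda t: np.exp(-_s_and_r(t, mu, sigma, k, theta)[0]))


def gup_pdf(x, mu, sigma, k, theta):
    """(2.3), using k x^{k-1} / (x^k - theta^k) = k / (x (1 - r))."""
    def g(t):
        s, r = _s_and_r(t, mu, sigma, k, theta)
        return k * s * np.exp(-s) / (sigma * t * (1 - r))
    return _on_support(x, theta, g)


def gup_hazard(x, mu, sigma, k, theta):
    """(2.4): h(x) = g(x) / (1 - G(x)), with 1 - G(x) = -expm1(-s)."""
    def h(t):
        s, r = _s_and_r(t, mu, sigma, k, theta)
        return k * s * np.exp(-s) / (sigma * t * (1 - r) * (-np.expm1(-s)))
    return _on_support(x, theta, h)
```

Evaluating `gup_pdf` and `gup_hazard` on a grid for a few parameter sets is one way to draw curves like those in Figure 1.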
Some structural properties
Transformation
Lemma 3.1: If
The proof follows from the transformation technique.
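One transformation result that holds for every member of the Gumbel-X family, and hence for GuP, is that the link $W(F(X)) = \log[F(X)/(1-F(X))]$ recovers the Gumbel generator. The derivation below assumes the cdf $G(x) = \exp\{-e^{\mu/\sigma}[(1-F(x))/F(x)]^{1/\sigma}\}$ with a continuous, strictly increasing baseline $F$; for the Pareto baseline, $F(x)/(1-F(x)) = (x/\theta)^k - 1$.

$$\begin{aligned}
P(T \le t) &= P\!\left(\log\frac{F(X)}{1-F(X)} \le t\right) = G\!\left(F^{-1}\!\left(\frac{e^{t}}{1+e^{t}}\right)\right) \\
&= \exp\left\{-e^{\mu/\sigma}\left(e^{-t}\right)^{1/\sigma}\right\} = \exp\left\{-e^{-(t-\mu)/\sigma}\right\}, \qquad -\infty < t < \infty,
\end{aligned}$$

since $p = e^{t}/(1+e^{t})$ gives $(1-p)/p = e^{-t}$. Hence $T = \log[(X/\theta)^k - 1]$ is Gumbel$(\mu, \sigma)$ when $X \sim \mathrm{GuP}(\mu, \sigma, k, \theta)$, and conversely $X = \theta(1+e^{T})^{1/k}$, which gives a second way to simulate GuP variates from Gumbel ones.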
Quantile function and simulation
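The quantile function of the Gumbel-Pareto distribution is obtained by inverting (2.2) as
$$Q(u) = \theta\left[1 + e^{\mu}(-\log u)^{-\sigma}\right]^{1/k}, \quad 0 < u < 1.$$
If $U \sim \mathcal{U}(0, 1)$, then $X = Q(U)$ has pdf $g(x)$.
By using $Q(u)$, one can obtain the Galton skewness $B$ and the Moors kurtosis $M$, which are defined as
$$B = \frac{Q(6/8) - 2Q(4/8) + Q(2/8)}{Q(6/8) - Q(2/8)}, \qquad M = \frac{Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)}{Q(6/8) - Q(2/8)}.$$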
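The quantile function, inverse-transform sampling and the two quantile-based shape measures fit in a few lines. The sketch assumes $Q(u) = \theta[1 + e^{\mu}(-\log u)^{-\sigma}]^{1/k}$, the inverse of $G(x) = \exp\{-e^{\mu/\sigma}[\theta^k/(x^k-\theta^k)]^{1/\sigma}\}$; all function names are hypothetical.

```python
import numpy as np


def gup_cdf(x, mu, sigma, k, theta):
    """G(x) = exp{-e^{mu/sigma} [theta^k / (x^k - theta^k)]^{1/sigma}}, for x > theta."""
    x = np.asarray(x, dtype=float)
    z = theta**k / (x**k - theta**k)
    return np.exp(-np.exp(mu / sigma) * z ** (1 / sigma))


def gup_quantile(u, mu, sigma, k, theta):
    """Q(u) = theta [1 + e^mu (-log u)^(-sigma)]^(1/k), 0 < u < 1."""
    u = np.asarray(u, dtype=float)
    return theta * (1 + np.exp(mu) * (-np.log(u)) ** (-sigma)) ** (1 / k)


def gup_rvs(size, mu, sigma, k, theta, seed=None):
    """Inverse-transform sampling: X = Q(U) with U ~ Uniform(0, 1)."""
    rng = np.random.default_rng(seed)
    return gup_quantile(rng.uniform(size=size), mu, sigma, k, theta)


def galton_skewness(mu, sigma, k, theta):
    q = gup_quantile(np.array([2, 4, 6]) / 8, mu, sigma, k, theta)  # Q(2/8), Q(4/8), Q(6/8)
    return (q[2] - 2 * q[1] + q[0]) / (q[2] - q[0])


def moors_kurtosis(mu, sigma, k, theta):
    q = gup_quantile(np.arange(1, 8) / 8, mu, sigma, k, theta)  # Q(1/8), ..., Q(7/8)
    return ((q[6] - q[4]) + (q[2] - q[0])) / (q[5] - q[1])
```

Unlike moment-based measures, $B$ and $M$ exist for every parameter value, which matters for a heavy-tailed model such as this one.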
Moments
Theorem 3.1 The $r$th raw moment of the Gumbel-Pareto distribution is
where $\Gamma(\cdot)$ is the gamma function.
The skewness and kurtosis can also be calculated from ordinary moments using well-known relationships.
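Independently of the series in Theorem 3.1, raw moments can be cross-checked numerically from the quantile function. Substituting $w = -\log u$ in $\mu'_r = \int_0^1 Q(u)^r\,du$, with $Q(u) = \theta[1 + e^{\mu}(-\log u)^{-\sigma}]^{1/k}$, gives $\mu'_r = \theta^r\int_0^\infty [1 + e^{\mu}w^{-\sigma}]^{r/k}e^{-w}\,dw$. Under this parametrization $1 - G(x)$ behaves like $e^{\mu/\sigma}(\theta/x)^{k/\sigma}$ as $x \to \infty$, so $\mu'_r$ is finite only when $r < k/\sigma$; moment-based skewness and kurtosis therefore need $3 < k/\sigma$ and $4 < k/\sigma$ respectively. For $\sigma < 1$ and $r = k$ the integral reduces to $\theta^k[1 + e^{\mu}\Gamma(1-\sigma)]$, which gives a simple check. The sketch uses SciPy's `quad`; the function name is hypothetical.

```python
import numpy as np
from scipy.integrate import quad


def gup_raw_moment(r, mu, sigma, k, theta):
    """E[X^r] for GuP(mu, sigma, k, theta) by numerical integration.

    Uses E[X^r] = theta^r * int_0^inf [1 + e^mu w^(-sigma)]^(r/k) e^(-w) dw,
    i.e. the quantile integral int_0^1 Q(u)^r du after substituting w = -log u.
    """
    if r >= k / sigma:
        raise ValueError("E[X^r] is infinite unless r < k / sigma")

    def integrand(w):
        return (1.0 + np.exp(mu) * w ** (-sigma)) ** (r / k) * np.exp(-w)

    # Split at w = 1: an integrable singularity at w = 0, exponential decay for large w.
    head, _ = quad(integrand, 0.0, 1.0)
    tail, _ = quad(integrand, 1.0, np.inf)
    return theta**r * (head + tail)
```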
Order Statistics
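Order statistics deal with the properties and applications of ordered random samples and their functions. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Gumbel-Pareto distribution, and let $X_{i:n}$ denote the $i$th order statistic. Then the pdf of $X_{i:n}$ can be expressed as
$$f_{i:n}(x) = \frac{n!}{(i-1)!\,(n-i)!}\,G^{i-1}(x)\,[1-G(x)]^{n-i}\,g(x). \tag{3.1}$$
Expanding $[1-G(x)]^{n-i}$ binomially, inserting (2.2) and (2.3) in (3.1) and using $m\,G^{m-1}(x)\,g(x) = g(x; \mu + \sigma\log m, \sigma, k, \theta)$ (raising (2.2) to the power $m$ multiplies $e^{\mu/\sigma}$ by $m$), we get
$$f_{i:n}(x) = \sum_{j=0}^{n-i} w_j\, g(x; \mu_j, \sigma, k, \theta), \tag{3.2}$$
where
$$w_j = \frac{(-1)^{j}\, n!}{(i-1)!\,(n-i)!\,(i+j)}\binom{n-i}{j}$$
and $g(x; \mu_j, \sigma, k, \theta)$ is the Gumbel-Pareto density function with parameters $\mu_j = \mu + \sigma\log(i+j)$, $\sigma$, $k$ and $\theta$.
This reveals that the pdf of a Gumbel-Pareto order statistic is a mixture of Gumbel-Pareto densities, with weights $w_j$ of alternating sign that sum to one.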
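The step from (3.1) to the mixture form rests on the identity $m\,G^{m-1}(x)\,g(x) = g(x; \mu + \sigma\log m, \sigma, k, \theta)$: raising $G(x) = \exp\{-e^{\mu/\sigma}[\theta^k/(x^k-\theta^k)]^{1/\sigma}\}$ to the power $m$ only multiplies $e^{\mu/\sigma}$ by $m$, which is the same as shifting $\mu$ by $\sigma\log m$. The sketch below, with hypothetical function names and that parametrization assumed, evaluates both the direct form (3.1) and the mixture with weights $w_j = (-1)^j n!\binom{n-i}{j}/[(i-1)!(n-i)!(i+j)]$, so the two can be compared numerically.

```python
from math import comb, factorial

import numpy as np


def gup_pdf(x, mu, sigma, k, theta):
    """GuP density (2.3) for x > theta, via r = (theta/x)^k and s = e^{mu/sigma} z^{1/sigma}."""
    x = np.asarray(x, dtype=float)
    r = (theta / x) ** k
    s = np.exp((mu + np.log(r) - np.log1p(-r)) / sigma)
    return k / (sigma * x * (1 - r)) * s * np.exp(-s)


def gup_cdf(x, mu, sigma, k, theta):
    """GuP cdf (2.2) for x > theta."""
    x = np.asarray(x, dtype=float)
    r = (theta / x) ** k
    return np.exp(-np.exp((mu + np.log(r) - np.log1p(-r)) / sigma))


def order_stat_pdf(x, i, n, mu, sigma, k, theta):
    """Direct form (3.1) of the pdf of X_{i:n}."""
    G = gup_cdf(x, mu, sigma, k, theta)
    c = factorial(n) / (factorial(i - 1) * factorial(n - i))
    return c * G ** (i - 1) * (1 - G) ** (n - i) * gup_pdf(x, mu, sigma, k, theta)


def mixture_weights(i, n):
    """Weights w_j, j = 0, ..., n - i, of the GuP mixture (3.2)."""
    c = factorial(n) / (factorial(i - 1) * factorial(n - i))
    return [c * (-1) ** j * comb(n - i, j) / (i + j) for j in range(n - i + 1)]


def order_stat_pdf_mixture(x, i, n, mu, sigma, k, theta):
    """Mixture form (3.2): sum_j w_j g(x; mu + sigma log(i + j), sigma, k, theta)."""
    return sum(w * gup_pdf(x, mu + sigma * np.log(i + j), sigma, k, theta)
               for j, w in enumerate(mixture_weights(i, n)))
```

Because the weights alternate in sign, the mixture form can lose precision for large $n - i$; the direct form (3.1) is the safer one to evaluate numerically.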
Maximum likelihood estimation
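The maximum likelihood method is applied to estimate the parameters of the Gumbel-Pareto distribution. Let $x_1, x_2, \ldots, x_n$ be a random sample from the Gumbel-Pareto (GuP) distribution with parameter vector $\Theta = (\mu, \sigma, k, \theta)^{\top}$, where $\theta < \min_i x_i$. Also let $z_i = \theta^{k}/(x_i^{k}-\theta^{k})$ and $a_i = (x_i^{k}\log x_i - \theta^{k}\log\theta)/(x_i^{k}-\theta^{k})$, $i = 1, \ldots, n$.
The log-likelihood function for the GuP distribution is given by
$$\ell(\Theta) = n\log k - n\log\sigma + \frac{n\mu}{\sigma} + (k-1)\sum_{i=1}^{n}\log x_i - \sum_{i=1}^{n}\log(x_i^{k}-\theta^{k}) + \frac{1}{\sigma}\sum_{i=1}^{n}\log z_i - e^{\mu/\sigma}\sum_{i=1}^{n} z_i^{1/\sigma}.$$
The components of the score vector $U(\Theta) = (U_\mu, U_\sigma, U_k, U_\theta)^{\top}$ are given by
$$U_\mu = \frac{n}{\sigma} - \frac{e^{\mu/\sigma}}{\sigma}\sum_{i=1}^{n} z_i^{1/\sigma},$$
$$U_\sigma = -\frac{n}{\sigma} - \frac{n\mu}{\sigma^{2}} - \frac{1}{\sigma^{2}}\sum_{i=1}^{n}\log z_i + \frac{e^{\mu/\sigma}}{\sigma^{2}}\sum_{i=1}^{n}(\mu + \log z_i)\, z_i^{1/\sigma},$$
$$U_k = \frac{n}{k} + \sum_{i=1}^{n}\log x_i - \sum_{i=1}^{n} a_i + \frac{1}{\sigma}\sum_{i=1}^{n}(\log\theta - a_i)\left(1 - e^{\mu/\sigma} z_i^{1/\sigma}\right),$$
$$U_\theta = \sum_{i=1}^{n}\frac{k\,\theta^{k-1}}{x_i^{k}-\theta^{k}} + \frac{k}{\sigma\theta}\sum_{i=1}^{n}\frac{x_i^{k}}{x_i^{k}-\theta^{k}}\left(1 - e^{\mu/\sigma} z_i^{1/\sigma}\right).$$
The maximum likelihood estimates are obtained by equating these nonlinear equations to zero; in practice this is done numerically, for example by minimizing $-\ell(\Theta)$ with the nlm function in R.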
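A minimal Python sketch of the estimation step, offered as an alternative to R's nlm rather than as the authors' code. It assumes the parametrization $G(x) = \exp\{-e^{\mu/\sigma}[\theta^k/(x^k-\theta^k)]^{1/\sigma}\}$, codes the log-likelihood (the sum of the log of (2.3) over the sample) and its analytic score, and maximises the log-likelihood with SciPy's Nelder-Mead on an unconstrained scale: $\sigma$ and $k$ through `exp`, and $\theta$ through a logistic map onto $(0, \min_i x_i)$. The reparametrization and all function names are choices of this sketch.

```python
import numpy as np
from scipy.optimize import minimize


def _pieces(params, x):
    """Return r_i = (theta/x_i)^k, log(x_i^k - theta^k) and log z_i, computed stably."""
    mu, sigma, k, theta = params
    r = (theta / x) ** k
    log_diff = k * np.log(x) + np.log1p(-r)  # log(x_i^k - theta^k)
    log_z = k * np.log(theta) - log_diff     # log z_i, z_i = theta^k / (x_i^k - theta^k)
    return r, log_diff, log_z


def gup_loglik(params, x):
    """Sum of log g(x_i) for GuP(mu, sigma, k, theta); -inf outside the parameter space."""
    mu, sigma, k, theta = params
    x = np.asarray(x, dtype=float)
    if sigma <= 0 or k <= 0 or theta <= 0 or theta >= x.min():
        return -np.inf
    n = x.size
    _, log_diff, log_z = _pieces(params, x)
    return (n * np.log(k) - n * np.log(sigma) + n * mu / sigma
            + (k - 1) * np.log(x).sum() - log_diff.sum()
            + log_z.sum() / sigma - np.exp((mu + log_z) / sigma).sum())


def gup_score(params, x):
    """Analytic score (U_mu, U_sigma, U_k, U_theta) of gup_loglik."""
    mu, sigma, k, theta = params
    x = np.asarray(x, dtype=float)
    n = x.size
    r, _, log_z = _pieces(params, x)
    ez = np.exp((mu + log_z) / sigma)              # e^{mu/sigma} z_i^{1/sigma}
    a = (np.log(x) - r * np.log(theta)) / (1 - r)  # a_i = d log(x_i^k - theta^k) / dk
    u_mu = n / sigma - ez.sum() / sigma
    u_sigma = (-n / sigma - n * mu / sigma**2 - log_z.sum() / sigma**2
               + ((mu + log_z) * ez).sum() / sigma**2)
    u_k = (n / k + np.log(x).sum() - a.sum()
           + ((np.log(theta) - a) * (1 - ez)).sum() / sigma)
    u_theta = (k * (r / (theta * (1 - r))).sum()
               + k / (sigma * theta) * ((1 - ez) / (1 - r)).sum())
    return np.array([u_mu, u_sigma, u_k, u_theta])


def gup_sample(size, params, rng):
    """Inverse-transform draws: X = theta [1 + e^mu (-log U)^(-sigma)]^(1/k)."""
    mu, sigma, k, theta = params
    u = rng.uniform(size=size)
    return theta * (1 + np.exp(mu) * (-np.log(u)) ** (-sigma)) ** (1 / k)


def fit_gup(x, start):
    """Maximise gup_loglik by Nelder-Mead on an unconstrained scale; returns (params, loglik)."""
    x = np.asarray(x, dtype=float)
    xmin = x.min()

    def unpack(v):
        # sigma, k > 0 through exp; 0 < theta < min(x) through a logistic map
        return (v[0], np.exp(v[1]), np.exp(v[2]), xmin / (1 + np.exp(-v[3])))

    def objective(v):
        ll = gup_loglik(unpack(v), x)
        return -ll if np.isfinite(ll) else 1e300

    mu0, sigma0, k0, theta0 = start
    v0 = np.array([mu0, np.log(sigma0), np.log(k0), np.log(theta0 / (xmin - theta0))])
    res = minimize(objective, v0, method="Nelder-Mead",
                   options={"maxiter": 20000, "maxfev": 20000, "xatol": 1e-8, "fatol": 1e-8})
    return unpack(res.x), -res.fun
```

For data set I, the same call applies with a starting $\theta$ below the smallest count, 3. The analytic score can be passed to a gradient-based optimizer instead; Nelder-Mead is used here only because it copes easily with the boundary $\theta < \min_i x_i$.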
Data analysis
In this section, we illustrate the effectiveness of the Gumbel-Pareto distribution and compare the results with other existing models. To compare the distributions, we consider standard goodness-of-fit measures: $-\hat{\ell}$ (the negative of the maximized log-likelihood), AIC (Akaike information criterion), CAIC (corrected Akaike information criterion), BIC (Bayesian information criterion) and HQIC (Hannan-Quinn information criterion). The smaller these values, the better the fit.
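The criteria can be computed from the maximized log-likelihood, the number of parameters $p$ and the sample size $n$. The sketch below uses the standard definitions $\mathrm{AIC} = -2\hat{\ell} + 2p$, $\mathrm{BIC} = -2\hat{\ell} + p\log n$ and $\mathrm{HQIC} = -2\hat{\ell} + 2p\log(\log n)$, and computes CAIC as the corrected AIC, $\mathrm{AIC} + 2p(p+1)/(n-p-1)$, which is the variant that reproduces the CAIC column of Table 1. The function name is hypothetical.

```python
import math


def info_criteria(neg_loglik: float, p: int, n: int) -> dict:
    """AIC, corrected AIC (CAIC), BIC and HQIC from -log L, p parameters and n observations."""
    m2ll = 2.0 * neg_loglik
    aic = m2ll + 2 * p
    return {
        "AIC": aic,
        "CAIC": aic + 2 * p * (p + 1) / (n - p - 1),
        "BIC": m2ll + p * math.log(n),
        "HQIC": m2ll + 2 * p * math.log(math.log(n)),
    }
```

For example, `info_criteria(329.158, 3, 66)` (the ETGT-II row of Table 1, with $n = 66$ daily counts) gives AIC 664.316, CAIC about 664.703, BIC about 670.885 and HQIC about 666.912, matching the table to three decimals.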
Data set I: Number of deaths due to COVID-19 in China. These data, reported at https://www.worldometers.info/coronavirus/country/china/, are the daily deaths due to COVID-19 in China from 23 January to 28 March 2020.
The data are: 8, 16, 15, 24, 26, 26, 38, 43, 46, 45, 57, 64, 65, 73, 73, 86, 89, 97, 108, 97, 146, 121, 143, 142, 105, 98, 136, 114, 118, 109, 97, 150, 71, 52, 29, 44, 47, 35, 42, 31, 38, 31, 30, 28, 27, 22, 17, 22, 11, 7, 13, 10, 14, 13, 11, 8, 3, 7, 6, 9, 7, 4, 6, 5, 3, 5.
Here we compare the new model with the exponentiated transformation of Gumbel type-II (ETGT-II) model, the additive Gumbel type-II (AGT-II) model and the Gumbel type-II (GT-II) model. The values of the statistics are given in Table 1.
Figure 1 Graphs of the pdf and hazard rate of the Gumbel-Pareto distribution for various parameter values.
| Distribution | MLEs | $-\hat{\ell}$ | AIC | CAIC | BIC | HQIC |
|---|---|---|---|---|---|---|
| GuP | 2.527; 2.968; k = 0.994; 2.879 | 222.428 | 452.856 | 444.856 | 453.512 | 444.856 |
| ETGT-II | 1.086; 10.688; 2.431 | 329.158 | 664.316 | 664.703 | 670.885 | 666.912 |
| AGT-II | 7.479; 13.432; 4.486; 0.9137 | 331.081 | 670.162 | 670.818 | 678.921 | 673.623 |
| GT-II | 0.916; 13.532 | 331.102 | 666.203 | 666.397 | 670.583 | 667.934 |

Table 1 The MLEs and the goodness-of-fit statistics $-\hat{\ell}$, AIC, CAIC, BIC and HQIC for data set I
From Table 1, the GuP model has the smallest values of all the goodness-of-fit statistics, which indicates that it is the most suitable of the compared models for this data set.
Data set II: This data set consists of the times between successive failures of the air-conditioning system of each member of a fleet of 13 Boeing 720 jet airplanes. The pooled data, with 214 observations, were considered by Proschan7, Kus8 and many others. Here we compare the model with the existing Weibull-Pareto (WP) model.
From Table 2, we can see that the newly developed Gumbel-Pareto distribution fits these data better than the existing Weibull-Pareto distribution.2
| Distribution | MLEs (SE) | $-\hat{\ell}$ | AIC | CAIC | BIC | HQIC |
|---|---|---|---|---|---|---|
| GuP | 9.6166 (1.213); 10.1333 (2.281); k = 7.233 (1.678); 0.9981 (0.0026) | 1005.81 | 2017.62 | 2017.74 | 2027.72 | 2021.71 |
| WP | 9.8626 (0.008); 0.9283 (0.004); b = 0.1267 (0.00001) | 1459.62 | 2925.24 | 2925.35 | 2935.34 | 2929.32 |

Table 2 The MLEs, their standard errors (SE) and the goodness-of-fit statistics $-\hat{\ell}$, AIC, CAIC, BIC and HQIC for data set II
Stress-strength analysis
The reliability, denoted by $R$, is the probability of not failing and is defined as
$$R = P(Y < X),$$
where $Y$ represents the stress and $X$ represents the strength of a component. To evaluate $R$, we assume that both random variables are independent and follow distributions belonging to the same family. There are a number of applications in the literature, including the stress-strength model and the breakdown of a system having two components. If $X$ and $Y$ are two independent random variables with cdfs $G_1(\cdot)$ and $G_2(\cdot)$ and pdfs $g_1(\cdot)$ and $g_2(\cdot)$, respectively, then
$$R = P(Y < X) = \int_{-\infty}^{\infty} g_1(x)\, G_2(x)\, dx. \tag{6.1}$$
Lemma 6.1 If $X$ and $Y$ are two independent random variables following the Gumbel-X family of distributions with parameters
and
respectively, then
(6.2)
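For the GuP case, one setting in which (6.1) can be integrated in closed form is $X \sim \mathrm{GuP}(\mu_1, \sigma, k, \theta)$ and $Y \sim \mathrm{GuP}(\mu_2, \sigma, k, \theta)$, sharing $\sigma$, $k$ and $\theta$. Writing $G_i(x) = \exp\{-a_i u(x)\}$ with $a_i = e^{\mu_i/\sigma}$ and $u(x) = [\theta^k/(x^k-\theta^k)]^{1/\sigma}$, which decreases from $\infty$ to $0$ on $(\theta, \infty)$, the substitution $u = u(x)$ turns (6.1) into $a_1\int_0^\infty e^{-(a_1+a_2)u}\,du = a_1/(a_1+a_2) = 1/(1 + e^{(\mu_2-\mu_1)/\sigma})$. This shared-parameter setting is an assumption of the sketch below, which compares that expression with direct numerical integration of (6.1) using SciPy's `quad`; the function names are hypothetical.

```python
import math

from scipy.integrate import quad


def _log_s(x, mu, sigma, k, theta):
    """Return (log s, r), r = (theta/x)^k, s = e^{mu/sigma} z^{1/sigma}, z = r/(1 - r)."""
    log_r = k * (math.log(theta) - math.log(x))
    r = math.exp(log_r)
    return (mu + log_r - math.log1p(-r)) / sigma, r


def gup_cdf(x, mu, sigma, k, theta):
    # Within 1e-12 (relative) of theta the cdf is numerically 0 anyway; skipping it avoids log1p(-1).
    if x <= theta * (1.0 + 1e-12):
        return 0.0
    log_s, _ = _log_s(x, mu, sigma, k, theta)
    return math.exp(-math.exp(min(log_s, 700.0)))  # cap avoids OverflowError near theta


def gup_pdf(x, mu, sigma, k, theta):
    if x <= theta * (1.0 + 1e-12):
        return 0.0
    log_s, r = _log_s(x, mu, sigma, k, theta)
    log_s = min(log_s, 700.0)
    # log g = log(k / (sigma x (1 - r))) + log s - s
    return math.exp(math.log(k / (sigma * x * (1 - r))) + log_s - math.exp(log_s))


def reliability_numeric(mu1, mu2, sigma, k, theta):
    """R = P(Y < X) = int g_X(x) G_Y(x) dx, eq. (6.1)."""
    value, _ = quad(lambda x: gup_pdf(x, mu1, sigma, k, theta) * gup_cdf(x, mu2, sigma, k, theta),
                    theta, math.inf)
    return value


def reliability_closed_form(mu1, mu2, sigma):
    """a1 / (a1 + a2) with a_i = e^{mu_i / sigma}."""
    return 1.0 / (1.0 + math.exp((mu2 - mu1) / sigma))
```

In this setting $R$ depends only on $(\mu_2 - \mu_1)/\sigma$ and equals $1/2$ when $\mu_1 = \mu_2$, as symmetry requires.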
A reliability test plan has been developed for items whose lifetimes follow the Gumbel-Pareto distribution; see Joseph and Jose9 for more details.10-14
Conclusion
In this paper, we proposed the new Gumbel-Pareto distribution. We studied some of its structural properties, including moments, the quantile function and order statistics. The model parameters were estimated by the method of maximum likelihood. We fitted the new model to two real data sets to demonstrate its usefulness in practice, and we conclude that the GuP distribution provides a better fit than the competing models for both data sets. We hope that the proposed model will attract wider applications in areas such as engineering, survival and lifetime data, hydrology, economics, and biostatistical data on cancer and COVID-19.
References
- Al-Aqtash R. On generating a new family of distributions using the logit function. Ph.D. thesis, Central Michigan University, Mount Pleasant, Michigan. 2013.
- Alzaatreh A, Lee C, Famoye F. A new method for generating families of continuous distributions. Metron. 2013a;71(1):63‒79.
- Alzaatreh A, Lee C, Famoye F. Weibull-Pareto distribution and its applications. Communications in Statistics - Theory and Methods. 2013b;42(9):1673‒1691.
- Yoo K, Arashi M, Bekker A. Pitting the Gumbel and logistic growth models against one another to model COVID-19 spread. medRxiv. 2020.
- Sindhu TN, Shafiq A, Al-Mdallal QM. Exponentiated transformation of Gumbel Type-II distribution for modeling COVID-19 data. Alexandria Engineering Journal. 2021;60(1):671‒689.
- Kochańczyk M, Lipniacki T. Pareto-based evaluation of national responses to COVID-19 pandemic shows that saving lives and protecting economy are non-trade-off objectives. Scientific Reports. 2021;11(1):1‒9.
- Proschan F. Theoretical explanation of observed decreasing failure rate. Technometrics. 1963;5(3):375‒383.
- Kus C. A new lifetime distribution. Computational Statistics and Data Analysis. 2007;51(9):4497‒4509.
- Joseph J, Jose KK. Reliability test plan for Gumbel–Pareto life time model. International Journal of Statistics and Reliability Engineering. 2021;8(1):121‒131.
- Beare BK, Toda AA. On the emergence of a power law in the distribution of COVID-19 cases. Physica D: Nonlinear Phenomena. 2020;412:132649.
- Gumbel EJ. Statistical theory of extreme values and some practical applications. Applied Mathematics Series, vol. 33, 1st edn. U.S. Department of Commerce, National Bureau of Standards, ASIN B0007DSHG4, Gaithersburg, Md, USA. 1954.
- Kotz S, Nadarajah S. Extreme value distributions: theory and applications. Imperial College Press, London. 2000.
- Nadarajah S. The exponentiated Gumbel distribution with climate application. Environmetrics. 2006;17(1):13‒23.
- Wong F, Collins JJ. Evidence that coronavirus superspreading is fat-tailed. Proceedings of the National Academy of Sciences. 2020;117(47):29416‒29418.
©2021 Joseph, et al. This is an open access article distributed under the terms of the,
which
permits unrestricted use, distribution, and build upon your work non-commercially.