Research Article Volume 7 Issue 3
Stability within family of Pareto models
Mariam Zahid,1
Shakila Bashir2
1Department of Statistics, Kinnaird College for Women, Pakistan
2Department of Statistics, Forman Christian College, Pakistan
Correspondence: Mariam Zahid, Department of Statistics, Kinnaird College for Women, Near Canal bank Jail Road, Lahore, Pakistan
Received: April 16, 2018 | Published: May 9, 2018
Citation: Zahid M, Bashir S. Stability within family of Pareto models. Biom Biostat Int J. 2018;7(3):177–186. DOI: 10.15406/bbij.2018.07.00207
Abstract
Transmutation technique is applied to extend the workability and flexibility of weighted Pareto distribution. A weighted probability distribution improves precision for predictability and transmuting the same produces a better model for data fitting. Various statistical properties including moments and quantiles, reliability analysis, mean deviation, order statistics and record values of transmuted weighted Pareto (TWP) distribution are studied. The parameters of the distribution are evaluated using Maximum Likelihood Estimation (MLE). Application study compares different Pareto models to reveal the stability between them. A simulation analysis is also performed.
Keywords: weighted Pareto distribution, moments, reliability analysis, record values, MLE
Introduction
The goal of a probability distribution is to fit maximum data sets with utmost precision and minimum variance so that their behavior can be modeled and predicted. The existing distributions can be made more precise and useful for a wider spectrum of values by applying new techniques such as transmutation.
The Pareto distribution was developed by Vilfredo Pareto to describe the apportionment of income across a population. Its use is not restricted to income, however: it applies to data in which a small proportion of the items accounts for a large proportion of the total, for example, meteor showers on Earth. Fisher1 introduced the concept of weighted distributions, highlighting that the method of ascertainment affects the surety of occurrence of an event. Zelen2 studied weighted distributions and delineated the methodology required to formulate a weighted distribution using cell kinetics. A weighted distribution is found using
$$f_w(x) = \frac{x^k f(x)}{E(X^k)}$$ (1)
where $f(x)$ is the pdf of the original distribution and $E(X^k) = \int x^k f(x)\,dx$. To derive the transmuted weighted Pareto distribution, the weighted form with k=1 is used. A weighted distribution accounts for the surety of occurrence of an event and hence can be considered a vital requisite for all probability distributions.
Transmuting a distribution means extending it with an additional parameter in an effort to improve its adaptability to data. The Quadratic Rank Transmutation (QRT) technique uses an established formula to derive the new distribution.3 Transmutation can thus be carried out using the following relations:
$$f(x) = g(x)\,[1 + \lambda - 2\lambda G(x)]$$ (2)
$$F(x) = (1 + \lambda)\,G(x) - \lambda\,[G(x)]^2$$ (3)
Equations (2) and (3) give the pdf and cdf of the transmuted distribution, respectively. Here g(x) is the pdf and G(x) is the cdf of the parent distribution, and $|\lambda| \le 1$. For $\lambda = 0$, the transmuted model reduces to the parent model.
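The QRT map can be sketched generically in code. The following is a minimal sketch, assuming the standard QRT relations f(x) = g(x)[1 + λ − 2λG(x)] and F(x) = (1 + λ)G(x) − λG(x)²; the function names and the unit-exponential parent are illustrative assumptions, not part of the paper.

```python
import math
from typing import Callable

Fn = Callable[[float], float]


def transmute_pdf(g: Fn, G: Fn, lam: float) -> Fn:
    """QRT pdf: f(x) = g(x) * [1 + lam - 2*lam*G(x)]."""
    if abs(lam) > 1:
        raise ValueError("lam must satisfy |lam| <= 1")
    return lambda x: g(x) * (1 + lam - 2 * lam * G(x))


def transmute_cdf(G: Fn, lam: float) -> Fn:
    """QRT cdf: F(x) = (1 + lam)*G(x) - lam*G(x)**2."""
    if abs(lam) > 1:
        raise ValueError("lam must satisfy |lam| <= 1")
    return lambda x: (1 + lam) * G(x) - lam * G(x) ** 2


# Illustrative parent (an assumption for this demo): unit exponential.
def exp_pdf(x: float) -> float:
    return math.exp(-x) if x >= 0 else 0.0


def exp_cdf(x: float) -> float:
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


f = transmute_pdf(exp_pdf, exp_cdf, 0.5)
F = transmute_cdf(exp_cdf, 0.5)
print(F(1.0), f(1.0))
```

Setting `lam=0.0` returns the parent pdf and cdf unchanged, which mirrors the statement that the transmuted model reduces to the parent model.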
The transmutation technique has proven very successful in producing useful distributions. Dar et al.4 studied the transmuted weighted exponential distribution. Shahzad et al.5 transmuted the Singh-Maddala distribution. Nassar et al.6 found useful results for the transmuted Weibull Logistic distribution. Ashour & Eltehiwy7 studied the transmuted exponentiated Lomax distribution. Aryal & Tsokos8 found that transmuting a distribution helps in the advancement of a distribution. Aryal9 found that new generalizations of distributions help in extending the study. Khan et al.10 derived the transmuted Kumaraswamy distribution and concluded that, in terms of model adequacy, the transmuted distribution leads to a better fit than the Kumaraswamy distribution. Transmutation studies have gained attention because of their ability to generate new flexible distributions that can help fit data with more precision.
In this paper weighted Pareto distribution is transmuted. Various statistical properties of the new distribution are studied including moments, quantiles, moment generating function (MGF), Bonferroni and Lorenz curves, reliability analysis, order statistics, record values and parameter estimation. Application study compares different Pareto models to see if there is stability between them.
Transmuted weighted Pareto distribution
Mir & Ahmad11 derived the weighted Pareto distribution among some other weighted distributions. The pdf and cdf of the weighted Pareto distribution are given below:
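$$g(x) = (\beta - 1)\,\alpha^{\beta-1}\,x^{-\beta}, \quad x \ge \alpha$$ (4)
$$G(x) = 1 - \left(\frac{\alpha}{x}\right)^{\beta-1}, \quad x \ge \alpha$$ (5)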
Here, α is the scale parameter and β is the shape parameter such that α>0 and β>1. These are the main results for the weighted Pareto distribution that will subsequently be transmuted to form a new distribution called the transmuted weighted Pareto distribution (TWP). Equations (2), (4) and (5) are used to find the pdf of the transmuted weighted Pareto distribution:
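$$f(x) = (\beta - 1)\,\alpha^{\beta-1}\,x^{-\beta}\left[1 - \lambda + 2\lambda\left(\frac{\alpha}{x}\right)^{\beta-1}\right], \quad x \ge \alpha$$ (6)
where $\alpha > 0$, $\beta > 1$ and $|\lambda| \le 1$. The graphs of the pdf of the transmuted weighted Pareto (TWP) distribution are plotted to show the shape of the distribution (Figure 1).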
Figure 1 Graphs of pdf of TWP distribution with different values of α, β and λ.
The graph in Figure 1 (left) uses different values of λ with constant α and β to show how the new parameter affects the shape of the distribution. The values used for λ are -1, -0.5, 0 and 0.5 because $|\lambda| \le 1$. The graph in Figure 1 (right), on the other hand, shows different combinations of all the parameters to show the changes brought by each of them. Equations (3), (4) and (5) are used to find the cdf of the transmuted weighted Pareto distribution:
$$F(x) = \left[1 - \left(\frac{\alpha}{x}\right)^{\beta-1}\right]\left[1 + \lambda\left(\frac{\alpha}{x}\right)^{\beta-1}\right], \quad x \ge \alpha$$ (7)
where $\alpha > 0$, $|\lambda| \le 1$ and β>1. The cdf is graphically presented in Figure 2.
Figure 2 Graphs of the cdf of the TWP distribution with different λ values and constant α and β values (left) and with different combinations of α, β and λ values (right).
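The pdf and cdf can be evaluated numerically with a few lines of code. This is a minimal sketch, assuming the length-biased Pareto parent g(x) = (β − 1)α^(β−1)x^(−β) with G(x) = 1 − (α/x)^(β−1) and the QRT map; the function names `twp_pdf` and `twp_cdf` are hypothetical.

```python
def _check_params(alpha: float, beta: float, lam: float) -> None:
    """Validate the assumed parameter space: alpha > 0, beta > 1, |lam| <= 1."""
    if alpha <= 0 or beta <= 1 or abs(lam) > 1:
        raise ValueError("need alpha > 0, beta > 1 and |lam| <= 1")


def twp_pdf(x: float, alpha: float, beta: float, lam: float) -> float:
    """Assumed TWP pdf: (beta-1) * u / x * (1 - lam + 2*lam*u), u = (alpha/x)**(beta-1)."""
    _check_params(alpha, beta, lam)
    if x < alpha:
        return 0.0
    u = (alpha / x) ** (beta - 1)  # survival function of the weighted Pareto parent
    return (beta - 1) * u / x * (1 - lam + 2 * lam * u)


def twp_cdf(x: float, alpha: float, beta: float, lam: float) -> float:
    """Assumed TWP cdf: (1 - u) * (1 + lam*u), u = (alpha/x)**(beta-1)."""
    _check_params(alpha, beta, lam)
    if x < alpha:
        return 0.0
    u = (alpha / x) ** (beta - 1)
    return (1 - u) * (1 + lam * u)


# Evaluate both functions at x = 2 for one illustrative parameter choice.
print(twp_pdf(2.0, 1.0, 3.0, -0.2), twp_cdf(2.0, 1.0, 3.0, -0.2))
```

Writing both functions in terms of u = (α/x)^(β−1) keeps the code close to the algebra: the cdf factorises as (1 − u)(1 + λu), and the pdf follows by differentiation.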
Moments and other derived measures
This section looks deeper into the distribution by exploring its moments and other properties.
Moment generating function
The moment generating function of the transmuted weighted Pareto distribution is given by:
(8)
Moments
Moments are defining characteristics of a distribution; the moments for TWP distribution are presented in this section. The rth moment of the TWP distribution, with a random variable X, will be:
(9)
Result in eq. (9) could be used to derive moments by putting in different values for r.
(10)
(11)
(12)
(13)
First moment about origin is the mean of the distribution. Likewise, other higher moments can be used to find the variance, skewness, kurtosis etc.
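As a hedged illustration of how such moments arise, suppose the TWP pdf takes the form $f(x) = (\beta-1)\alpha^{\beta-1}x^{-\beta}[1-\lambda+2\lambda(\alpha/x)^{\beta-1}]$ for $x \ge \alpha$ (an assumption, obtained by applying the QRT map to the length-biased Pareto parent). Direct integration then gives the following sketch, which may differ in presentation from the expressions in equations (9) to (13):

```latex
\begin{aligned}
E(X^r) &= \int_{\alpha}^{\infty} x^r f(x)\,dx \\
       &= (1-\lambda)(\beta-1)\alpha^{\beta-1}\int_{\alpha}^{\infty} x^{r-\beta}\,dx
          + 2\lambda(\beta-1)\alpha^{2(\beta-1)}\int_{\alpha}^{\infty} x^{r-2\beta+1}\,dx \\
       &= (\beta-1)\,\alpha^{r}\left[\frac{1-\lambda}{\beta-1-r}
          + \frac{2\lambda}{2(\beta-1)-r}\right], \qquad \beta > r+1, \\[4pt]
E(X)   &= (\beta-1)\,\alpha\left[\frac{1-\lambda}{\beta-2}
          + \frac{2\lambda}{2\beta-3}\right], \qquad \beta > 2 .
\end{aligned}
```

Under this assumption, λ = 0 recovers the mean (β − 1)α/(β − 2) of the parent, and the rth moment is finite only when β > r + 1.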
Variance and coefficient of variation of the TWP distribution are given below:
(14)
(15)
Skewness and Kurtosis of the TWP distribution reveal information about symmetry and tail of the distribution respectively.
(16)
where,
(17)
where,
Quantiles
The qth quantile of the TWP distribution is found to be
(18)
The median of the TWP can thus be found by putting q=0.5
(19)
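The quantile can be traced back to a quadratic equation. The following sketch assumes the cdf form $F(x) = [1-(\alpha/x)^{\beta-1}][1+\lambda(\alpha/x)^{\beta-1}]$ and introduces the shorthand $u = (\alpha/x_q)^{\beta-1}$, which is not notation from the paper; the closed form may be arranged differently from equations (18) and (19).

```latex
\begin{aligned}
(1-u)(1+\lambda u) = q
  \;&\Longrightarrow\; \lambda u^{2} + (1-\lambda)\,u - (1-q) = 0, \\
u &= \frac{-(1-\lambda) + \sqrt{(1-\lambda)^{2} + 4\lambda(1-q)}}{2\lambda}
     \quad (\lambda \neq 0), \qquad u = 1-q \quad (\lambda = 0), \\
x_q &= \alpha\,u^{-1/(\beta-1)}, \\
x_{0.5} &= \alpha\left[\frac{\lambda - 1 + \sqrt{1+\lambda^{2}}}{2\lambda}\right]^{-1/(\beta-1)}
     \quad (\lambda \neq 0).
\end{aligned}
```

The root with the positive square root is the one that lies in [0, 1] for every $|\lambda| \le 1$, so it is the only admissible solution.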
Skewness and kurtosis can also be derived from quantiles. The moment-based skewness and kurtosis can be infinite for heavy-tailed distributions, which makes them less informative. Bowley's skewness, one of the earliest skewness measures, is the average of the first and third quartiles minus the median, divided by one half of the interquartile range.
For kurtosis, Moors' kurtosis uses octiles, which makes it a more robust measure.
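Both quantile-based measures can be computed from any quantile function. This is a minimal sketch using the standard definitions of Bowley's skewness and Moors' kurtosis; the function names are hypothetical, and the uniform and exponential quantile functions are illustrative stand-ins for the TWP quantile function.

```python
import math
from typing import Callable


def bowley_skewness(Q: Callable[[float], float]) -> float:
    """(Q3 + Q1 - 2*Q2) / (Q3 - Q1), with Qi the quartiles."""
    q1, q2, q3 = Q(0.25), Q(0.5), Q(0.75)
    return (q3 + q1 - 2 * q2) / (q3 - q1)


def moors_kurtosis(Q: Callable[[float], float]) -> float:
    """(E7 - E5 + E3 - E1) / (E6 - E2), with Ei = Q(i/8) the octiles."""
    e = [Q(i / 8) for i in range(1, 8)]  # e[0] = E1, ..., e[6] = E7
    return (e[6] - e[4] + e[2] - e[0]) / (e[5] - e[1])


# Illustrative quantile functions (assumptions for the demo).
def uniform_q(p: float) -> float:
    return p


def exponential_q(p: float) -> float:
    return -math.log(1 - p)


print(bowley_skewness(uniform_q), moors_kurtosis(uniform_q))
print(bowley_skewness(exponential_q), moors_kurtosis(exponential_q))
```

Because both functions only call `Q` at fixed probabilities, they stay finite even when the moments of the distribution do not exist.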
Bonferroni and Lorenz curves
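Bonferroni and Lorenz curves are important for reliability studies. For a distribution with F(x)=p, the curves are given by:
$B(p) = \frac{1}{p\mu}\int_{0}^{q} x f(x)\,dx$, $L(p) = \frac{1}{\mu}\int_{0}^{q} x f(x)\,dx$, where $\mu = E(X)$, $q = F^{-1}(p)$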
Hence, the Bonferroni and Lorenz curves for TWP respectively are
(20)
(21)
Mean deviation and averages
Mean deviation gives the mean of the absolute deviations from its mean value. Thus, the mean deviation of TWP distribution is calculated as
(22)
Harmonic Mean of TWP:
(23)
Geometric mean of TWP:
(24)
Reliability analysis
Reliability analysis evaluates the lifetime behaviour of a unit up to a time t.
Reliability function
The reliability function of the TWP distribution gives the probability that a unit survives beyond time t, R(t) = 1 - F(t), and is thus an important characteristic to study.
(25)
Hazard rate
The hazard rate is the instantaneous rate of failure of an item at time t, given that it has survived up to t. It is the ratio of the pdf to the reliability function, h(t) = f(t)/R(t), and can be calculated as follows:
(26)
Figure 3 shows the reliability function and the hazard rate function graphically. The graphs use different combinations of α, β and λ values to show the decreasing trend of the functions.
Figure 3 Reliability and Hazard rate function graphs with different α, β and λ values.
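The decreasing trend can be checked numerically. This is a minimal sketch, assuming the cdf form F(t) = (1 − u)(1 + λu) with u = (α/t)^(β−1), so that R(t) = u(1 − λ + λu) and h(t) = (β − 1)(1 − λ + 2λu) / [t(1 − λ + λu)]; the function names and the parameter values are hypothetical.

```python
def twp_survival(t: float, alpha: float, beta: float, lam: float) -> float:
    """Assumed reliability function R(t) = u * (1 - lam + lam*u), u = (alpha/t)**(beta-1)."""
    if t < alpha:
        return 1.0  # no failures can occur below the lower limit alpha
    u = (alpha / t) ** (beta - 1)
    return u * (1 - lam + lam * u)


def twp_hazard(t: float, alpha: float, beta: float, lam: float) -> float:
    """Assumed hazard rate h(t) = f(t) / R(t), simplified algebraically."""
    if t < alpha:
        return 0.0
    u = (alpha / t) ** (beta - 1)
    return (beta - 1) * (1 - lam + 2 * lam * u) / (t * (1 - lam + lam * u))


# Tabulate both functions on a few time points for one illustrative parameter set.
for t in (1.0, 2.0, 4.0, 8.0):
    print(t, twp_survival(t, 1.0, 3.0, -0.2), twp_hazard(t, 1.0, 3.0, -0.2))
```

For these illustrative values both columns decrease as t grows, in line with the trend described for Figure 3.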
Reversed hazard rate function
The reversed hazard rate is useful when time is measured in reverse, that is, when the interest lies in the time elapsed since failure. It is the ratio of the pdf to the cdf, f(t)/F(t), and is given here to cover occurrences of that sort.
(27)
Cumulative hazard rate
The cumulative hazard rate accumulates all the risk faced up to a time t, and is obtained from the reliability function as H(t) = -ln R(t).
(28)
Order statistics
It is often important to study the range of a sample, so the pdfs of the minimum and maximum order statistics of the TWP distribution are derived. The pdf of the general jth order statistic is derived first:
(29)
Order statistics are useful for reliability studies. The pdfs of the 1st and nth order statistics of the TWP distribution are
(30)
(31)
The joint pdf for X(i) and X(j) is also found for the TWP distribution
(32)
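The density of an order statistic can be evaluated for any pdf and cdf pair. This is a minimal sketch of the standard formula f_{j:n}(x) = n! / [(j − 1)!(n − j)!] · F(x)^(j−1) · [1 − F(x)]^(n−j) · f(x); the function name is hypothetical and the uniform distribution is used only as an easily checked stand-in for the TWP pdf and cdf.

```python
import math
from typing import Callable


def order_stat_pdf(
    x: float,
    j: int,
    n: int,
    pdf: Callable[[float], float],
    cdf: Callable[[float], float],
) -> float:
    """Density of the j-th order statistic in a sample of size n."""
    if not 1 <= j <= n:
        raise ValueError("need 1 <= j <= n")
    F = cdf(x)
    coef = math.factorial(n) / (math.factorial(j - 1) * math.factorial(n - j))
    return coef * F ** (j - 1) * (1 - F) ** (n - j) * pdf(x)


# Illustrative parent (an assumption for the demo): uniform on [0, 1].
def unif_pdf(x: float) -> float:
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def unif_cdf(x: float) -> float:
    return min(max(x, 0.0), 1.0)


print(order_stat_pdf(0.2, 1, 5, unif_pdf, unif_cdf))  # minimum of 5 draws
print(order_stat_pdf(0.2, 5, 5, unif_pdf, unif_cdf))  # maximum of 5 draws
```

Setting j = 1 or j = n gives the densities of the minimum and the maximum, which correspond to the two special cases singled out in the text.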
Random number generation and parameter estimation
Random number generation
The inversion method is used to generate random numbers from the TWP distribution by solving F(x) = u for x.
Here, u is a uniform random number on (0, 1). After calculation, the result for x is
(33)
Equation (33) can be used to get random numbers when the parameters α, β and λ are known.
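The inversion method can be sketched as follows. The sketch assumes the cdf form F(x) = (1 − w)(1 + λw) with w = (α/x)^(β−1), and solves the resulting quadratic in a numerically stable arrangement that may look different from equation (33); the function names and the seed are hypothetical.

```python
import math
import random
from typing import List, Optional


def twp_quantile(q: float, alpha: float, beta: float, lam: float) -> float:
    """Invert the assumed TWP cdf at probability q in [0, 1)."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    if alpha <= 0 or beta <= 1 or abs(lam) > 1:
        raise ValueError("need alpha > 0, beta > 1 and |lam| <= 1")
    b = 1.0 - lam
    disc = max(b * b + 4.0 * lam * (1.0 - q), 0.0)
    # Root of lam*w**2 + (1 - lam)*w - (1 - q) = 0 lying in (0, 1];
    # this form also covers lam = 0 without dividing by zero.
    w = 2.0 * (1.0 - q) / (b + math.sqrt(disc))
    return alpha * w ** (-1.0 / (beta - 1.0))


def twp_sample(n: int, alpha: float, beta: float, lam: float,
               seed: Optional[int] = None) -> List[float]:
    """Draw n values by applying the quantile function to uniform variates."""
    rng = random.Random(seed)
    return [twp_quantile(rng.random(), alpha, beta, lam) for _ in range(n)]


# A sample of size 100 for one illustrative parameter choice.
sample = twp_sample(100, alpha=1.0, beta=6.0, lam=-0.2, seed=2018)
print(min(sample), max(sample))
```

Every generated value is at least α, since the quantile function maps q = 0 to the lower limit of the support.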
Method of moments
One of the techniques for parameter estimation is to use method of moments. This process uses moments of the distribution to estimate parameters. Since there are three parameters, there will be three equations:
(34)
(35)
(36)
Equations (34), (35) and (36) express the parameters; they can be further solved simultaneously to get cleaner expressions for the parameters.
Maximum likelihood estimation
A widely used technique for estimating the parameters of a distribution is Maximum Likelihood Estimation. If $x_1, x_2, \ldots, x_n$ is a random sample of size n from the TWP distribution, then its likelihood function L is the product of the pdf values $f(x_i)$, $i = 1, \ldots, n$.
Using LL=ln L, the log-likelihood function is derived
(37)
To estimate the parameters, equation (37) is differentiated with respect to β and λ and put equal to zero so as to get the respective parameters.
(38)
(39)
For the TWP distribution, α is the lower limit of the support, so the maximum likelihood estimate of α is the first order statistic, i.e., $\hat{\alpha} = x_{(1)}$. The log-likelihood function is numerically maximised using the R software.
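The estimation step can be illustrated with a small amount of code. This is a minimal sketch, assuming the pdf form f(x) = (β − 1)α^(β−1)x^(−β)[1 − λ + 2λ(α/x)^(β−1)] and fixing α at the sample minimum as stated above. A coarse grid search stands in for the numerical optimiser; it is not the R routine used in the paper, and the function names, grids and data are hypothetical.

```python
import math
from typing import Iterable, List, Tuple


def twp_loglik(data: List[float], alpha: float, beta: float, lam: float) -> float:
    """Log-likelihood under the assumed TWP pdf; -inf outside the parameter space."""
    if alpha <= 0 or beta <= 1 or abs(lam) > 1 or min(data) < alpha:
        return -math.inf
    ll = 0.0
    for x in data:
        u = (alpha / x) ** (beta - 1)
        w = 1 - lam + 2 * lam * u  # transmutation factor of the pdf
        if w <= 0:
            return -math.inf
        ll += (math.log(beta - 1) + (beta - 1) * math.log(alpha)
               - beta * math.log(x) + math.log(w))
    return ll


def twp_fit_grid(data: List[float], beta_grid: Iterable[float],
                 lam_grid: Iterable[float]) -> Tuple[float, float, float, float]:
    """Return (alpha_hat, beta_hat, lam_hat, loglik) from an exhaustive grid search."""
    alpha_hat = min(data)  # first order statistic
    lam_values = list(lam_grid)
    best = max((twp_loglik(data, alpha_hat, b, l), b, l)
               for b in beta_grid for l in lam_values)
    return alpha_hat, best[1], best[2], best[0]


# Hypothetical data, used only to exercise the functions.
demo = [2.1, 2.5, 3.6, 1.9, 2.3, 2.0, 4.1]
print(twp_fit_grid(demo, [1.5 + 0.5 * i for i in range(20)],
                   [-0.8 + 0.2 * i for i in range(9)]))
```

A grid search is slow and only as precise as its spacing, but it makes the shape of the likelihood surface easy to inspect before handing the problem to a proper optimiser.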
Simulation
Simulation can help in understanding the data sets for the particular distribution. Inverse CDF technique is used to simulate the data set in R. The values used for parameters are α=1, β=6 and λ=-0.2. A data set of size 100 is thus simulated for the TWP distribution (Table 1).
Data generated for α=1, β=3 and λ=-0.2
2.171301 | 2.469904 | 3.627952 | 4.122543 | 2.290153 | 1.961311 | 2.02465 | 2.788962 | 1.912685 | 2.237079
1.993091 | 2.611573 | 2.926888 | 2.658368 | 2.705542 | 2.050194 | 2.19487 | 2.333488 | 2.010353 | 2.462991
4.240425 | 2.054078 | 2.150241 | 2.836084 | 3.139823 | 3.276202 | 3.642714 | 1.914709 | 2.589984 | 1.980117
2.079828 | 2.081706 | 1.925856 | 1.989657 | 1.95235 | 3.188533 | 3.776208 | 2.161587 | 3.04785 | 2.287527
1.908391 | 1.909436 | 2.133601 | 1.945302 | 2.128175 | 2.208188 | 2.512261 | 2.831646 | 1.927698 | 2.205562
2.107547 | 2.102916 | 2.061781 | 2.129319 | 1.963237 | 2.142851 | 1.952253 | 2.75405 | 2.385384 | 2.056045
2.490634 | 1.987443 | 2.111349 | 2.256456 | 3.406278 | 2.197887 | 1.921594 | 3.08898 | 2.598769 | 2.323328
2.2008 | 2.32725 | 3.405701 | 2.567741 | 2.850449 | 1.931747 | 1.956453 | 2.343114 | 2.214165 | 2.446729
2.566559 | 1.917651 | 2.910597 | 2.54337 | 2.346076 | 1.959292 | 1.939467 | 2.359965 | 2.056514 | 2.491718
3.358692 | 2.762751 | 2.22607 | 2.009319 | 3.075688 | 2.161866 | 1.906318 | 2.630087 | 2.90743 | 3.515726
Table 1 Results from the simulation study of the TWP distribution
The data given above is used to estimate parameters for the distribution. Estimated values are given in Table 2.
Model | Estimates | Standard Error
Transmuted Weighted Pareto | | 0.681109, 0.241365
Table 2 Estimated values of parameters for TWP distribution
The variance covariance matrix for the TWP distribution with the above data will be:
This shows that the variance of MLE of β and λ are Var= and Var = respectively.
Record values
Record values are the successive extremes in a sequence of random variables: an observation is an upper record if it exceeds all earlier observations. Bashir & Ahmad12 characterise a weighted Pareto distribution based on its upper record values. As this is an important area, the record values of the TWP distribution are also studied to reveal information about the sequence of random variables. The pdf of the nth upper record value is $f_n(x) = \frac{[R(x)]^{n-1}}{\Gamma(n)} f(x)$,
where $R(x) = -\ln[1 - F(x)]$ and f(x) is the pdf of the TWP distribution.
(40)
Solving the above equation for mean and variance may lead to more complex expressions. Therefore, numerical values of α, β and λ (estimated parameters from the simulation study) are used to find the mean and variance of upper record values (Table 3).
Parameters | n | Mean | Variance
α = 1.9, β = 5.7, λ = -0.1 | 2 | 0.718 | 2.095
 | 3 | 0.914 | 3.723
 | 4 | 1.162 | 6.593
 | 5 | 1.477 | 11.652
Table 3 Mean and Variance of the upper record values of TWP distribution
The joint pdf of and is
(41)
where r(x) is the hazard rate function. The conditional pdf of is
(42)
For
(43)
Application
Two real-life examples are used to obtain results for the TWP distribution. The first data set consists of the remission times, in months, of patients with bladder cancer as recorded by Lee & Wang13. The data are given in Table 4. Since weighted distributions can be length-biased or area-biased, the comparison is conducted for transmuted versions of both of these types of weighted distributions. The transmuted length-biased Pareto (TLbP) distribution is the one studied throughout this paper (referred to as TWP), and the transmuted area-biased Pareto (TAbP) distribution is derived using k=2 in equation (1) and then transmuted in the same manner as TWP. The parameters are then estimated for the TWP, TAbP, Transmuted Pareto (TP), Weighted Pareto (WP) and Pareto (P) distributions. The estimates are given in Table 5.
Remission times of Bladder Cancer patients
0.08 | 2.09 | 3.48 | 4.87 | 6.94 | 8.66 | 13.11 | 23.63
0.2 | 2.23 | 3.52 | 4.98 | 6.97 | 9.02 | 13.29
0.4 | 2.26 | 3.57 | 5.06 | 7.09 | 9.22 | 13.8 | 25.74
0.5 | 2.46 | 3.64 | 5.09 | 7.26 | 9.47 | 14.24 | 25.82
0.51 | 2.54 | 3.7 | 5.17 | 7.28 | 9.74 | 14.76 | 26.31
0.81 | 2.62 | 3.82 | 5.32 | 7.32 | 10.06 | 14.77 | 32.15
2.64 | 3.88 | 5.32 | 7.39 | 10.34 | 14.83 | 34.26
0.9 | 2.69 | 4.18 | 5.34 | 7.59 | 10.66 | 15.96 | 36.66
1.05 | 2.69 | 4.23 | 5.41 | 7.62 | 10.75 | 16.62 | 43.01
1.19 | 2.75 | 4.26 | 5.41 | 7.63 | 17.12 | 46.12
1.26 | 2.83 | 4.33 | 7.66 | 11.25 | 17.14 | 79.05
1.35 | 2.87 | 5.62 | 7.87 | 11.64 | 17.36
1.4 | 3.02 | 4.34 | 5.71 | 7.93 | 11.79 | 18.1
1.46 | 4.4 | 5.85 | 8.26 | 11.98 | 19.13
1.76 | 3.25 | 4.5 | 6.25 | 8.37 | 12.02
2.02 | 3.31 | 4.51 | 6.54 | 8.53 | 12.03 | 20.28
2.02 | 3.36 | 6.76 | 12.07 | 21.73
2.07 | 3.36 | 6.93 | 8.65 | 12.63 | 22.69
5.49
Table 4 Data of remission of Bladder Cancer patients as recorded by Lee & Wang13
Model | Estimates | -2 log lik | AIC | AICc | BIC
TWP | | 1005.038 | 1011.038 | 1011.231 | 1014.742
TAbP | | 1005.038 | 1011.038 | 1011.231 | 1014.742
TP | | 1005.038 | 1011.038 | 1011.231 | 1014.742
WP | | 1077.046 | 1081.046 | 1081.142 | 1081.898
P | | 1077.046 | 1081.046 | 1081.142 | 1081.898
Table 5 Estimated value of parameters for different distributions
Table 5 shows that the estimates differ across the distributions, but the -2 log likelihood, AIC, AICc and BIC are the same for the TWP, TAbP and TP distributions, i.e., 1005.038, 1011.038, 1011.231 and 1014.742 respectively. Likewise, for the WP and P distributions the -2 log likelihood, AIC, AICc and BIC are the same, i.e., 1077.046, 1081.046, 1081.142 and 1081.898 respectively. This may mean that the TWP, TAbP and TP distributions respond to the data in a similar way, and that the WP and P distributions respond in a similar way, with TWP, TAbP and TP being better than the others because of their lower -2 log likelihood, AIC, AICc and BIC values. This behavior thus exhibits the stability within the family of Pareto models. It should also be borne in mind that these values are correct to three decimal places and may show differences if more decimal places are taken into consideration.
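The relation between the -2 log likelihood and the information criteria can be sketched with the standard definitions AIC = -2 log L + 2k, AICc = AIC + 2k(k + 1)/(n − k − 1) and BIC = -2 log L + k ln n. The function name is hypothetical, and k = 3 parameters with n = 128 observations are assumptions made for this illustration.

```python
import math
from typing import Dict


def information_criteria(neg2_loglik: float, k: int, n: int) -> Dict[str, float]:
    """Standard AIC, AICc and BIC from -2 log-likelihood, k parameters, n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = neg2_loglik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = neg2_loglik + k * math.log(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Assumed inputs: -2 log lik of the TWP fit, k = 3 and n = 128.
print(information_criteria(1005.038, k=3, n=128))
```

With these assumed inputs the AIC and AICc agree with the tabulated TWP values to within rounding. The BIC returned here uses the same k as the AIC, so a tabulated BIC computed with a different parameter count will differ from it.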
The TWP distribution takes the surety of occurrence into account through its weighted form and is made more flexible by the additional transmutation parameter. TWP thus presents a comprehensive model: a transmuted version of a weighted distribution.
The variances of the MLEs of β and λ, taken from the variance-covariance matrix of the MLEs of the TWP distribution, are
$\mathrm{Var}(\hat{\beta})$ = 0.0004906221 and $\mathrm{Var}(\hat{\lambda})$ = 0.0005991575.
The confidence intervals are
The second data set consists of the survival times of patients who got better after chemotherapy treatment, as reported by Bekker et al.14 The data are given in Table 6 and are used to estimate the parameters of the different Pareto distributions.
Survival times after chemotherapy treatment
0.047 | 0.115 | 0.121 | 0.132 | 0.164 | 0.197 | 0.203 | 0.26 | 0.282
0.296 | 0.334 | 0.395 | 0.458 | 0.466 | 0.501 | 0.507 | 0.529 | 0.534
0.54 | 0.641 | 0.644 | 0.696 | 0.841 | 0.863 | 1.099 | 1.219 | 1.271
1.326 | 1.447 | 1.485 | 1.553 | 1.581 | 1.589 | 2.178 | 2.343 | 2.416
2.444 | 2.825 | 2.83 | 3.578 | 3.658 | 3.743 | 3.978 | 4.003 | 4.033
Table 6 Data of survival times (years) from chemotherapy treatment
Table 7 shows that TWP, TAbP and TP have lower -2 log likelihood, AIC, AICc and BIC values than the WP and P distributions, which means they fit better than the others. The similarity in results appears in this application as well, which again draws attention to the stability within the family of Pareto models.
Model | Estimates | -2 log lik | AIC | AICc | BIC
TWP | | 145.484 | 151.484 | 152.069 | 153.097
TAbP | | 145.484 | 151.484 | 152.069 | 153.097
TP | | 145.484 | 151.484 | 152.069 | 153.097
WP | | 163.461 | 169.461 | 170.046 | 167.268
P | | 163.461 | 169.461 | 170.046 | 167.268
Table 7 Estimated value of parameters for different distributions
The variances of the MLEs of β and λ, taken from the variance-covariance matrix of the MLEs of the TWP distribution, are
$\mathrm{Var}(\hat{\beta})$ = 0.0032758564 and $\mathrm{Var}(\hat{\lambda})$ = 0.0061652295.
The confidence intervals are
Conclusion
Weighted Pareto distribution is used in this study to derive a new distribution. The parent distribution is extended using a QRT map in order to transmute it. The resulting transmuted weighted Pareto distribution is then studied. Statistical properties of the distribution are elaborated. Moments, quantiles, MGF, reliability measures, order statistics and record values are derived. Two applications are used to study the parameters of TWP distribution. The comparison of TWP distribution with TAbP, TP, WP and P distributions reveals it to be a better model than Pareto and Weighted Pareto distributions but shows stability when compared with transmuted area-biased Pareto and transmuted Pareto distributions. It can be concluded that since TWP deals with the weighted Pareto distribution, TWP can be considered theoretically more advanced than others in the family of Pareto models.
Acknowledgement
Conflict of interest
Authors declare that there is no conflict of interest.
References
- Fisher RA. The Effects of Methods of Ascertainment upon the Estimation of Frequencies. Annals of Eugenics. 1934;6:13–25.
- Zelen M. Problems in Cell Kinetics and the Early Detection of Disease, in Reliability and Biometry. SIAM Philadelphia. 1974;6(5):701–706.
- Shaw W, Buckley I. The Alchemy of Probability Distributions: Beyond Gram–Charlier Expansions and a Skew–Kurtotic–Normal Distribution. Research Report; 2007.
- Dar AA, Ahmed A, Reshi JA. Transmuted Weighted Exponential Distribution and Its Application. Journal of Statistics Applications & Probability. 2017;6(1):219–232.
- Shahzad MN, Merovci F, Asghar Z. Transmuted Singh–Maddala Distribution: A New Flexible and Upside–Down Bathtub Shaped Hazard Function Distribution. Revista Colombiana de Estadística. 2017;40(1):1–27.
- Nassar MM, Radwan SS, Elmasry A. Transmuted Weibull Logistic Distribution. International Journal of Innovative Research & Development. 2017;6(4):122–131.
- Ashour SK, Eltehiwy MA. Transmuted Lomax Distribution. American Journal of Applied Mathematics and Statistics. 2013;1(6):121–127.
- Aryal GR, Tsokos CP. Transmuted Weibull distribution: A Generalization of the Weibull Probability Distribution. European Journal of Pure and Applied Mathematics. 2011;4(2):89–102.
- Aryal GR. Transmuted Log–Logistic Distribution. Journal of Statistics Applications & Probability. 2013;2(1):11–20.
- Khan MS, King R, Hudson IL. Transmuted Kumaraswamy Distribution. Statistics in Transition. 2016;17(2):183–210.
- Mir KA, Ahmad M. Size–Biased Distributions and Their Applications. Pakistan Journal of Statistics. 2009;25:283–294.
- Bashir S, Ahmad M. Record Values from Size–Biased Pareto Distribution and a Characterization. International Journal of Engineering Research and General Science. 2004;2(4):101–109.
- Lee ET, Wang JW. Statistical Methods for Survival Data Analysis. 3rd ed. New York: Wiley; 2003.
- Bekker A, Roux J, Mostert P. A generalization of the compound Rayleigh distribution: using a Bayesian method on cancer survival times. Commun Stat Theory Methods. 2007;29(7):1419–1433.
©2018 Zahid, et al. This is an open access article distributed under terms that permit unrestricted use, distribution, and building upon the work non-commercially.