Research Article Volume 6 Issue 3
Institute of Mathematical Science and Computing, University of São Paulo, Brazil
Correspondence: Pedro Luiz Ramos, Institute of Mathematical Science and Computing, University of São Paulo, Brazil
Received: June 28, 2017 | Published: September 14, 2017
Citation: Ramos PL, Nascimento D, Louzada F. The long term fréchet distribution: estimation, properties and its application. Biom Biostat Int J. 2017;6(3):357-362. DOI: 10.15406/bbij.2017.06.00170
In this paper, a new long-term survival distribution is proposed. The so-called long-term Fréchet distribution allows us to fit data in which part of the population is not susceptible to the event of interest. This model may be used, for example, in clinical studies where a portion of the population can be cured during a treatment. We present some mathematical properties of the new distribution, such as its moments and survival properties, together with the maximum likelihood estimators (MLEs) of the parameters. A numerical simulation is carried out to verify the performance of the MLEs. Finally, an application to the leukemia-free survival times of transplant patients is discussed to illustrate the proposed distribution.
Keywords: Fréchet distribution; Long-term survival distribution; Survival model
Extreme value models play an important role in statistics. The generalized extreme value (GEV) distribution [1] and its sub-models are widely used in applications involving extreme events. These sub-models are the well-known Weibull, Fréchet and Gumbel distributions. The Fréchet distribution can be seen as the inverse Weibull distribution, with probability density function (PDF)
$$f(t;\lambda,\alpha)=\frac{\alpha}{\lambda}\left(\frac{t}{\lambda}\right)^{-(\alpha+1)}e^{-(t/\lambda)^{-\alpha}}. \tag{1}$$
The survival function is given by
$$S(t;\lambda,\alpha)=1-e^{-(t/\lambda)^{-\alpha}}. \tag{2}$$
Although the GEV distribution is the most widely used generalization of the Fréchet model, other distributions have been proposed in the literature. De Gusmão [2] proposed a three-parameter generalized inverse Weibull distribution, which includes the Fréchet distribution. Krishna et al. [3] proposed the Marshall-Olkin Fréchet distribution. Barreto-Souza et al. [4] discussed some results for the beta Fréchet distribution. However, in survival studies researchers often need to treat a portion of the population as cured during a given treatment; models that allow for such a cured fraction are called long-term (LT) survival models.
In this study, we propose a new long-term survival model based on the mixture model introduced by Berkson and Gage [5]; hereafter we call it the long-term Fréchet distribution, or simply the LF distribution. Some mathematical properties of the LF distribution are provided, such as its moments, survival properties and hazard function. The maximum likelihood estimators of the parameters and their asymptotic properties are also discussed. Similar studies were presented by Roman et al. [6] for the geometric exponential distribution and by Louzada and Ramos [7] for the weighted Lindley distribution. A numerical simulation is performed to examine the performance of the MLEs. Finally, our proposed methodology is illustrated on a real data set of leukemia-free survival times (in years) for 46 autologous transplant patients.
The paper is organized as follows. Section 2 presents the long-term Fréchet distribution and its mathematical properties. Section 3 discusses parameter estimation under the maximum likelihood approach. Section 4 presents a simulation study under different values of the parameters and different levels of censoring. Section 5 illustrates the proposed methodology on a real data set. Finally, Section 6 summarizes the findings of this study and its potential contributions.
Long-term survivors are an important feature to incorporate in the modeling process, since a portion of the population may no longer be susceptible to the event of interest (Maller and Zhou [8]; Perdona and Louzada [9]). The population can therefore be split into a group that is not susceptible to the event of interest, with probability $p$, and a group that is susceptible (at risk), with probability $1-p$. The long-term survival function is expressed as
$$S(t;p,\theta)=p+(1-p)S_0(t;\theta), \tag{3}$$
where $p\in(0,1)$ and $S_0(t;\theta)$ is the survival function of the susceptible group. The resulting (unconditional) survival function is improper, and its limit as $t\to\infty$ is the cured proportion $p$. From the survival function one can easily derive the (improper) PDF, given by
$$f(t;p,\theta)=-\frac{\partial}{\partial t}S(t;p,\theta)=(1-p)f_0(t;\theta), \tag{4}$$
where $f_0(t;\theta)$ is the PDF of the susceptible group.
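The mixture construction in equations (3) and (4) can be sketched in a few lines of Python. This is only an illustration: the function names and the parameter values are assumptions made here, and the Fréchet baseline is taken from equations (1) and (2).

```python
import math

def frechet_survival(t, lam, alpha):
    """Baseline survival S0(t) = 1 - exp(-(t/lam)^(-alpha)), as in equation (2)."""
    return 1.0 - math.exp(-((t / lam) ** (-alpha)))

def frechet_pdf(t, lam, alpha):
    """Baseline density f0(t), as in equation (1)."""
    z = t / lam
    return (alpha / lam) * z ** (-(alpha + 1.0)) * math.exp(-(z ** (-alpha)))

def long_term_survival(t, p, s0):
    """Equation (3): S(t) = p + (1 - p) * S0(t) for any proper baseline S0."""
    return p + (1.0 - p) * s0(t)

def long_term_pdf(t, p, f0):
    """Equation (4): improper density (1 - p) * f0(t); it integrates to 1 - p."""
    return (1.0 - p) * f0(t)

# Illustrative values only (not taken from the paper's application).
p, lam, alpha = 0.3, 2.0, 1.5
s0 = lambda t: frechet_survival(t, lam, alpha)
f0 = lambda t: frechet_pdf(t, lam, alpha)
plateau = long_term_survival(1e9, p, s0)  # approaches the cure fraction p
```

For very large `t` the value of `plateau` is numerically indistinguishable from `p`, which is the sense in which the survival function is improper.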
Figure 1: Shapes of the PDF and the survival function of the LF distribution for some parameter values.
Left panel: probability density function of the LF distribution. Right panel: survival function of the LF distribution.
Assuming that $f_0(t;\theta)$ follows a Fréchet distribution, the PDF of the long-term Fréchet (LF) distribution is given by
$$f(t;\lambda,\alpha,p)=\frac{\alpha(1-p)}{\lambda}\left(\frac{t}{\lambda}\right)^{-(\alpha+1)}e^{-(t/\lambda)^{-\alpha}}, \tag{5}$$
where $\lambda>0$, $\alpha>0$ and $p\in(0,1)$. The cumulative distribution function is given by
$$F(t;\lambda,\alpha,p)=(1-p)e^{-(t/\lambda)^{-\alpha}}. \tag{6}$$
In this case, the LF distribution has a closed-form quantile function, given by
$$t_u=\lambda\left[\log\left(\frac{1-p}{u}\right)\right]^{-1/\alpha}, \tag{7}$$
where $0<u<1-p$, since $F(t;\lambda,\alpha,p)$ never exceeds $1-p$. The $r$-th moment of $T$ about the origin is
$$E(T^r;\lambda,\alpha,p)=(1-p)\lambda^r\,\Gamma\left(1-\frac{r}{\alpha}\right),\quad \alpha>r, \tag{8}$$
for $r\in\mathbb{N}$, where $\Gamma(x)=\int_0^\infty e^{-y}y^{x-1}\,dy$ is the gamma function. After some algebraic manipulation, the mean and variance of the LF distribution are given, respectively, by
$$E(T;\lambda,\alpha,p)=(1-p)\lambda\,\Gamma\left(1-\frac{1}{\alpha}\right),\quad \alpha>1,$$
and
$$V(T;\lambda,\alpha,p)=(1-p)\lambda^2\left(\Gamma\left(1-\frac{2}{\alpha}\right)-(1-p)\,\Gamma\left(1-\frac{1}{\alpha}\right)^2\right),\quad \alpha>2.$$
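As a rough numerical check of the moment formulas, the sketch below evaluates equation (8) with `math.gamma` and compares the mean with a midpoint-rule integral of the baseline Fréchet quantile function. The function names, the quadrature scheme and the parameter values are assumptions made for illustration only.

```python
import math

def lf_raw_moment(r, lam, alpha, p):
    """Equation (8): (1 - p) * lam^r * Gamma(1 - r/alpha), valid for alpha > r."""
    if alpha <= r:
        raise ValueError("the r-th moment requires alpha > r")
    return (1.0 - p) * lam ** r * math.gamma(1.0 - r / alpha)

def lf_mean(lam, alpha, p):
    return lf_raw_moment(1, lam, alpha, p)

def lf_variance(lam, alpha, p):
    g1 = math.gamma(1.0 - 1.0 / alpha)
    g2 = math.gamma(1.0 - 2.0 / alpha)
    return (1.0 - p) * lam ** 2 * (g2 - (1.0 - p) * g1 ** 2)

def mean_by_quadrature(lam, alpha, p, n=200_000):
    """Midpoint rule for (1 - p) * integral over (0, 1) of the baseline
    quantile lam * (-log v)^(-1/alpha), which equals the integral of t * f(t)."""
    h = 1.0 / n
    total = sum(lam * (-math.log((k + 0.5) * h)) ** (-1.0 / alpha) for k in range(n))
    return (1.0 - p) * total * h

# Illustrative values with alpha > 2, so that both mean and variance exist.
lam, alpha, p = 2.0, 5.0, 0.3
closed_form = lf_mean(lam, alpha, p)
numerical = mean_by_quadrature(lam, alpha, p)
```

The two values `closed_form` and `numerical` should agree to several decimal places; the agreement degrades as `alpha` approaches 1, where the integrand becomes more singular.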
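The survival and hazard functions of the $LF(\lambda,\alpha,p)$ distribution are given by
$$S(t;\lambda,\alpha,p)=p+(1-p)\left(1-e^{-(t/\lambda)^{-\alpha}}\right) \tag{9}$$
and
$$h(t;\lambda,\alpha,p)=\frac{f(t;\lambda,\alpha,p)}{S(t;\lambda,\alpha,p)}=\frac{\dfrac{\alpha(1-p)}{\lambda}\left(\dfrac{t}{\lambda}\right)^{-(\alpha+1)}e^{-(t/\lambda)^{-\alpha}}}{p+(1-p)\left(1-e^{-(t/\lambda)^{-\alpha}}\right)}. \tag{10}$$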
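To get a feel for the shape of the hazard, the following sketch evaluates it on a small grid. It assumes the usual definition of the hazard as density over survival, with the density from equation (5) and the survival function from equation (9); the names and parameter values are illustrative.

```python
import math

def lf_pdf(t, lam, alpha, p):
    """Equation (5): improper LF density."""
    z = t / lam
    return (1.0 - p) * (alpha / lam) * z ** (-(alpha + 1.0)) * math.exp(-(z ** (-alpha)))

def lf_survival(t, lam, alpha, p):
    """Equation (9): tends to p as t grows."""
    return p + (1.0 - p) * (1.0 - math.exp(-((t / lam) ** (-alpha))))

def lf_hazard(t, lam, alpha, p):
    """Hazard taken as density over survival, h(t) = f(t) / S(t)."""
    return lf_pdf(t, lam, alpha, p) / lf_survival(t, lam, alpha, p)

lam, alpha, p = 2.0, 1.5, 0.3  # illustrative values, not estimates from the paper
grid = [0.5, 1.0, 2.0, 5.0, 20.0, 100.0]
hazards = [lf_hazard(t, lam, alpha, p) for t in grid]
```

For these values the hazard first increases and then decays towards zero, which is what one expects when a fraction of the population never experiences the event.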
The failure time of the $i$-th individual may not be observed, since it is subject to right censoring. Furthermore, the random censoring times $C_i$ are independent of the failure times $T_i$, and their distribution does not depend on the parameters. For a sample of size $n$, the data set is described by $D=(t_i,\delta_i)$, $i=1,\ldots,n$, where $t_i=\min(T_i,C_i)$ and $\delta_i=I(T_i\le C_i)$. This general random censoring scheme has type I and type II censoring as special cases. The likelihood function is given by
$$L(\theta;D)=\prod_{i=1}^{n}f(t_i;\theta)^{\delta_i}\,S(t_i;\theta)^{1-\delta_i}.$$
Let $T_1,\ldots,T_n$ be a random sample from the LF distribution. The likelihood function for data with random censoring is given by
$$L(\lambda,\alpha,p;D)=\alpha^{d}(1-p)^{d}\lambda^{d\alpha}\prod_{i=1}^{n}t_i^{-\delta_i(\alpha+1)}\exp\left(-\sum_{i=1}^{n}\delta_i\left(\frac{t_i}{\lambda}\right)^{-\alpha}\right)\prod_{i=1}^{n}\left(p+(1-p)\left(1-e^{-(t_i/\lambda)^{-\alpha}}\right)\right)^{1-\delta_i},$$
where $d=\sum_{i=1}^{n}\delta_i$. The log-likelihood function is given by
$$l(\lambda,\alpha,p;D)=d\log(\alpha)+d\log(1-p)+d\alpha\log\lambda-(\alpha+1)\sum_{i=1}^{n}\delta_i\log t_i-\sum_{i=1}^{n}\delta_i\left(\frac{\lambda}{t_i}\right)^{\alpha}+\sum_{i=1}^{n}(1-\delta_i)\log\left(p+(1-p)\left(1-e^{-(t_i/\lambda)^{-\alpha}}\right)\right). \tag{11}$$
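The log-likelihood (11) translates directly into code. The sketch below is a plain-Python illustration, not the authors' program: the function names and the toy data are assumptions, and a second function recomputes the same quantity from the generic form $\sum_i[\delta_i\log f(t_i)+(1-\delta_i)\log S(t_i)]$ as a consistency check.

```python
import math

def lf_loglik(lam, alpha, p, times, deltas):
    """Log-likelihood (11) for right-censored data; deltas[i] = 1 marks an observed failure."""
    d = sum(deltas)
    sum_log_t = sum(dl * math.log(t) for t, dl in zip(times, deltas))
    sum_u = sum(dl * (lam / t) ** alpha for t, dl in zip(times, deltas))
    censored = sum(
        (1 - dl) * math.log(p + (1.0 - p) * (1.0 - math.exp(-((t / lam) ** (-alpha)))))
        for t, dl in zip(times, deltas)
    )
    return (d * math.log(alpha) + d * math.log(1.0 - p) + d * alpha * math.log(lam)
            - (alpha + 1.0) * sum_log_t - sum_u + censored)

def lf_loglik_direct(lam, alpha, p, times, deltas):
    """Same quantity from the generic form: sum of delta*log f + (1 - delta)*log S."""
    total = 0.0
    for t, dl in zip(times, deltas):
        u = (t / lam) ** (-alpha)
        log_f = (math.log(1.0 - p) + math.log(alpha / lam)
                 - (alpha + 1.0) * math.log(t / lam) - u)
        log_s = math.log(p + (1.0 - p) * (1.0 - math.exp(-u)))
        total += dl * log_f + (1 - dl) * log_s
    return total

# Toy data, invented purely for illustration.
times = [0.4, 1.1, 2.5, 3.0, 5.0]
deltas = [1, 1, 0, 1, 0]
value = lf_loglik(2.0, 1.5, 0.3, times, deltas)
```

Either function can be handed (with its sign flipped) to a general-purpose optimizer; the expanded form is mainly useful for deriving the score equations by hand.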
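The maximum likelihood estimators (MLEs) are widely used for statistical inference owing to their many desirable properties, which include consistency, asymptotic efficiency and invariance. The MLEs are obtained by maximizing the log-likelihood function (11). Before deriving the MLEs of the LF distribution, let us define the function
$$\eta_j(\lambda,\alpha,p;D)=\sum_{i=1}^{n}(1-\delta_i)\frac{\partial\log S(t_i;\theta)}{\partial\theta_j},\quad j=1,2,3,$$
where $\theta=(\theta_1,\theta_2,\theta_3)=(\lambda,\alpha,p)$.
Then, the likelihood equations are given by
$$\frac{d\alpha}{\lambda}-\alpha\lambda^{\alpha-1}\sum_{i=1}^{n}\delta_i t_i^{-\alpha}+\eta_1(\lambda,\alpha,p;D)=0,$$
$$\frac{d}{\alpha}+d\log\lambda-\sum_{i=1}^{n}\delta_i\left(\frac{\lambda}{t_i}\right)^{\alpha}\log\left(\frac{\lambda}{t_i}\right)-\sum_{i=1}^{n}\delta_i\log t_i+\eta_2(\lambda,\alpha,p;D)=0$$
and
$$\frac{d}{p-1}+\eta_3(\lambda,\alpha,p;D)=0.$$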
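For completeness, the censoring terms $\eta_j$ can be written out explicitly. The expressions below are a worked sketch that does not appear in the original text: they follow from differentiating $\log S(t_i;\theta)$ with $S$ as in equation (9), using the shorthand $u_i=(\lambda/t_i)^{\alpha}$ and $S_i=p+(1-p)(1-e^{-u_i})$, both introduced here only for brevity.

```latex
\begin{align*}
\eta_1(\lambda,\alpha,p;D) &= \sum_{i=1}^{n}(1-\delta_i)\,
  \frac{(1-p)\,\alpha\,u_i\,e^{-u_i}}{\lambda\,S_i},\\
\eta_2(\lambda,\alpha,p;D) &= \sum_{i=1}^{n}(1-\delta_i)\,
  \frac{(1-p)\,u_i\,\log(\lambda/t_i)\,e^{-u_i}}{S_i},\\
\eta_3(\lambda,\alpha,p;D) &= \sum_{i=1}^{n}(1-\delta_i)\,
  \frac{e^{-u_i}}{S_i}.
\end{align*}
```

These follow from $\partial u_i/\partial\lambda=\alpha u_i/\lambda$, $\partial u_i/\partial\alpha=u_i\log(\lambda/t_i)$ and $\partial S_i/\partial p=e^{-u_i}$.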
The maximization of the log-likelihood function can be performed directly by using existing statistical packages. Further information about the numerical procedures will be discussed in the next section.
According to Migon et al. [10], under mild conditions the resulting estimators are consistent and efficient, with an asymptotically normal joint distribution given by
$$(\hat\lambda,\hat\alpha,\hat p)\sim N_3\left((\lambda,\alpha,p),\,H^{-1}(\lambda,\alpha,p)\right),$$
where $H(\lambda,\alpha,p)$ is the $3\times3$ observed Fisher information matrix, whose elements $H_{ij}(\lambda,\alpha,p)$ are given by
$$H_{ij}(\theta)=-\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\,l(\theta;D),\quad i,j=1,2,3.$$
Note that the observed Fisher information matrix is used because the expected Fisher information matrix has no closed-form expression. For large samples, approximate confidence intervals with confidence coefficient $100(1-\gamma)\%$ can be constructed for the individual parameters $\theta_i$, $i=1,2,3$, from the marginal distributions
$$\hat\theta_i\sim N\left(\theta_i,\,H^{-1}_{ii}(\theta)\right),\quad i=1,2,3.$$
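The marginal normal approximation leads to the usual Wald-type interval. A minimal sketch follows, assuming the variance passed in is the corresponding diagonal entry of the inverse observed information evaluated at the MLE. The function name, the optional clipping to the parameter space and the numbers are hypothetical.

```python
from statistics import NormalDist

def wald_interval(estimate, variance, gamma=0.05, lower=None, upper=None):
    """Interval estimate +/- z_{1-gamma/2} * sqrt(variance), optionally clipped to [lower, upper]."""
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    half = z * variance ** 0.5
    lo, hi = estimate - half, estimate + half
    if lower is not None:
        lo = max(lo, lower)
    if upper is not None:
        hi = min(hi, upper)
    return lo, hi

# Hypothetical numbers: variance stands for the diagonal entry H^{-1}_{ii} at the MLE.
ci_alpha = wald_interval(1.20, 0.04)
ci_p = wald_interval(0.10, 0.01, lower=0.0, upper=1.0)
```

Clipping is a pragmatic choice for a parameter such as $p$, whose interval may otherwise extend below zero in small samples; it is not a substitute for a transformation-based interval.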
The efficiency of the maximum likelihood method was analyzed through a simulation study on the LF distribution. This was done by computing the mean relative errors (MRE) and the mean square errors (MSE), given by
$$MRE_i=\frac{1}{N}\sum_{j=1}^{N}\frac{\hat\theta_{i,j}}{\theta_i},\qquad MSE_i=\frac{1}{N}\sum_{j=1}^{N}\left(\hat\theta_{i,j}-\theta_i\right)^2,\quad i=1,2,3,$$
where $N$ is the number of estimates obtained through the MLE approach. The 95% coverage probabilities of the asymptotic confidence intervals were also evaluated. Under this approach, good estimators should return MREs close to one and small MSEs. In addition, at a 95% confidence level, the intervals should cover the true values of $\theta$ close to 95% of the time. The simulation study considers sample sizes $n=(25,50,100,200,300)$ and $N=100{,}000$ replications. Two situations are presented, with cure proportions in the population of 0.3 and 0.5. In each case, different proportions of censoring are observed.
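The three performance measures are simple averages over the simulated estimates. The sketch below shows one way to compute them; the helper names and the four hypothetical estimates are assumptions used only to make the definitions concrete.

```python
def mean_relative_error(estimates, true_value):
    """MRE: average of estimate / true value; values near 1 indicate little bias."""
    return sum(e / true_value for e in estimates) / len(estimates)

def mean_square_error(estimates, true_value):
    """MSE: average squared distance between estimate and true value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

def coverage(intervals, true_value):
    """Fraction of confidence intervals (lo, hi) that contain the true value."""
    hits = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    return hits / len(intervals)

# Hypothetical estimates and intervals for a parameter whose true value is 2.0.
estimates = [1.8, 2.1, 2.4, 1.9]
intervals = [(1.5, 2.1), (1.7, 2.5), (2.05, 2.8), (1.4, 2.4)]
mre = mean_relative_error(estimates, 2.0)
mse = mean_square_error(estimates, 2.0)
c95 = coverage(intervals, 2.0)
```

In a full study these functions would be applied, for each parameter, to the $N$ estimates and intervals produced by the replicated fits.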
To maximize the log-likelihood function in equation (11), we used the maxLik package for R, developed by Henningsen and Toomet [11]. The numerical results are well behaved: no numerical problems, such as failure to converge or convergence to one of multiple maxima, were found using the SANN (simulated annealing) method. The programs can be obtained upon request.
The estimates reported in Tables 1-4 for α, λ and p appear to be asymptotically unbiased, since the MREs tend to one as n increases and the MSEs decrease towards zero for large n. Moreover, the coverage probabilities tend to 0.95, which indicates good coverage properties for the parameter estimators. These estimation procedures are applied to real data in the next section.
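One way to generate the censored samples needed for such a study is inverse-transform sampling with the quantile function (7). The sketch below is written in Python rather than R and is not the authors' program. In particular, the uniform censoring distribution and the value of `c_max` are assumptions, since the text does not state how the censoring times were drawn.

```python
import math
import random

def lf_quantile(u, lam, alpha, p):
    """Equation (7), finite only for 0 < u < 1 - p."""
    return lam * math.log((1.0 - p) / u) ** (-1.0 / alpha)

def sample_censored_lf(n, lam, alpha, p, c_max, rng):
    """Draw (t_i, delta_i) with t_i = min(T_i, C_i) and delta_i = 1 if T_i <= C_i.

    Assumption: censoring times are Uniform(0, c_max); the paper does not
    state which censoring distribution was used.
    """
    data = []
    for _ in range(n):
        u = rng.random()
        # With probability p the subject is cured and never fails (T = infinity).
        failure = lf_quantile(u, lam, alpha, p) if 0.0 < u < 1.0 - p else math.inf
        censor = rng.uniform(0.0, c_max)
        data.append((min(failure, censor), 1 if failure <= censor else 0))
    return data

rng = random.Random(2017)
sample = sample_censored_lf(20_000, lam=4.0, alpha=2.0, p=0.3, c_max=30.0, rng=rng)
censored_fraction = sum(1 - delta for _, delta in sample) / len(sample)
```

Because every cured subject ends up censored, the censored fraction can never fall below roughly `p`; changing `c_max` is one way to move it above that floor.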
| n | MRE (α=0.5) | MSE (α=0.5) | C95% (α=0.5) | MRE (λ=2.0) | MSE (λ=2.0) | C95% (λ=2.0) | MRE (p=0.3) | MSE (p=0.3) | C95% (p=0.3) | Mp (0.457) |
|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 1.265 | 0.060 | 0.948 | 1.100 | 3.344 | 0.810 | 1.100 | 0.021 | 0.925 | 0.461 |
| 50 | 1.114 | 0.020 | 0.952 | 1.117 | 2.002 | 0.860 | 1.024 | 0.013 | 0.938 | 0.458 |
| 100 | 1.048 | 0.008 | 0.952 | 1.098 | 1.087 | 0.893 | 0.992 | 0.008 | 0.948 | 0.457 |
| 200 | 1.022 | 0.004 | 0.953 | 1.059 | 0.488 | 0.919 | 0.991 | 0.004 | 0.952 | 0.457 |
| 300 | 1.014 | 0.003 | 0.951 | 1.039 | 0.293 | 0.928 | 0.993 | 0.003 | 0.952 | 0.457 |

Table 1: MREs, MSEs and C95% estimates for N = 100,000 simulated samples, considering n = (25, 50, 100, 200, 300) and 45.7% of censoring.
| n | MRE (α=0.5) | MSE (α=0.5) | C95% (α=0.5) | MRE (λ=2.0) | MSE (λ=2.0) | C95% (λ=2.0) | MRE (p=0.5) | MSE (p=0.5) | C95% (p=0.5) | Mp (0.612) |
|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 1.384 | 0.144 | 0.939 | 1.263 | 8.714 | 0.788 | 1.027 | 0.022 | 0.914 | 0.612 |
| 50 | 1.158 | 0.033 | 0.940 | 1.238 | 5.967 | 0.838 | 1.001 | 0.014 | 0.928 | 0.612 |
| 100 | 1.070 | 0.013 | 0.946 | 1.156 | 2.683 | 0.877 | 0.994 | 0.007 | 0.940 | 0.612 |
| 200 | 1.031 | 0.006 | 0.951 | 1.085 | 0.841 | 0.906 | 0.994 | 0.004 | 0.948 | 0.612 |
| 300 | 1.020 | 0.004 | 0.951 | 1.056 | 0.459 | 0.921 | 0.996 | 0.002 | 0.951 | 0.612 |

Table 2: MREs, MSEs and C95% estimates for N = 100,000 simulated samples, considering n = (25, 50, 100, 200, 300) and 61% of censoring.
| n | MRE (α=2.0) | MSE (α=2.0) | C95% (α=2.0) | MRE (λ=4.0) | MSE (λ=4.0) | C95% (λ=4.0) | MRE (p=0.3) | MSE (p=0.3) | C95% (p=0.3) | Mp (0.35) |
|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 1.103 | 0.316 | 0.951 | 1.022 | 0.347 | 0.921 | 0.997 | 0.010 | 0.927 | 0.349 |
| 50 | 1.047 | 0.116 | 0.951 | 1.011 | 0.155 | 0.938 | 0.998 | 0.005 | 0.937 | 0.348 |
| 100 | 1.023 | 0.050 | 0.950 | 1.005 | 0.074 | 0.945 | 1.000 | 0.002 | 0.945 | 0.349 |
| 200 | 1.011 | 0.023 | 0.950 | 1.003 | 0.036 | 0.947 | 1.000 | 0.001 | 0.947 | 0.349 |
| 300 | 1.008 | 0.015 | 0.951 | 1.002 | 0.024 | 0.947 | 0.999 | 0.001 | 0.947 | 0.348 |

Table 3: MREs, MSEs and C95% estimates for N = 100,000 simulated samples, considering n = (25, 50, 100, 200, 300) and 35% of censoring.
| n | MRE (α=2.0) | MSE (α=2.0) | C95% (α=2.0) | MRE (λ=4.0) | MSE (λ=4.0) | C95% (λ=4.0) | MRE (p=0.3) | MSE (p=0.3) | C95% (p=0.3) | Mp (0.535) |
|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 1.158 | 0.619 | 0.953 | 1.034 | 0.546 | 0.910 | 0.998 | 0.011 | 0.933 | 0.535 |
| 50 | 1.068 | 0.182 | 0.953 | 1.016 | 0.230 | 0.933 | 0.999 | 0.006 | 0.942 | 0.535 |
| 100 | 1.033 | 0.075 | 0.950 | 1.007 | 0.105 | 0.942 | 1.000 | 0.003 | 0.947 | 0.535 |
| 200 | 1.016 | 0.034 | 0.950 | 1.004 | 0.051 | 0.946 | 0.999 | 0.001 | 0.947 | 0.534 |
| 300 | 1.011 | 0.022 | 0.949 | 1.002 | 0.034 | 0.948 | 1.000 | 0.001 | 0.948 | 0.535 |

Table 4: MREs, MSEs and C95% estimates for N = 100,000 simulated samples, considering n = (25, 50, 100, 200, 300) and 53.5% of censoring.
In this section, we consider the data set presented by Kersey et al. [12]. The data are the times (in years) to recurrence of leukemia for a group of 46 patients who received autologous marrow transplants. Table 5 shows the full data set (+ indicates censored observations).
0.0301 | 0.0384 | 0.0630 | 0.0849 | 0.0877 | 0.0959 | 0.1397 | 0.1616
0.1699 | 0.2137 | 0.2137 | 0.2164 | 0.2384 | 0.2712 | 0.2740 | 0.3863
0.4384 | 0.4548 | 0.5918 | 0.6000 | 0.6438 | 0.6849 | 0.7397 | 0.8575
0.9096 | 0.9644 | 1.0082 | 1.2822 | 1.3452 | 1.4000 | 1.5260 | 1.7205+
1.9890+ | 2.2438 | 2.5068+ | 2.6466+ | 3.0384 | 3.1726+ | 3.4411 | 4.4219+
4.4356+ | 4.5863+ | 4.6904+ | 4.7808+ | 4.9863+ | 5.0000+

Table 5: Leukemia-free survival times (in years) for the 46 autologous transplant patients (+ indicates censored observations).
The proposed model is compared with some usual long-term survival models, such as the LT Weibull and LT weighted Lindley (Louzada and Ramos [13]). Different discrimination criteria are considered: the negative of the maximum value of the log-likelihood function, $-l(\hat\theta;t)$, the Akaike information criterion, $AIC=-2\,l(\hat\theta;t)+2k$, and the corrected AIC, $AICc=AIC+2k(k+1)/(n-k-1)$, where $k$ is the number of parameters to be fitted and $n$ is the sample size. The best model is the one with the smallest values of these criteria.
Figure 2 presents the empirical survival function estimated by the Kaplan-Meier estimator together with the fitted LT survival distributions.
Figure 2: Survival function estimated by the Kaplan-Meier estimator (empirical survival function) and fitted by the LT Fréchet, LT Weibull and LT WL distributions.
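The empirical curve in such a comparison is the Kaplan-Meier (product-limit) estimate. A minimal sketch of that estimator, applied to the Table 5 data, is given below. The implementation and its names are illustrative and are not taken from the paper.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct failure time (product-limit estimator)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

# Table 5 data: observed recurrence times and right-censored times (marked + in the table).
failures = [0.0301, 0.0384, 0.0630, 0.0849, 0.0877, 0.0959, 0.1397, 0.1616, 0.1699,
            0.2137, 0.2137, 0.2164, 0.2384, 0.2712, 0.2740, 0.3863, 0.4384, 0.4548,
            0.5918, 0.6000, 0.6438, 0.6849, 0.7397, 0.8575, 0.9096, 0.9644, 1.0082,
            1.2822, 1.3452, 1.4000, 1.5260, 2.2438, 3.0384, 3.4411]
censored = [1.7205, 1.9890, 2.5068, 2.6466, 3.1726, 4.4219, 4.4356, 4.5863,
            4.6904, 4.7808, 4.9863, 5.0000]
times = failures + censored
events = [1] * len(failures) + [0] * len(censored)
curve = kaplan_meier(times, events)
```

The last entry of `curve` gives the height of the final plateau, which is the non-parametric counterpart of the cure fraction $p$ in the parametric model.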
Table 6 presents the results of the different discrimination criteria for the different probability distributions. Comparing these results, we observe that the LT Fréchet distribution provides a better fit than the LT models with Weibull and weighted Lindley baseline distributions.
| Method | LT Fréchet | LT Weibull | LT WL |
|---|---|---|---|
| −log L | 45.33 | 46.15 | 46.56 |
| AIC | 96.66 | 98.30 | 99.12 |
| AICc | 97.23 | 98.87 | 99.69 |

Table 6: Results of the different discrimination criteria for the different probability distributions.
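The AIC and AICc rows of Table 6 can be recomputed from the −log L row using the formulas given earlier. The sketch below assumes n = 46 observations and k = 3 fitted parameters for every candidate model; the value of k is an assumption made here, chosen because it is consistent with the reported figures.

```python
def aic(neg_loglik, k):
    """AIC = -2 * l + 2k, written in terms of the negative log-likelihood."""
    return 2.0 * neg_loglik + 2.0 * k

def aicc(neg_loglik, k, n):
    """Corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    return aic(neg_loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)

n_obs = 46  # number of patients in Table 5
k = 3       # assumed number of parameters for each long-term model
neg_loglik = {"LT Fréchet": 45.33, "LT Weibull": 46.15, "LT WL": 46.56}
criteria = {
    name: (round(aic(value, k), 2), round(aicc(value, k, n_obs), 2))
    for name, value in neg_loglik.items()
}
best = min(criteria, key=lambda name: criteria[name][1])
```

Since all three models are assumed to have the same number of parameters, the ranking by AIC or AICc coincides with the ranking by −log L.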
The MLEs were obtained through the same procedure as described in Section 3. The standard errors (SE) and the 95% confidence intervals for α, λ and p are displayed in Table 7.
| θ | MLE | SE | CI95%(θ) |
|---|---|---|---|
| α | 0.65682 | 0.01975 | (0.38140; 0.93225) |
| λ | 0.31358 | 0.01531 | (0.07106; 0.55609) |
| p | 0.12476 | 0.01597 | (0.00000; 0.37245) |

Table 7: MLE, standard error (SE) and 95% confidence interval for α, λ and p.
Note that Kersey et al. [12] used the non-parametric KM estimate of the cure fraction, which was 0.20, with 95% confidence interval (0.08; 0.32). Using our parametric model, the estimate obtained for p was 0.125, which lies within this non-parametric interval, so our results are consistent with those of Kersey et al. [12]. The lower parametric estimate suggests a possible overestimation of the proportion of long-term survivors by the non-parametric approach. As can be seen, with our proposed methodology the leukemia-free survival times (in years) of the 46 autologous transplant patients can be described by the LF distribution.
In this paper, we have proposed a new long-term survival distribution, called the long-term Fréchet distribution, and studied its mathematical properties. We presented results on the maximum likelihood estimators of the parameters and their asymptotic properties. The efficiency of the estimators was assessed in the simulation study, in which the MLEs of the three unknown parameters gave acceptable results even for small sample sizes. The proposed model was also applied to a real data set of leukemia-free survival times (in years) for 46 autologous transplant patients. Many extensions of the present work can be considered. For instance, parameter estimation may also be studied under an objective Bayesian analysis (Ramos et al., [14,15]) or using different classical methods (Louzada et al., Bakouch et al. [16]). Another approach would be to include covariates under the assumption of the Cox model, i.e., proportional hazards. Such a regression model could also be extended to the Bayesian approach.
The authors are thankful to the Editorial Board and to the reviewers for their valuable comments and suggestions which led to this improved version.
©2017 Ramos, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.