eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Research Article Volume 6 Issue 3

The long term Fréchet distribution: estimation, properties and its application

Pedro Luiz Ramos, Diego Nascimento, Francisco Louzada

Institute of Mathematical Science and Computing, University of São Paulo, Brazil

Correspondence: Pedro Luiz Ramos, Institute of Mathematical Science and Computing, University of São Paulo, Brazil

Received: June 28, 2017 | Published: September 14, 2017

Citation: Ramos PL, Nascimento D, Louzada F. The long term fréchet distribution: estimation, properties and its application. Biom Biostat Int J. 2017;6(3):357-362. DOI: 10.15406/bbij.2017.06.00170


Abstract

In this paper a new long-term survival distribution is proposed. The so-called long term Fréchet distribution allows us to fit data where a part of the population is not susceptible to the event of interest. This model may be used, for example, in clinical studies where a portion of the population can be cured during a treatment. An account of the mathematical properties of the new distribution, such as its moments and survival properties, is given. The maximum likelihood estimators (MLEs) of the parameters are also presented. A numerical simulation is carried out to verify the performance of the MLEs. Finally, an application to the leukemia-free survival times of transplant patients is discussed to illustrate the proposed distribution.

Keywords: Fréchet distribution; Long-term survival distribution; Survival model

Introduction

Extreme value models play an important role in statistics. The generalized extreme value (GEV) distribution [1] and its sub-models are widely used in applications involving extreme events. These sub-models are the well-known Weibull, Fréchet and Gumbel distributions. The Fréchet distribution can be seen as the inverse Weibull distribution, with probability density function (PDF)

$$f(t;\lambda,\alpha)=\frac{\alpha}{\lambda}\left(\frac{t}{\lambda}\right)^{-(\alpha+1)}e^{-\left(\frac{t}{\lambda}\right)^{-\alpha}}. \tag{1}$$

The survival function is given by

$$S(t;\lambda,\alpha)=1-e^{-\left(\frac{t}{\lambda}\right)^{-\alpha}}. \tag{2}$$

Although the GEV distribution is the most used generalization of the Fréchet model, other distributions have been proposed in the literature. De Gusmão [2] proposed a three-parameter generalized inverse Weibull distribution, which includes the Fréchet distribution. Krishna et al. [3] proposed the Marshall-Olkin Fréchet distribution. Barreto-Souza et al. [4] discussed some results for the beta Fréchet distribution. However, in survival studies researchers may need to treat a portion of the population as cured during a given treatment; distributions that accommodate this are called long-term (LT) survival models.

In this study, we propose a new long-term survival distribution based on the mixture model introduced by Berkson and Gage [5]; hereafter we call it the long-term Fréchet distribution, or simply the LF distribution. Some mathematical properties of the LF distribution are provided, such as its moments, survival properties and hazard function. The maximum likelihood estimators of the parameters and their asymptotic properties are also discussed. Similar studies were presented by Roman et al. [6] for the geometric exponential distribution and by Louzada and Ramos [7] for the weighted Lindley distribution. A numerical simulation is performed to examine the performance of the MLEs. Finally, the proposed methodology is illustrated on a real data set of leukemia-free survival times (in years) for 46 autologous transplant patients.

The paper is organized as follows. Section 2 presents the long term Fréchet distribution and its mathematical properties. Section 3 discusses parameter estimation under the maximum likelihood approach. Section 4 presents a simulation study under different parameter values and different levels of censoring. Section 5 illustrates the proposed methodology on a real data set. Lastly, Section 6 summarizes the findings of this study and its potential contribution.

Long Term Fréchet distribution

Long-term survivors are an important feature to incorporate in the modeling process, since a portion of the population may no longer be susceptible to the event of interest (see Maller and Zhou [8] or Perdoná and Louzada [9]). Hence the population can be divided into individuals who are not susceptible to the event of interest, with probability $p$, and individuals who are susceptible (at risk), with probability $(1-p)$. The long-term survival function is expressed as

$$S(t;p,\theta)=p+(1-p)S_0(t;\theta), \tag{3}$$

where $p\in(0,1)$ and $S_0(t;\theta)$ is the survival function of the susceptible group. The resulting (unconditional) survival function is improper, and its limit as $t\to\infty$ is the cure fraction $p$. From the survival function one can easily derive the (improper) PDF, given by

$$f(t;p,\theta)=-\frac{\partial}{\partial t}S(t;p,\theta)=(1-p)f_0(t;\theta), \tag{4}$$

where $f_0(t;\theta)$ is the PDF of the susceptible group.

Figure 1: PDF and survival function shapes of the LF distribution for some parameter values.
Left panel: probability density function of the LF distribution. Right panel: survival function of the LF distribution.

Considering that $f_0(t;\theta)$ follows a Fréchet distribution, the PDF of the long term Fréchet (LF) distribution is given by

$$f(t;\lambda,\alpha,p)=\frac{\alpha(1-p)}{\lambda}\left(\frac{t}{\lambda}\right)^{-(\alpha+1)}e^{-\left(\frac{t}{\lambda}\right)^{-\alpha}}, \tag{5}$$

where $\lambda>0$, $\alpha>0$ and $p\in(0,1)$. The cumulative distribution function is given by

$$F(t;\lambda,\alpha,p)=(1-p)\,e^{-\left(\frac{t}{\lambda}\right)^{-\alpha}}. \tag{6}$$

In this case, the LF distribution has a closed-form quantile function, given by

$$t_u=\lambda\left[\log\left(\frac{1-p}{u}\right)\right]^{-1/\alpha}, \tag{7}$$

where $0<u<1-p$, since the improper CDF (6) never exceeds $1-p$. The $r$-th moment of $T$ about the origin is

$$E(T^r;\lambda,\alpha,p)=(1-p)\,\lambda^{r}\,\Gamma\left(1-\frac{r}{\alpha}\right),\quad \alpha>r, \tag{8}$$

for $r\in\mathbb{N}$, where $\Gamma(x)=\int_0^\infty e^{-y}y^{x-1}\,dy$ is the gamma function. After some algebraic manipulation, the mean and variance of the LF distribution are given, respectively, by

$$E(T;\lambda,\alpha,p)=(1-p)\,\lambda\,\Gamma\left(1-\frac{1}{\alpha}\right),\quad \alpha>1$$

and

$$V(T;\lambda,\alpha,p)=(1-p)\,\lambda^{2}\left(\Gamma\left(1-\frac{2}{\alpha}\right)-(1-p)\,\Gamma\left(1-\frac{1}{\alpha}\right)^{2}\right),\quad \alpha>2.$$
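The moment formula (8) can be checked with a single change of variable. The short derivation below is a sketch added for illustration: it substitutes $y=(t/\lambda)^{-\alpha}$ in the integral of $t^r$ against the improper density (5), so the result carries the susceptible mass $(1-p)$ as a factor.

```latex
\begin{aligned}
E(T^r;\lambda,\alpha,p)
  &= \int_0^\infty t^{r}\,\frac{\alpha(1-p)}{\lambda}
     \left(\frac{t}{\lambda}\right)^{-(\alpha+1)}
     e^{-\left(\frac{t}{\lambda}\right)^{-\alpha}}\,dt \\
  % y = (t/\lambda)^{-\alpha},\; t = \lambda y^{-1/\alpha},\;
  % dy = -\frac{\alpha}{\lambda}\left(\frac{t}{\lambda}\right)^{-(\alpha+1)}dt
  &= (1-p)\int_0^\infty \left(\lambda\,y^{-1/\alpha}\right)^{r} e^{-y}\,dy \\
  &= (1-p)\,\lambda^{r}\int_0^\infty y^{\left(1-\frac{r}{\alpha}\right)-1}e^{-y}\,dy
   = (1-p)\,\lambda^{r}\,\Gamma\left(1-\frac{r}{\alpha}\right).
\end{aligned}
```

The gamma integral converges only when $1-r/\alpha>0$, which is the condition $\alpha>r$ stated with (8). Setting $r=1$ and $r=2$ gives the mean and the second moment used in the variance.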

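The survival and hazard functions of the $LF(\lambda,\alpha,p)$ distribution are given by

$$S(t;\lambda,\alpha,p)=p+(1-p)\left(1-e^{-\left(\frac{t}{\lambda}\right)^{-\alpha}}\right) \tag{9}$$

and

$$h(t;\lambda,\alpha,p)=\frac{f(t;\lambda,\alpha,p)}{S(t;\lambda,\alpha,p)}=\frac{\frac{\alpha(1-p)}{\lambda}\left(\frac{t}{\lambda}\right)^{-(\alpha+1)}e^{-\left(\frac{t}{\lambda}\right)^{-\alpha}}}{p+(1-p)\left(1-\exp\left(-\left(\frac{t}{\lambda}\right)^{-\alpha}\right)\right)}. \tag{10}$$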
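As an illustration, equations (5), (6), (9) and (10) can be coded directly. The following Python sketch uses only the standard library; the function names are illustrative and not from the source, and the hazard is computed as the ratio of the density (5) to the survival function (9).

```python
import math


def lf_pdf(t, lam, alpha, p):
    """Improper density of the LF distribution, equation (5)."""
    z = (t / lam) ** (-alpha)
    return (1.0 - p) * (alpha / lam) * (t / lam) ** (-(alpha + 1.0)) * math.exp(-z)


def lf_cdf(t, lam, alpha, p):
    """Improper CDF, equation (6); it tends to 1 - p as t grows."""
    z = (t / lam) ** (-alpha)
    return (1.0 - p) * math.exp(-z)


def lf_survival(t, lam, alpha, p):
    """Survival function, equation (9); it tends to the cure fraction p."""
    z = (t / lam) ** (-alpha)
    # -expm1(-z) equals 1 - exp(-z) and stays accurate when z is small
    return p + (1.0 - p) * (-math.expm1(-z))


def lf_hazard(t, lam, alpha, p):
    """Hazard function, equation (10), computed as density over survival."""
    return lf_pdf(t, lam, alpha, p) / lf_survival(t, lam, alpha, p)


if __name__ == "__main__":
    # Parameter values borrowed from the simulation scenarios
    lam, alpha, p = 4.0, 2.0, 0.3
    for t in (1.0, 5.0, 50.0, 5000.0):
        print(t, lf_survival(t, lam, alpha, p), lf_hazard(t, lam, alpha, p))
```

For large `t` the printed survival values level off near `p`, which is the plateau that makes the distribution improper.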

Parameter Estimation

The failure time of the $i$-th individual may not be observed, since it is subject to right censoring. Furthermore, the random censoring times $C_i$ are independent of the failure times $T_i$ and their distribution does not depend on the parameters. For a sample of size $n$, the data set is described by $D=\{(t_i,\delta_i)\}$, $i=1,\dots,n$, where $t_i=\min(T_i,C_i)$ and $\delta_i=I(T_i\le C_i)$. This general random censoring scheme has type I and type II censoring as special cases. The likelihood function is given by

$$L(\theta;D)=\prod_{i=1}^{n}f(t_i;\theta)^{\delta_i}\,S(t_i;\theta)^{1-\delta_i}.$$

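Let $T_1,\dots,T_n$ be a random sample from the LF distribution. The likelihood function for data with random censoring is given by

$$L(\lambda,\alpha,p;D)=\alpha^{d}(1-p)^{d}\lambda^{d\alpha}\prod_{i=1}^{n}t_i^{-\delta_i(\alpha+1)}\exp\left(-\sum_{i=1}^{n}\delta_i\left(\frac{t_i}{\lambda}\right)^{-\alpha}\right)\times\prod_{i=1}^{n}\left(p+(1-p)\left(1-e^{-\left(\frac{t_i}{\lambda}\right)^{-\alpha}}\right)\right)^{1-\delta_i},$$

where $d=\sum_{i=1}^{n}\delta_i$. The log-likelihood function is given by

$$l(\lambda,\alpha,p;D)=d\log(\alpha)+d\log(1-p)+d\alpha\log\lambda-(\alpha+1)\sum_{i=1}^{n}\delta_i\log t_i-\sum_{i=1}^{n}\delta_i\left(\frac{\lambda}{t_i}\right)^{\alpha}+\sum_{i=1}^{n}(1-\delta_i)\log\left(p+(1-p)\left(1-e^{-\left(\frac{t_i}{\lambda}\right)^{-\alpha}}\right)\right). \tag{11}$$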
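To make the structure of (11) concrete, the log-likelihood can be written as a short function. This is a minimal Python sketch using only the standard library; the name `lf_loglik` and the argument layout are assumptions for illustration (the source reports using the maxLik package in R, whose code is not shown).

```python
import math


def lf_loglik(lam, alpha, p, times, events):
    """Log-likelihood (11) of the LF distribution under right censoring.

    times  : observed times t_i = min(T_i, C_i), all strictly positive
    events : indicators delta_i (1 = failure observed, 0 = censored)
    Returns -inf outside the parameter space so an optimizer can reject the point.
    """
    if lam <= 0.0 or alpha <= 0.0 or not (0.0 < p < 1.0):
        return float("-inf")
    d = sum(events)  # number of observed failures
    total = d * math.log(alpha) + d * math.log(1.0 - p) + d * alpha * math.log(lam)
    for t, delta in zip(times, events):
        z = (lam / t) ** alpha  # same quantity as (t / lam) ** (-alpha)
        if delta:
            # contribution of an observed failure: log density terms
            total += -(alpha + 1.0) * math.log(t) - z
        else:
            # contribution of a censored time: log of the survival function (9)
            total += math.log(p + (1.0 - p) * (-math.expm1(-z)))
    return total


if __name__ == "__main__":
    # Toy data, invented for illustration only
    t_obs = [0.5, 1.2, 3.0, 4.5]
    delta = [1, 1, 0, 0]
    print(lf_loglik(0.8, 1.5, 0.2, t_obs, delta))
```

Any general-purpose optimizer can then maximize this function over $(\lambda,\alpha,p)$.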

Maximum likelihood estimators (MLEs) are widely used for statistical inference due to their many desirable properties, which include consistency, asymptotic efficiency and invariance. The MLEs are obtained by maximizing the log-likelihood function (11). Before we derive the MLEs of the LF distribution, let us define the following function

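$$\eta_j(\lambda,\alpha,p;D)=\sum_{i=1}^{n}(1-\delta_i)\frac{\partial}{\partial\theta_j}\log S(t_i;\theta),\quad j=1,2,3,$$

with $\theta=(\theta_1,\theta_2,\theta_3)=(\lambda,\alpha,p)$. Then, the likelihood equations are given by

$$\frac{d\alpha}{\lambda}-\alpha\lambda^{\alpha-1}\sum_{i=1}^{n}\delta_i t_i^{-\alpha}+\eta_1(\lambda,\alpha,p;D)=0,$$

$$\frac{d}{\alpha}+d\log\lambda-\sum_{i=1}^{n}\delta_i\left(\frac{\lambda}{t_i}\right)^{\alpha}\log\left(\frac{\lambda}{t_i}\right)-\sum_{i=1}^{n}\delta_i\log t_i+\eta_2(\lambda,\alpha,p;D)=0$$

and

$$\frac{d}{p-1}+\eta_3(\lambda,\alpha,p;D)=0.$$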
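The terms $\eta_j$ can be written out by differentiating the logarithm of the survival function (9). The expressions below are a sketch worked out from (9) for illustration; the shorthand $E_i$ and $S_i$ is introduced here and does not appear in the source.

```latex
% Shorthand (introduced here): E_i = \exp\{-(\lambda/t_i)^{\alpha}\},
% so that S_i = S(t_i;\lambda,\alpha,p) = p + (1-p)(1-E_i) = 1-(1-p)E_i.
\begin{aligned}
\eta_1(\lambda,\alpha,p;D) &= \sum_{i=1}^{n}(1-\delta_i)\,
   \frac{(1-p)\,\alpha\,\lambda^{-1}\left(\lambda/t_i\right)^{\alpha}E_i}{S_i},\\[4pt]
\eta_2(\lambda,\alpha,p;D) &= \sum_{i=1}^{n}(1-\delta_i)\,
   \frac{(1-p)\left(\lambda/t_i\right)^{\alpha}\log\left(\lambda/t_i\right)E_i}{S_i},\\[4pt]
\eta_3(\lambda,\alpha,p;D) &= \sum_{i=1}^{n}(1-\delta_i)\,\frac{E_i}{S_i}.
\end{aligned}
```

Only censored observations ($\delta_i=0$) contribute to these sums, and none of the three equations has a closed-form solution, which is why the maximization is carried out numerically.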

The maximization of the log-likelihood function can be performed directly by using existing statistical packages. Further information about the numerical procedures will be discussed in the next section.

According to Migon et al. [10], under mild conditions the obtained estimators are consistent and efficient, with an asymptotically normal joint distribution given by

$$(\hat\lambda,\hat\alpha,\hat p)\sim N_3\left((\lambda,\alpha,p),H^{-1}(\lambda,\alpha,p)\right),$$

where $H(\lambda,\alpha,p)$ is the $3\times 3$ observed Fisher information matrix, whose elements $H_{ij}(\lambda,\alpha,p)$ are given by

$$H_{ij}(\theta)=-\frac{\partial^{2}}{\partial\theta_i\,\partial\theta_j}l(\theta;D),\quad i,j=1,2,3.$$

Note that the observed Fisher information matrix is used because the expected Fisher information matrix has no closed-form expression. For large samples, approximate confidence intervals with confidence coefficient $100(1-\gamma)\%$ can be constructed for the individual parameters $\theta_i$, $i=1,2,3$, from the marginal distributions

$$\hat\theta_i\sim N\left(\theta_i,H^{-1}_{ii}(\theta)\right),\quad i=1,2,3.$$

Simulation Study

The efficiency of the maximum likelihood method was analyzed through a simulation study on the LF distribution. This was done by computing the mean relative errors (MRE) and the mean square errors (MSE), given by

$$MRE_i=\frac{1}{N}\sum_{j=1}^{N}\frac{\hat\theta_{i,j}}{\theta_i},\qquad MSE_i=\frac{1}{N}\sum_{j=1}^{N}\left(\hat\theta_{i,j}-\theta_i\right)^{2},\quad \text{for } i=1,2,3,$$

where $N$ is the number of estimates obtained through the MLE approach. The 95% coverage probabilities of the asymptotic confidence intervals were also evaluated. Under this approach, good estimators should return MREs close to one with small MSEs. Additionally, at a 95% confidence level, the intervals should cover the true values of $\theta$ close to 95% of the time. The simulation study considers sample sizes n = (25, 50, 100, 200, 300) and N = 100,000, with cure proportions in the population of 0.3 and 0.5. In these cases, different levels of censoring are observed.
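The three summary measures above are simple averages over the N replications. A minimal Python sketch follows; the function names and the toy numbers in the usage part are illustrative assumptions, not values from the study.

```python
def mre_mse(estimates, true_value):
    """Mean relative error and mean square error of a list of estimates."""
    n_rep = len(estimates)
    mre = sum(est / true_value for est in estimates) / n_rep
    mse = sum((est - true_value) ** 2 for est in estimates) / n_rep
    return mre, mse


def coverage(intervals, true_value):
    """Fraction of (lower, upper) intervals that contain the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


if __name__ == "__main__":
    # Toy replications for a parameter whose true value is 2.0
    print(mre_mse([1.0, 3.0], 2.0))
    print(coverage([(0.0, 1.0), (2.0, 3.0), (0.5, 2.5)], 0.75))
```

An MRE above one indicates overestimation on average, and a coverage well below 0.95 indicates that the asymptotic intervals are too narrow at that sample size.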

To maximize the log-likelihood function (11), the maxLik package for R, developed by Henningsen and Toomet [11], was used. The numerical results were well behaved: no numerical problems, such as failure to converge or convergence to one of multiple maxima, were found using the SANN (simulated annealing) method. The programs can be obtained upon request.
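The source does not describe how the samples were generated or censored. One possible way, sketched below in Python rather than R, is inverse-transform sampling from the quantile function (7), treating uniform draws at or above the plateau $1-p$ as cured individuals who never fail. The uniform censoring times and the names `lf_quantile`, `simulate_lf` and `c_max` are assumptions made for this illustration only.

```python
import math
import random


def lf_quantile(u, lam, alpha, p):
    """Quantile function (7); only defined for 0 < u < 1 - p."""
    return lam * math.log((1.0 - p) / u) ** (-1.0 / alpha)


def simulate_lf(n, lam, alpha, p, c_max, rng):
    """Draw n right-censored observations (t_i, delta_i) from the LF model.

    Censoring times are Uniform(0, c_max); this mechanism is an assumption.
    """
    times, events = [], []
    for _ in range(n):
        u = rng.random()
        if 0.0 < u < 1.0 - p:
            t_fail = lf_quantile(u, lam, alpha, p)  # susceptible individual
        else:
            t_fail = math.inf  # cured individual: the event never occurs
        c = rng.uniform(0.0, c_max)
        times.append(min(t_fail, c))
        events.append(1 if t_fail <= c else 0)
    return times, events


if __name__ == "__main__":
    rng = random.Random(2017)
    t_obs, delta = simulate_lf(200, 4.0, 2.0, 0.3, 20.0, rng)
    print("observed censoring proportion:", 1.0 - sum(delta) / len(delta))
```

Every cured individual ends up censored, so the observed censoring proportion can never fall far below `p`; the choice of `c_max` controls how much additional censoring occurs among susceptible individuals.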

The results in Tables 1-4 indicate that the estimators of α, λ and p are asymptotically unbiased: the MREs tend to one and the MSEs decrease to zero as n increases. The coverage probabilities tend to 0.95, indicating good coverage properties of the asymptotic confidence intervals. These estimation procedures are relevant in practical applications, as shown in the next section.

| n | α=0.5 MRE | α=0.5 MSE | α=0.5 C95% | λ=2.0 MRE | λ=2.0 MSE | λ=2.0 C95% | p=0.3 MRE | p=0.3 MSE | p=0.3 C95% | Mp (0.457) |
|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 1.265 | 0.060 | 0.948 | 1.100 | 3.344 | 0.810 | 1.100 | 0.021 | 0.925 | 0.461 |
| 50 | 1.114 | 0.020 | 0.952 | 1.117 | 2.002 | 0.860 | 1.024 | 0.013 | 0.938 | 0.458 |
| 100 | 1.048 | 0.008 | 0.952 | 1.098 | 1.087 | 0.893 | 0.992 | 0.008 | 0.948 | 0.457 |
| 200 | 1.022 | 0.004 | 0.953 | 1.059 | 0.488 | 0.919 | 0.991 | 0.004 | 0.952 | 0.457 |
| 300 | 1.014 | 0.003 | 0.951 | 1.039 | 0.293 | 0.928 | 0.993 | 0.003 | 0.952 | 0.457 |

Table 1: MREs, MSEs and C95% estimates for N = 100,000 samples, considering n = (25, 50, 100, 200, 300) and 45.7% of censoring.

 

| n | α=0.5 MRE | α=0.5 MSE | α=0.5 C95% | λ=2.0 MRE | λ=2.0 MSE | λ=2.0 C95% | p=0.5 MRE | p=0.5 MSE | p=0.5 C95% | Mp (0.612) |
|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 1.384 | 0.144 | 0.939 | 1.263 | 8.714 | 0.788 | 1.027 | 0.022 | 0.914 | 0.612 |
| 50 | 1.158 | 0.033 | 0.940 | 1.238 | 5.967 | 0.838 | 1.001 | 0.014 | 0.928 | 0.612 |
| 100 | 1.070 | 0.013 | 0.946 | 1.156 | 2.683 | 0.877 | 0.994 | 0.007 | 0.940 | 0.612 |
| 200 | 1.031 | 0.006 | 0.951 | 1.085 | 0.841 | 0.906 | 0.994 | 0.004 | 0.948 | 0.612 |
| 300 | 1.020 | 0.004 | 0.951 | 1.056 | 0.459 | 0.921 | 0.996 | 0.002 | 0.951 | 0.612 |

Table 2: MREs, MSEs and C95% estimates for N = 100,000 samples, considering n = (25, 50, 100, 200, 300) and 61% of censoring.

 

| n | α=2.0 MRE | α=2.0 MSE | α=2.0 C95% | λ=4.0 MRE | λ=4.0 MSE | λ=4.0 C95% | p=0.3 MRE | p=0.3 MSE | p=0.3 C95% | Mp (0.35) |
|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 1.103 | 0.316 | 0.951 | 1.022 | 0.347 | 0.921 | 0.997 | 0.010 | 0.927 | 0.349 |
| 50 | 1.047 | 0.116 | 0.951 | 1.011 | 0.155 | 0.938 | 0.998 | 0.005 | 0.937 | 0.348 |
| 100 | 1.023 | 0.050 | 0.950 | 1.005 | 0.074 | 0.945 | 1.000 | 0.002 | 0.945 | 0.349 |
| 200 | 1.011 | 0.023 | 0.950 | 1.003 | 0.036 | 0.947 | 1.000 | 0.001 | 0.947 | 0.349 |
| 300 | 1.008 | 0.015 | 0.951 | 1.002 | 0.024 | 0.947 | 0.999 | 0.001 | 0.947 | 0.348 |

Table 3: MREs, MSEs and C95% estimates for N = 100,000 samples, considering n = (25, 50, 100, 200, 300) and 35% of censoring.

 

| n | α=2.0 MRE | α=2.0 MSE | α=2.0 C95% | λ=4.0 MRE | λ=4.0 MSE | λ=4.0 C95% | p=0.3 MRE | p=0.3 MSE | p=0.3 C95% | Mp (0.535) |
|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 1.158 | 0.619 | 0.953 | 1.034 | 0.546 | 0.910 | 0.998 | 0.011 | 0.933 | 0.535 |
| 50 | 1.068 | 0.182 | 0.953 | 1.016 | 0.230 | 0.933 | 0.999 | 0.006 | 0.942 | 0.535 |
| 100 | 1.033 | 0.075 | 0.950 | 1.007 | 0.105 | 0.942 | 1.000 | 0.003 | 0.947 | 0.535 |
| 200 | 1.016 | 0.034 | 0.950 | 1.004 | 0.051 | 0.946 | 0.999 | 0.001 | 0.947 | 0.534 |
| 300 | 1.011 | 0.022 | 0.949 | 1.002 | 0.034 | 0.948 | 1.000 | 0.001 | 0.948 | 0.535 |

Table 4: MREs, MSEs and C95% estimates for N = 100,000 samples, considering n = (25, 50, 100, 200, 300) and 53.5% of censoring.

Application

In this section, we consider the data set presented by Kersey et al. [12]. The data are the times (in years) to recurrence of leukemia for a group of 46 patients who received autologous marrow. Table 5 shows the full data set (+ indicates censored observations).

0.0301  0.0384  0.0630  0.0849  0.0877  0.0959  0.1397  0.1616
0.1699  0.2137  0.2137  0.2164  0.2384  0.2712  0.2740  0.3863
0.4384  0.4548  0.5918  0.6000  0.6438  0.6849  0.7397  0.8575
0.9096  0.9644  1.0082  1.2822  1.3452  1.4000  1.5260  1.7205+
1.9890+ 2.2438  2.5068+ 2.6466+ 3.0384  3.1726+ 3.4411  4.4219+
4.4356+ 4.5863+ 4.6904+ 4.7808+ 4.9863+ 5.0000+

Table 5: Leukemia free-survival times (in years) for the 46 autologous transplant patients (where + indicates censored observations).

The proposed model is compared with some usual long-term survival models, such as the LT Weibull and LT weighted Lindley (Louzada and Ramos, [13]). Different discrimination criteria are considered: the negative of the maximized log-likelihood, $-l(\hat\theta;t)$, the Akaike information criterion ($AIC=-2l(\hat\theta;t)+2k$) and the corrected AIC ($AICc=AIC+2k(k+1)/(n-k-1)$), where $k$ is the number of parameters to be fitted and $n$ is the sample size. The best model is the one with the minimum values of the criteria.
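The two information criteria are direct arithmetic on the negative maximized log-likelihood. As a small illustration, the Python sketch below recomputes the AIC and AICc values reported in Table 6, assuming k = 3 parameters for each model and n = 46 observations; the function names are illustrative.

```python
def aic(neg_loglik, k):
    """Akaike information criterion from the negative maximized log-likelihood."""
    return 2.0 * neg_loglik + 2.0 * k


def aicc(neg_loglik, k, n):
    """Corrected AIC, with the small-sample term 2k(k+1)/(n-k-1)."""
    return aic(neg_loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)


if __name__ == "__main__":
    # Negative maximized log-likelihoods as reported in Table 6
    reported = {"LT Fréchet": 45.33, "LT Weibull": 46.15, "LT WL": 46.56}
    for model, nll in reported.items():
        print(model, round(aic(nll, 3), 2), round(aicc(nll, 3, 46), 2))
```

With these inputs the recomputed values agree with the reported AIC and AICc to two decimals, which is consistent with a sample size of 46.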

Figure 2 presents the empirical survival function adjusted by the Kaplan-Meier estimator and different LT survival distributions.

Figure 2: Survival function adjusted by the empirical survival function (Kaplan-Meier estimator), LT Fréchet, LT Weibull and LT WL distribution.

Table 6 presents the results of the different discrimination criteria for the different probability distributions. Comparing the results, we observe that the LT Fréchet distribution has a better fit than the LT models under the Weibull and weighted Lindley baseline distributions.

| Method | LT Fréchet | LT Weibull | LT WL |
|---|---|---|---|
| −log L | 45.33 | 46.15 | 46.56 |
| AIC | 96.66 | 98.30 | 99.12 |
| AICc | 97.23 | 98.87 | 99.69 |

Table 6: Results of the different discrimination criteria for the different probability distributions.

The MLEs were obtained through the same procedure as described in Section 3. The standard errors (SE) and the 95% confidence intervals for α, λ and p are displayed in Table 7.

| θ | MLE | SE | CI95%(θ) |
|---|---|---|---|
| α | 0.65682 | 0.01975 | (0.38140; 0.93225) |
| λ | 0.31358 | 0.01531 | (0.07106; 0.55609) |
| p | 0.12476 | 0.01597 | (0.00000; 0.37245) |

Table 7: MLE, standard error (SE) and 95% confidence interval for α, λ and p.
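The intervals in Table 7 can be checked against the normal approximation described in the Parameter Estimation section. The Python sketch below is illustrative; `wald_ci` and its arguments are hypothetical names, and truncating the interval to the parameter space is an assumption suggested by the lower limit of 0.00000 reported for p. One observation from the arithmetic, which the source does not comment on: the reported intervals are reproduced when the values in the SE column are used as variances (diagonal entries of $H^{-1}$) rather than as standard errors.

```python
import math
from statistics import NormalDist


def wald_ci(estimate, variance, level=0.95, lower=-math.inf, upper=math.inf):
    """Normal-approximation interval: estimate +/- z * sqrt(variance).

    lower and upper truncate the interval to the parameter space.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(variance)
    return max(lower, estimate - half_width), min(upper, estimate + half_width)


if __name__ == "__main__":
    # MLEs and the values listed under "SE" in Table 7, used here as variances
    print("alpha :", wald_ci(0.65682, 0.01975, lower=0.0))
    print("lambda:", wald_ci(0.31358, 0.01531, lower=0.0))
    print("p     :", wald_ci(0.12476, 0.01597, lower=0.0, upper=1.0))
```

If the listed values were standard errors instead, the variance argument would be their squares and the resulting intervals would be far narrower than those in Table 7.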

Note that Kersey et al. [12] used the non-parametric KM estimate of the cure fraction, which was 0.20 with 95% confidence interval (0.08; 0.32). Our results are therefore consistent with those of Kersey et al. [12], since our estimate is contained in the non-parametric interval. Under our parametric model the estimate obtained for p was 0.125, which suggests that the non-parametric estimate overestimates the proportion of long-term survivors. As can be seen, with the proposed methodology the leukemia-free survival times (in years) of the 46 autologous transplant patients can be described by the LF distribution.

Discussion

In this paper, we have proposed a new long-term survival distribution, called the long term Fréchet distribution, and studied its mathematical properties. Results on the maximum likelihood estimators of the parameters and their asymptotic properties were presented. The efficiency of the estimators was assessed in the simulation study, where the MLEs of the three unknown parameters gave acceptable results even for small sample sizes. The distribution was also applied to a real data set of leukemia-free survival times (in years) for 46 autologous transplant patients. Many extensions of the present work can be considered; for instance, parameter estimation may also be studied under an objective Bayesian analysis (Ramos et al., [14,15]) or using different classical methods (Louzada et al., Bakouch et al. [16]). Another approach would be to include covariates under the assumption of a Cox model, i.e., proportional hazards. Finally, such a regression model can be extended to the Bayesian approach as well.

Acknowledgements

The authors are thankful to the Editorial Board and to the reviewers for their valuable comments and suggestions which led to this improved version.

References

  1. Jenkinson AF (1955) The frequency distribution of the annual maximum (or minimum) values of meteorological elements. Quarterly Journal of the Royal Meteorological Society 81(348): 158-171.
  2. http://www.londrina.pr.gov.br/index.php?option=com_content&view=article&id=15069:projetos-1o-bimestre-e-m-bartolomeu-de-gusmao-2012&catid=144:canal-educativo&Itemid=1348
  3. Krishna E, Jose KK, Alice T, Ristić MM (2013) The Marshall Olkin Fréchet distribution. Communications in Statistics Theory and Methods 42(22): 4091-4107.
  4. Barreto Souza W, Cordeiro GM, Simas AB (2011) Some results for beta Fréchet distribution. Communications in Statistics Theory and Methods 40(5): 798-811.
  5. Berkson J, Gage RP (1952) Survival curve for cancer patients following treatment. Journal of the American Statistical Association 47(259): 501-515.
  6. Roman M, Louzada F, Cancho VG, Leite JG (2012) A new long term survival distribution for cancer data. Journal of Data Science 10(2): 241-258.
  7. Louzada F, Ramos PL (2017) A New Long Term Survival Distribution. Biostat Biometrics Open Acc J 1(4): 1-6.
  8. Maller RA, Zhou S (1995) Testing for the presence of immune or cured individuals in censored survival data. Biometrics 51(4): 1197-1205.
  9. Perdoná GC, Louzada Neto F (2011) A general hazard model for lifetime data in the presence of cure rate. Journal of Applied Statistics 38(7): 1395-1405.
  10. Migon HS, Gamerman D, Louzada F (2014) Statistical inference: an integrated approach. (2nd edn), CRC press, Brazil.
  11. Henningsen A, Toomet O (2011) maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3): 443-458.
  12. Kersey JH, Weisdorf D, Nesbit ME, LeBien TW, Woods WG, et al. (1987) Comparison of autologous and allogeneic bone marrow transplantation for treatment of high-risk refractory acute lymphoblastic leukemia. N Engl J Med 317(8): 461-467.
  13. Louzada F, Ramos PL, Perdoná GC (2016) Different estimation procedures for the parameters of the extended exponential geometric distribution for medical data. Computational and mathematical methods in medicine 2016(2016): 1-12.
  14. Ramos PL, Moala FA, Achcar JA (2014) Objective priors for estimation of extended exponential geometric distribution. Journal of Modern Applied Statistical Methods 13(1): 226-243.
  15. Ramos PL, Achcar JA, Moala FA, Ramos E, Louzada F (2017) Bayesian analysis of the generalized gamma distribution using non informative priors. A Journal of Theoretical and Applied Statistics 51(4): 824-843.
  16. Bakouch HS, Dey S, Ramos PL, Louzada F (2017) Binomial-exponential 2 Distribution: Different Estimation Methods and Weather Applications. Trends in Applied and Computational Mathematics 18(2): 233.
©2017 Ramos, et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.