Research Article Volume 13 Issue 1
1 Department of Statistics, Assam University, Silchar, Assam, India
2 Department of Statistics, Jaypee Institute of Information Technology, Noida, India
Correspondence: Rama Shanker, Department of Statistics, Assam University, Silchar, India
Received: January 25, 2024 | Published: March 5, 2024
Citation: Shanker R, Das R, Shukla KK. An extended Suja distribution with statistical properties and applications. Biom Biostat Int J. 2024;13(1):16-21. DOI: 10.15406/bbij.2024.13.00409
An extended Suja distribution, of which the one-parameter Suja distribution is a particular case, has been proposed. Important statistical properties of the proposed distribution, including moments, skewness, kurtosis, index of dispersion, hazard rate function, mean residual life function, stochastic ordering, mean deviations, Renyi entropy, and stress-strength reliability, have been derived and studied. The method of moments and the method of maximum likelihood for estimating the parameters have been discussed. A simulation study has been presented to assess the performance of the maximum likelihood estimates. Applications and goodness of fit of the proposed distribution with two real datasets have been presented.
Keywords: Suja distribution, statistical properties, parameter estimation, goodness of fit
The search for an appropriate statistical distribution for modeling lifetime data is challenging because lifetime data are stochastic in nature. Statistical distributions are needed for modeling lifetime data in engineering, medical science, demography, social sciences, physical sciences, finance, insurance, and literature, among other fields, and in recent decades several researchers in statistics and mathematics have introduced new lifetime distributions. In the search for a new lifetime distribution useful for modeling lifetime data, Shanker¹ proposed a one-parameter distribution, named the Suja distribution, defined by its probability density function (pdf) and cumulative distribution function (cdf)
$$f(x;\theta)=\frac{\theta^{5}}{\theta^{4}+24}\left(1+x^{4}\right)e^{-\theta x};\quad x>0,\ \theta>0 \tag{1.1}$$
$$F(x;\theta)=1-\left[1+\frac{\theta^{4}x^{4}+4\theta^{3}x^{3}+12\theta^{2}x^{2}+24\theta x}{\theta^{4}+24}\right]e^{-\theta x};\quad x>0,\ \theta>0 \tag{1.2}$$
Length-biased Suja distribution, power length-biased Suja distribution and weighted Suja distribution have been proposed and studied by Al-Omari and Alsmairan,² Al-Omari et al.³ and Alsmairan et al.⁴ respectively. Todoka et al.⁵ studied the cdf of various modifications of the Suja distribution and discussed their applications in the analysis of computer-virus propagation and debugging theory.
The main purpose of proposing an extended Suja distribution is to see the impact of the additional parameter relative to the one-parameter Suja distribution and other two-parameter distributions. Various descriptive measures, reliability properties, and the estimation of parameters using both the method of moments and the method of maximum likelihood have been discussed. The applications and the goodness of fit of the distribution with two real lifetime datasets have been presented.
Taking the convex combination of an exponential (θ) distribution and a gamma (5, θ) distribution with mixing proportion $p=\dfrac{\alpha\theta^{4}}{\alpha\theta^{4}+24}$, the pdf of the extended Suja distribution can be expressed as
$$f(x;\theta,\alpha)=\frac{\theta^{5}}{\alpha\theta^{4}+24}\left(\alpha+x^{4}\right)e^{-\theta x};\quad x>0,\ \theta>0,\ \alpha>0 \tag{2.1}$$
We would call this a two-parameter Suja distribution (TPSD). The corresponding cdf and survival function of TPSD are thus obtained as
$$F(x;\theta,\alpha)=1-\left[1+\frac{\theta^{4}x^{4}+4\theta^{3}x^{3}+12\theta^{2}x^{2}+24\theta x}{\alpha\theta^{4}+24}\right]e^{-\theta x};\quad x>0,\ \theta>0,\ \alpha>0 \tag{2.2}$$
$$S(x;\theta,\alpha)=\left[\frac{\theta^{4}x^{4}+4\theta^{3}x^{3}+12\theta^{2}x^{2}+24\theta x+(\alpha\theta^{4}+24)}{\alpha\theta^{4}+24}\right]e^{-\theta x};\quad x>0,\ \theta>0,\ \alpha>0 .$$
At α=1, TPSD reduces to the Suja distribution. Also, as α→∞, TPSD reduces to the exponential distribution with parameter θ. The pdf and the cdf of TPSD for varying values of the parameters are shown in Figures 1 & 2 respectively.
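The density, distribution function and survival function above translate directly into code. The following is a minimal Python sketch, not part of the original paper; the function names (`tpsd_pdf`, `tpsd_sf`, `tpsd_cdf`, `mixture_pdf`) are illustrative assumptions. It also writes the density in its mixture form so that the two representations can be compared.

```python
import math

def tpsd_pdf(x, theta, alpha):
    """TPSD density, equation (2.1)."""
    norm = alpha * theta**4 + 24
    return theta**5 / norm * (alpha + x**4) * math.exp(-theta * x)

def tpsd_sf(x, theta, alpha):
    """TPSD survival function S(x) = 1 - F(x)."""
    t = theta * x
    norm = alpha * theta**4 + 24
    poly = t**4 + 4 * t**3 + 12 * t**2 + 24 * t + norm
    return poly / norm * math.exp(-t)

def tpsd_cdf(x, theta, alpha):
    """TPSD distribution function, equation (2.2)."""
    return 1.0 - tpsd_sf(x, theta, alpha)

def mixture_pdf(x, theta, alpha):
    """Same density written as p*Exp(theta) + (1-p)*Gamma(5, theta)."""
    p = alpha * theta**4 / (alpha * theta**4 + 24)
    expo = theta * math.exp(-theta * x)
    gamma5 = theta**5 * x**4 * math.exp(-theta * x) / 24  # Gamma(5) = 4! = 24
    return p * expo + (1 - p) * gamma5
```

Setting `alpha=1` in `tpsd_pdf` gives the Suja density (1.1), and a very large `alpha` gives values close to the exponential density θe^(−θx).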
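The rth moment about origin (raw moment) $\mu_r'$ of TPSD can be obtained as
$$\mu_r'=\frac{r!\left\{\alpha\theta^{4}+(r+1)(r+2)(r+3)(r+4)\right\}}{\theta^{r}\left(\alpha\theta^{4}+24\right)};\quad r=1,2,3,\ldots$$
Thus the first four raw moments of TPSD can be expressed as
$$\mu_1'=\frac{\alpha\theta^{4}+120}{\theta(\alpha\theta^{4}+24)},\quad \mu_2'=\frac{2(\alpha\theta^{4}+360)}{\theta^{2}(\alpha\theta^{4}+24)},\quad \mu_3'=\frac{6(\alpha\theta^{4}+840)}{\theta^{3}(\alpha\theta^{4}+24)}\quad\text{and}\quad \mu_4'=\frac{24(\alpha\theta^{4}+1680)}{\theta^{4}(\alpha\theta^{4}+24)} .$$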
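As a quick check on the raw-moment formula, the sketch below (illustrative only; `tpsd_raw_moment` and `mixture_raw_moment` are assumed names) evaluates the closed form and compares it with the moment obtained from the exponential/gamma mixture, using E(X^r) = r!/θ^r for the exponential part and (r+4)!/(4! θ^r) for the gamma (5, θ) part.

```python
import math

def tpsd_raw_moment(r, theta, alpha):
    """r-th raw moment of TPSD from the closed-form expression."""
    b = alpha * theta**4
    num = math.factorial(r) * (b + (r + 1) * (r + 2) * (r + 3) * (r + 4))
    return num / (theta**r * (b + 24))

def mixture_raw_moment(r, theta, alpha):
    """Same moment computed from the mixture of Exp(theta) and Gamma(5, theta)."""
    b = alpha * theta**4
    p = b / (b + 24)
    exp_moment = math.factorial(r) / theta**r
    gamma_moment = math.factorial(r + 4) / (24 * theta**r)
    return p * exp_moment + (1 - p) * gamma_moment
```

For example, `tpsd_raw_moment(1, 1.0, 1.0)` is the mean of the Suja distribution with θ = 1, namely 121/25.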
The central moments of TPSD are thus obtained as
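$$\mu_2=\frac{\alpha^{2}\theta^{8}+528\alpha\theta^{4}+2880}{\theta^{2}(\alpha\theta^{4}+24)^{2}}$$
$$\mu_3=\frac{2\left(\alpha^{3}\theta^{12}+1512\alpha^{2}\theta^{8}+1728\alpha\theta^{4}+69120\right)}{\theta^{3}(\alpha\theta^{4}+24)^{3}}$$
$$\mu_4=\frac{9\left(\alpha^{4}\theta^{16}+2656\alpha^{3}\theta^{12}+58752\alpha^{2}\theta^{8}+1234944\alpha\theta^{4}+3870720\right)}{\theta^{4}(\alpha\theta^{4}+24)^{4}}$$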
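The descriptive measures based on moments of TPSD, namely the coefficient of variation (C.V.), coefficient of skewness ($\sqrt{\beta_1}$), coefficient of kurtosis ($\beta_2$) and index of dispersion ($\gamma$), are obtained as
$$\text{C.V.}=\frac{\sqrt{\mu_2}}{\mu_1'}=\frac{\sqrt{\alpha^{2}\theta^{8}+528\alpha\theta^{4}+2880}}{\alpha\theta^{4}+120}$$
$$\sqrt{\beta_1}=\frac{\mu_3}{(\mu_2)^{3/2}}=\frac{2\left(\alpha^{3}\theta^{12}+1512\alpha^{2}\theta^{8}+1728\alpha\theta^{4}+69120\right)}{\left(\alpha^{2}\theta^{8}+528\alpha\theta^{4}+2880\right)^{3/2}}$$
$$\beta_2=\frac{\mu_4}{\mu_2^{2}}=\frac{9\left(\alpha^{4}\theta^{16}+2656\alpha^{3}\theta^{12}+58752\alpha^{2}\theta^{8}+1234944\alpha\theta^{4}+3870720\right)}{\left(\alpha^{2}\theta^{8}+528\alpha\theta^{4}+2880\right)^{2}}$$
$$\gamma=\frac{\mu_2}{\mu_1'}=\frac{\alpha^{2}\theta^{8}+528\alpha\theta^{4}+2880}{\theta(\alpha\theta^{4}+24)(\alpha\theta^{4}+120)}$$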
The behaviors of these descriptive measures are shown in the Figures 3-6 respectively.
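The closed forms for C.V., skewness, kurtosis and index of dispersion can be cross-checked against the generic relations between raw and central moments. The sketch below is an illustration under assumed names (`raw_moment`, `measures_from_raw_moments`, `measures_closed_form`), not code from the paper.

```python
import math

def raw_moment(r, theta, alpha):
    """r-th raw moment of TPSD."""
    b = alpha * theta**4
    return math.factorial(r) * (b + (r + 1) * (r + 2) * (r + 3) * (r + 4)) / (theta**r * (b + 24))

def measures_from_raw_moments(theta, alpha):
    """C.V., skewness, kurtosis and dispersion via central moments."""
    m1, m2, m3, m4 = (raw_moment(r, theta, alpha) for r in range(1, 5))
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1**3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4
    return {
        "cv": math.sqrt(mu2) / m1,
        "skewness": mu3 / mu2**1.5,
        "kurtosis": mu4 / mu2**2,
        "dispersion": mu2 / m1,
    }

def measures_closed_form(theta, alpha):
    """The same measures from the closed-form expressions in b = alpha*theta^4."""
    b = alpha * theta**4
    q = b**2 + 528 * b + 2880
    return {
        "cv": math.sqrt(q) / (b + 120),
        "skewness": 2 * (b**3 + 1512 * b**2 + 1728 * b + 69120) / q**1.5,
        "kurtosis": 9 * (b**4 + 2656 * b**3 + 58752 * b**2 + 1234944 * b + 3870720) / q**2,
        "dispersion": q / (theta * (b + 24) * (b + 120)),
    }
```

Note that C.V., skewness and kurtosis depend on the parameters only through αθ⁴, whereas the index of dispersion also depends on θ directly.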
The hazard rate function h(x) and the mean residual life function m(x) of a random variable X having pdf f(x) and cdf F(x) are defined as
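$$h(x)=\lim_{\Delta x\to 0}\frac{P(x<X<x+\Delta x\mid X>x)}{\Delta x}=\frac{f(x)}{1-F(x)}$$
and
$$m(x)=E[X-x\mid X>x]=\frac{1}{1-F(x)}\int_{x}^{\infty}[1-F(t)]\,dt=\frac{1}{S(x)}\int_{x}^{\infty}t\,f(t)\,dt-x .$$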
Thus h(x) and m(x) of the TPSD are obtained as
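$$h(x)=\frac{\theta^{5}\left(\alpha+x^{4}\right)}{\theta^{4}x^{4}+4\theta^{3}x^{3}+12\theta^{2}x^{2}+24\theta x+(\alpha\theta^{4}+24)}$$
and
$$m(x)=\frac{\theta^{4}x^{4}+8\theta^{3}x^{3}+36\theta^{2}x^{2}+96\theta x+(\alpha\theta^{4}+120)}{\theta\left[\theta^{4}x^{4}+4\theta^{3}x^{3}+12\theta^{2}x^{2}+24\theta x+(\alpha\theta^{4}+24)\right]} .$$
It can be verified that $h(0)=\dfrac{\alpha\theta^{5}}{\alpha\theta^{4}+24}=f(0)$ and $m(0)=\dfrac{\alpha\theta^{4}+120}{\theta(\alpha\theta^{4}+24)}=\mu_1'$.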
The h(x) and m(x) of TPSD are shown in Figures 7 & 8 respectively.
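A short Python sketch of the hazard rate and mean residual life functions follows (the names `tpsd_hazard` and `tpsd_mrl` are assumptions for illustration). The hazard is written as f(x)/S(x), which gives a factor θ⁵ in the numerator, and the two functions can be checked against each other through the general identity m′(x) = h(x)m(x) − 1.

```python
import math

def _survival_poly(x, theta, alpha):
    """Polynomial appearing in the numerator of S(x)."""
    t = theta * x
    return t**4 + 4 * t**3 + 12 * t**2 + 24 * t + (alpha * theta**4 + 24)

def tpsd_hazard(x, theta, alpha):
    """Hazard rate h(x) = f(x) / S(x) of TPSD."""
    return theta**5 * (alpha + x**4) / _survival_poly(x, theta, alpha)

def tpsd_mrl(x, theta, alpha):
    """Mean residual life m(x) of TPSD."""
    t = theta * x
    num = t**4 + 8 * t**3 + 36 * t**2 + 96 * t + (alpha * theta**4 + 120)
    return num / (theta * _survival_poly(x, theta, alpha))
```

At x = 0 the mean residual life equals the mean of the distribution, and for large x both h(x) and 1/m(x) approach θ.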
The mean deviation about the mean and the mean deviation about the median are defined as
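$$\delta_1(X)=\int_{0}^{\infty}|x-\mu|\,f(x)\,dx=2\mu F(\mu)-2\int_{0}^{\mu}x\,f(x)\,dx$$
and
$$\delta_2(X)=\int_{0}^{\infty}|x-M|\,f(x)\,dx=\mu-2\int_{0}^{M}x\,f(x)\,dx ,$$
respectively, where $\mu=E(X)$ and $M=\text{Median}(X)$.
We have
$$\int_{0}^{\mu}x\,f(x;\theta,\alpha)\,dx=\mu-\frac{\left\{\theta^{5}\mu^{5}+5\theta^{4}\mu^{4}+20\theta^{3}\mu^{3}+60\theta^{2}\mu^{2}+(\alpha\theta^{4}+120)\theta\mu+(\alpha\theta^{4}+120)\right\}e^{-\theta\mu}}{\theta(\alpha\theta^{4}+24)}$$
$$\int_{0}^{M}x\,f(x;\theta,\alpha)\,dx=\mu-\frac{\left\{\theta^{5}M^{5}+5\theta^{4}M^{4}+20\theta^{3}M^{3}+60\theta^{2}M^{2}+(\alpha\theta^{4}+120)\theta M+(\alpha\theta^{4}+120)\right\}e^{-\theta M}}{\theta(\alpha\theta^{4}+24)}$$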
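Using the above expressions and after a little simplification, the mean deviation about the mean, $\delta_1(X)$, and the mean deviation about the median, $\delta_2(X)$, of TPSD are obtained as
$$\delta_1(X)=\frac{2\left\{\theta^{4}\mu^{4}+8\theta^{3}\mu^{3}+36\theta^{2}\mu^{2}+96\theta\mu+(\alpha\theta^{4}+120)\right\}e^{-\theta\mu}}{\theta(\alpha\theta^{4}+24)}$$
$$\delta_2(X)=\frac{2\left\{\theta^{5}M^{5}+5\theta^{4}M^{4}+20\theta^{3}M^{3}+60\theta^{2}M^{2}+(\alpha\theta^{4}+120)\theta M+(\alpha\theta^{4}+120)\right\}e^{-\theta M}}{\theta(\alpha\theta^{4}+24)}-\mu .$$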
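The two mean deviations can be evaluated numerically once the mean and median are known. The sketch below is illustrative and makes some assumptions that are not in the paper: the median is found by bisection on F(M) = 0.5, and a composite Simpson rule (`simpson`, `abs_deviation_numeric`) is included only as a cross-check of the closed forms.

```python
import math

def tpsd_pdf(x, theta, alpha):
    return theta**5 / (alpha * theta**4 + 24) * (alpha + x**4) * math.exp(-theta * x)

def tpsd_cdf(x, theta, alpha):
    t, norm = theta * x, alpha * theta**4 + 24
    return 1.0 - (t**4 + 4 * t**3 + 12 * t**2 + 24 * t + norm) / norm * math.exp(-t)

def tpsd_mean(theta, alpha):
    b = alpha * theta**4
    return (b + 120) / (theta * (b + 24))

def tpsd_median(theta, alpha):
    """Median by bisection on F(M) = 0.5 (assumed approach, not from the paper)."""
    lo, hi = 0.0, 1.0
    while tpsd_cdf(hi, theta, alpha) < 0.5:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tpsd_cdf(mid, theta, alpha) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mean_deviation_about_mean(theta, alpha):
    b, u = alpha * theta**4, theta * tpsd_mean(theta, alpha)
    return 2 * (u**4 + 8 * u**3 + 36 * u**2 + 96 * u + (b + 120)) * math.exp(-u) / (theta * (b + 24))

def mean_deviation_about_median(theta, alpha):
    b, u = alpha * theta**4, theta * tpsd_median(theta, alpha)
    brace = u**5 + 5 * u**4 + 20 * u**3 + 60 * u**2 + (b + 120) * u + (b + 120)
    return 2 * brace * math.exp(-u) / (theta * (b + 24)) - tpsd_mean(theta, alpha)

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    n += n % 2
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def abs_deviation_numeric(c, theta, alpha):
    """Numerical E|X - c|, split at c; the upper limit truncates a negligible tail."""
    left = simpson(lambda x: (c - x) * tpsd_pdf(x, theta, alpha), 0.0, c)
    right = simpson(lambda x: (x - c) * tpsd_pdf(x, theta, alpha), c, c + 60.0 / theta, 20000)
    return left + right
```

Since the median minimises the mean absolute deviation, δ₂(X) should never exceed δ₁(X).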
Let $X_{(1)}<X_{(2)}<\ldots<X_{(n)}$ denote the order statistics corresponding to a random sample $(X_1,X_2,\ldots,X_n)$. The pdf and the cdf of the kth order statistic, say $Y=X_{(k)}$, are given by
$$f_Y(y)=\frac{n!}{(k-1)!\,(n-k)!}F^{k-1}(y)\{1-F(y)\}^{n-k}f(y)$$
$$=\frac{n!}{(k-1)!\,(n-k)!}\sum_{l=0}^{n-k}\binom{n-k}{l}(-1)^{l}F^{k+l-1}(y)\,f(y)$$
and
$$F_Y(y)=\sum_{j=k}^{n}\binom{n}{j}F^{j}(y)\{1-F(y)\}^{n-j}=\sum_{j=k}^{n}\sum_{l=0}^{n-j}\binom{n}{j}\binom{n-j}{l}(-1)^{l}F^{j+l}(y)$$
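Substituting the pdf (2.1) and the cdf (2.2) of TPSD into these expressions, the pdf and the cdf of the kth order statistic of TPSD are obtained as
$$f_Y(y)=\frac{n!\,\theta^{5}\left(\alpha+y^{4}\right)e^{-\theta y}}{(\alpha\theta^{4}+24)(k-1)!\,(n-k)!}\sum_{l=0}^{n-k}\binom{n-k}{l}(-1)^{l}\left[1-\left\{\frac{\theta^{4}y^{4}+4\theta^{3}y^{3}+12\theta^{2}y^{2}+24\theta y+(\alpha\theta^{4}+24)}{\alpha\theta^{4}+24}\right\}e^{-\theta y}\right]^{k+l-1}$$
and
$$F_Y(y)=\sum_{j=k}^{n}\sum_{l=0}^{n-j}\binom{n}{j}\binom{n-j}{l}(-1)^{l}\left[1-\left\{\frac{\theta^{4}y^{4}+4\theta^{3}y^{3}+12\theta^{2}y^{2}+24\theta y+(\alpha\theta^{4}+24)}{\alpha\theta^{4}+24}\right\}e^{-\theta y}\right]^{j+l}$$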
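The order-statistic formulas can be evaluated directly from the TPSD pdf and cdf. The following sketch (assumed helper names, standard library only) computes the cdf of the kth order statistic both as a single binomial sum and in the expanded double-sum form, together with the pdf.

```python
import math

def tpsd_pdf(x, theta, alpha):
    return theta**5 / (alpha * theta**4 + 24) * (alpha + x**4) * math.exp(-theta * x)

def tpsd_cdf(x, theta, alpha):
    t, norm = theta * x, alpha * theta**4 + 24
    return 1.0 - (t**4 + 4 * t**3 + 12 * t**2 + 24 * t + norm) / norm * math.exp(-t)

def order_stat_pdf(y, k, n, theta, alpha):
    """pdf of the k-th order statistic in a sample of size n."""
    F, f = tpsd_cdf(y, theta, alpha), tpsd_pdf(y, theta, alpha)
    c = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    return c * F**(k - 1) * (1 - F)**(n - k) * f

def order_stat_cdf(y, k, n, theta, alpha):
    """cdf of the k-th order statistic as a single binomial sum."""
    F = tpsd_cdf(y, theta, alpha)
    return sum(math.comb(n, j) * F**j * (1 - F)**(n - j) for j in range(k, n + 1))

def order_stat_cdf_expanded(y, k, n, theta, alpha):
    """Same cdf after expanding (1 - F)^(n-j) binomially (double sum)."""
    F = tpsd_cdf(y, theta, alpha)
    total = 0.0
    for j in range(k, n + 1):
        for l in range(n - j + 1):
            total += math.comb(n, j) * math.comb(n - j, l) * (-1)**l * F**(j + l)
    return total
```

The double sum alternates in sign, so for large n the single-sum form is likely to be the numerically safer of the two.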
Stochastic orderings
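Stochastic ordering of positive continuous random variables is an important tool for judging their comparative behavior. A random variable Y is said to be greater than a random variable X in the
(i) stochastic order ($X\le_{st}Y$) if $F_X(x)\ge F_Y(x)$ for all x;
(ii) hazard rate order ($X\le_{hr}Y$) if $h_X(x)\ge h_Y(x)$ for all x;
(iii) mean residual life order ($X\le_{mrl}Y$) if $m_X(x)\le m_Y(x)$ for all x;
(iv) likelihood ratio order ($X\le_{lr}Y$) if $f_X(x)/f_Y(x)$ decreases in x.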
The well-known result due to Shaked and Shanthikumar⁶ for establishing stochastic ordering of distributions is
$$\begin{array}{c} X\le_{lr}Y\ \Rightarrow\ X\le_{hr}Y\ \Rightarrow\ X\le_{mrl}Y \\ \Downarrow \\ X\le_{st}Y \end{array}$$
Using the above result, the following theorem shows that TPSD is ordered with respect to the strongest ‘likelihood ratio’ ordering.
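Theorem: Let $X\sim$ TPSD $(\theta_1,\alpha_1)$ and $Y\sim$ TPSD $(\theta_2,\alpha_2)$. If $\alpha_1>\alpha_2$ and $\theta_1=\theta_2$, or $\alpha_1=\alpha_2$ and $\theta_1>\theta_2$, then $X\le_{lr}Y$ and hence $X\le_{hr}Y$, $X\le_{mrl}Y$ and $X\le_{st}Y$.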
Proof: We have
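$$\frac{f_X(x)}{f_Y(x)}=\frac{\theta_1^{5}(\alpha_2\theta_2^{4}+24)}{\theta_2^{5}(\alpha_1\theta_1^{4}+24)}\left(\frac{\alpha_1+x^{4}}{\alpha_2+x^{4}}\right)e^{-(\theta_1-\theta_2)x};\quad x>0$$
Now
$$\ln\frac{f_X(x)}{f_Y(x)}=\ln\frac{\theta_1^{5}(\alpha_2\theta_2^{4}+24)}{\theta_2^{5}(\alpha_1\theta_1^{4}+24)}+\ln\left(\frac{\alpha_1+x^{4}}{\alpha_2+x^{4}}\right)-(\theta_1-\theta_2)x$$
This gives
$$\frac{d}{dx}\ln\frac{f_X(x)}{f_Y(x)}=\frac{4(\alpha_2-\alpha_1)x^{3}}{(\alpha_1+x^{4})(\alpha_2+x^{4})}-(\theta_1-\theta_2)$$
Thus, for $\alpha_1>\alpha_2$ and $\theta_1=\theta_2$, or $\alpha_1=\alpha_2$ and $\theta_1>\theta_2$, $\dfrac{d}{dx}\ln\dfrac{f_X(x)}{f_Y(x)}<0$ for $x>0$. This means that $X\le_{lr}Y$ and hence $X\le_{hr}Y$, $X\le_{mrl}Y$ and $X\le_{st}Y$.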
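The monotonicity used in the proof can be illustrated numerically. The sketch below (function names are assumptions) evaluates the log density ratio and its derivative on a grid; it is a spot check for particular parameter values, not a substitute for the proof.

```python
import math

def log_density_ratio(x, theta1, alpha1, theta2, alpha2):
    """ln(f_X(x) / f_Y(x)) for X ~ TPSD(theta1, alpha1), Y ~ TPSD(theta2, alpha2)."""
    const = math.log(theta1**5 * (alpha2 * theta2**4 + 24)
                     / (theta2**5 * (alpha1 * theta1**4 + 24)))
    return const + math.log((alpha1 + x**4) / (alpha2 + x**4)) - (theta1 - theta2) * x

def dlog_density_ratio(x, theta1, alpha1, theta2, alpha2):
    """Derivative of the log density ratio with respect to x."""
    return (4 * (alpha2 - alpha1) * x**3 / ((alpha1 + x**4) * (alpha2 + x**4))
            - (theta1 - theta2))

def ratio_is_decreasing(theta1, alpha1, theta2, alpha2, grid):
    """True if the derivative is negative at every grid point."""
    return all(dlog_density_ratio(x, theta1, alpha1, theta2, alpha2) < 0 for x in grid)
```

For instance, with a grid of positive x values, `ratio_is_decreasing(1.0, 3.0, 1.0, 1.0, grid)` corresponds to the case α₁ > α₂, θ₁ = θ₂ of the theorem.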
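The Renyi entropy, introduced by Renyi,⁷ is a measure of the variation of uncertainty of a random variable X. If X is a continuous random variable having pdf f(.), then the Renyi entropy is defined as
$$T_R(\gamma)=\frac{1}{1-\gamma}\log\left\{\int f^{\gamma}(x)\,dx\right\},\quad\text{where }\gamma>0\text{ and }\gamma\ne 1 .$$
Thus, the Renyi entropy of TPSD can be obtained as
$$T_R(\gamma)=\frac{1}{1-\gamma}\log\left[\int_{0}^{\infty}\frac{\theta^{5\gamma}}{(\alpha\theta^{4}+24)^{\gamma}}\left(\alpha+x^{4}\right)^{\gamma}e^{-\theta\gamma x}\,dx\right]$$
$$=\frac{1}{1-\gamma}\log\left[\sum_{j=0}^{\infty}\binom{\gamma}{j}\frac{\theta^{5\gamma-4j-1}\alpha^{\gamma-j}}{(\alpha\theta^{4}+24)^{\gamma}}\frac{\Gamma(4j+1)}{\gamma^{4j+1}}\right] .$$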
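When γ is a positive integer (γ ≥ 2), the binomial coefficient vanishes for j > γ, so the sum has only γ + 1 terms and can be evaluated exactly. The sketch below uses that special case and compares it with direct numerical integration of f^γ; the function names and the Simpson-rule integrator are assumptions for illustration, and no attempt is made here to evaluate the infinite series for non-integer γ.

```python
import math

def tpsd_pdf(x, theta, alpha):
    return theta**5 / (alpha * theta**4 + 24) * (alpha + x**4) * math.exp(-theta * x)

def simpson(g, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    n += n % 2
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def renyi_numeric(gamma, theta, alpha):
    """Renyi entropy by numerical integration of f^gamma (tail truncated)."""
    upper = 80.0 / (theta * min(gamma, 1.0))
    integral = simpson(lambda x: tpsd_pdf(x, theta, alpha)**gamma, 0.0, upper)
    return math.log(integral) / (1 - gamma)

def renyi_series_integer(gamma, theta, alpha):
    """Renyi entropy from the finite binomial sum; gamma must be an integer >= 2."""
    norm = alpha * theta**4 + 24
    total = 0.0
    for j in range(gamma + 1):
        total += (math.comb(gamma, j) * theta**(5 * gamma - 4 * j - 1)
                  * alpha**(gamma - j) / norm**gamma
                  * math.factorial(4 * j) / gamma**(4 * j + 1))  # Gamma(4j+1) = (4j)!
    return math.log(total) / (1 - gamma)
```

For γ = 2, θ = 1 and α = 2 the sum inside the logarithm reduces to 83.75/676.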
Let X and Y denote the strength and the stress of a component. The stress-strength reliability describes the life of a component whose random strength is subjected to a random stress. When X<Y, the component fails instantly, and the component functions satisfactorily as long as X>Y. Therefore, R=P(Y<X) is a measure of component reliability and is known as the stress-strength parameter. It has wide applications in engineering, biomedical science, social science, etc.
Let X and Y be independent strength and stress random variables having TPSD with parameters $(\theta_1,\alpha_1)$ and $(\theta_2,\alpha_2)$, respectively. Then, the stress-strength reliability R of TPSD can be obtained as
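$$R=P(Y<X)=\int_{0}^{\infty}P(Y<X\mid X=x)\,f_X(x)\,dx=\int_{0}^{\infty}f(x;\theta_1,\alpha_1)\,F(x;\theta_2,\alpha_2)\,dx$$
$$=1-\frac{\theta_1^{5}\left[\begin{array}{l}40320\,\theta_2^{4}+20160\,\theta_2^{3}(\theta_1+\theta_2)+8640\,\theta_2^{2}(\theta_1+\theta_2)^{2}+2880\,\theta_2(\theta_1+\theta_2)^{3}\\ {}+24\{(\alpha_1+\alpha_2)\theta_2^{4}+24\}(\theta_1+\theta_2)^{4}+24\alpha_1\theta_2^{3}(\theta_1+\theta_2)^{5}+24\alpha_1\theta_2^{2}(\theta_1+\theta_2)^{6}\\ {}+24\alpha_1\theta_2(\theta_1+\theta_2)^{7}+\alpha_1(\alpha_2\theta_2^{4}+24)(\theta_1+\theta_2)^{8}\end{array}\right]}{(\alpha_1\theta_1^{4}+24)(\alpha_2\theta_2^{4}+24)(\theta_1+\theta_2)^{9}} .$$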
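The stress-strength reliability can be checked by comparing a closed-form evaluation with direct numerical integration of f(x; θ₁, α₁)F(x; θ₂, α₂). In the sketch below (illustrative names, Simpson rule assumed as the integrator) the coefficient of (θ₁+θ₂)⁴ is written as 24{(α₁+α₂)θ₂⁴+24}, which is what term-by-term integration of the product of the strength density and the stress survival function gives.

```python
import math

def tpsd_pdf(x, theta, alpha):
    return theta**5 / (alpha * theta**4 + 24) * (alpha + x**4) * math.exp(-theta * x)

def tpsd_cdf(x, theta, alpha):
    t, norm = theta * x, alpha * theta**4 + 24
    return 1.0 - (t**4 + 4 * t**3 + 12 * t**2 + 24 * t + norm) / norm * math.exp(-t)

def simpson(g, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    n += n % 2
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def stress_strength_closed(t1, a1, t2, a2):
    """R = P(Y < X) for X ~ TPSD(t1, a1) (strength) and Y ~ TPSD(t2, a2) (stress)."""
    s = t1 + t2
    bracket = (40320 * t2**4 + 20160 * t2**3 * s + 8640 * t2**2 * s**2
               + 2880 * t2 * s**3
               + 24 * ((a1 + a2) * t2**4 + 24) * s**4
               + 24 * a1 * t2**3 * s**5 + 24 * a1 * t2**2 * s**6
               + 24 * a1 * t2 * s**7
               + a1 * (a2 * t2**4 + 24) * s**8)
    return 1.0 - t1**5 * bracket / ((a1 * t1**4 + 24) * (a2 * t2**4 + 24) * s**9)

def stress_strength_numeric(t1, a1, t2, a2):
    """Same quantity by numerical integration (upper limit truncates the tail)."""
    g = lambda x: tpsd_pdf(x, t1, a1) * tpsd_cdf(x, t2, a2)
    return simpson(g, 0.0, 80.0 / t1)
```

Two simple sanity checks are that identical parameter pairs give R = 0.5, and that swapping the roles of X and Y gives 1 − R.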
Method of moments
Since TPSD has two parameters to be estimated, the first two moments about origin are required to estimate its parameters. We have
$$\frac{\mu_2'}{(\mu_1')^{2}}=\frac{2(\alpha\theta^{4}+360)(\alpha\theta^{4}+24)}{(\alpha\theta^{4}+120)^{2}}=k\ \text{(say)}$$
Taking $b=\alpha\theta^{4}$, the above equation becomes
$$\frac{2(b+360)(b+24)}{(b+120)^{2}}=k$$
$$\frac{2(b^{2}+384b+8640)}{b^{2}+240b+14400}=k$$
$$(k-2)b^{2}+(240k-768)b+(14400k-17280)=0 \tag{11.1.1}$$
Now, for a real root b, the discriminant of the above equation must be greater than or equal to zero. That is,
$$(240k-768)^{2}-4(k-2)(14400k-17280)\ge 0\ \Rightarrow\ k\le 2.45 .$$
This means that the method of moments estimate is applicable if $k=\dfrac{m_2'}{(\bar{x})^{2}}\le 2.45$, where $m_2'$ is the second sample moment about origin and $\bar{x}$ is the sample mean of the dataset. Now taking $b=\alpha\theta^{4}$ in the expression for the mean, we get the moment estimate $\tilde{\theta}$ of θ as
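$$\frac{\alpha\theta^{4}+120}{\theta(\alpha\theta^{4}+24)}=\frac{b+120}{\theta(b+24)}=\bar{x}\ \Rightarrow\ \tilde{\theta}=\frac{b+120}{(b+24)\,\bar{x}} .$$
Using the moment estimate of θ in $b=\alpha\theta^{4}$, we get the moment estimate $\tilde{\alpha}$ of α as
$$\tilde{\alpha}=\frac{b}{(\tilde{\theta})^{4}}=\frac{b\,(b+24)^{4}(\bar{x})^{4}}{(b+120)^{4}}$$
Thus the method of moments estimates $(\tilde{\theta},\tilde{\alpha})$ of the parameters $(\theta,\alpha)$ of TPSD are given by
$$(\tilde{\theta},\tilde{\alpha})=\left(\frac{b+120}{(b+24)\,\bar{x}},\ \frac{b\,(b+24)^{4}(\bar{x})^{4}}{(b+120)^{4}}\right),$$
where b is a positive root of the quadratic equation (11.1.1).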
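The moment-estimation steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the quadratic is solved with constant terms 8640 and 17280 (from 360 × 24 = 8640), α is obtained as b/θ̃⁴, and, because the quadratic can have two positive roots when 2 < k ≤ 2.45, the function returns one candidate pair per positive root rather than choosing between them.

```python
import math

def sample_moments(data):
    """Sample mean and second sample moment about the origin."""
    n = len(data)
    return sum(data) / n, sum(x * x for x in data) / n

def tpsd_moment_estimates(xbar, m2):
    """List of (theta, alpha) moment estimates, one per positive root b."""
    k = m2 / xbar**2
    A = k - 2.0
    B = 240.0 * k - 768.0
    C = 14400.0 * k - 17280.0
    if abs(A) < 1e-12:            # k = 2: the equation is linear in b
        roots = [-C / B]
    else:
        disc = B * B - 4.0 * A * C
        if disc < 0:
            raise ValueError("no real root: k exceeds 2.45")
        sq = math.sqrt(disc)
        roots = [(-B + sq) / (2.0 * A), (-B - sq) / (2.0 * A)]
    estimates = []
    for b in roots:
        if b > 0:
            theta = (b + 120.0) / ((b + 24.0) * xbar)
            alpha = b / theta**4
            estimates.append((theta, alpha))
    return estimates
```

Usage would be `tpsd_moment_estimates(*sample_moments(data))`. An empty list indicates that no positive root exists, which happens when k falls below 1.2, the value attained at b = 0.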
Method of maximum likelihood
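Let $(x_1,x_2,x_3,\ldots,x_n)$ be a random sample of size n from TPSD (θ, α). Then the log-likelihood function of TPSD is given by
$$\log L=n\left[5\log\theta-\log(\alpha\theta^{4}+24)\right]+\sum_{i=1}^{n}\log\left(\alpha+x_i^{4}\right)-n\theta\bar{x} .$$
The maximum likelihood estimates $(\hat{\theta},\hat{\alpha})$ of the parameters $(\theta,\alpha)$ are the solution of the following log-likelihood equations
$$\frac{\partial\log L}{\partial\theta}=\frac{5n}{\theta}-\frac{4n\alpha\theta^{3}}{\alpha\theta^{4}+24}-n\bar{x}=0$$
$$\frac{\partial\log L}{\partial\alpha}=-\frac{n\theta^{4}}{\alpha\theta^{4}+24}+\sum_{i=1}^{n}\frac{1}{\alpha+x_i^{4}}=0$$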
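These two log-likelihood equations cannot be solved directly, so Fisher’s scoring method is used to solve them. We have
$$\frac{\partial^{2}\log L}{\partial\theta^{2}}=-\frac{5n}{\theta^{2}}+\frac{4n\alpha^{2}\theta^{6}-288n\alpha\theta^{2}}{(\alpha\theta^{4}+24)^{2}}$$
$$\frac{\partial^{2}\log L}{\partial\alpha^{2}}=\frac{n\theta^{8}}{(\alpha\theta^{4}+24)^{2}}-\sum_{i=1}^{n}\frac{1}{(\alpha+x_i^{4})^{2}}$$
$$\frac{\partial^{2}\log L}{\partial\theta\,\partial\alpha}=-\frac{96n\theta^{3}}{(\alpha\theta^{4}+24)^{2}}=\frac{\partial^{2}\log L}{\partial\alpha\,\partial\theta} .$$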
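The following equations can be solved for the MLEs $(\hat{\theta},\hat{\alpha})$ of $(\theta,\alpha)$ of TPSD
$$\begin{bmatrix}\dfrac{\partial^{2}\log L}{\partial\theta^{2}} & \dfrac{\partial^{2}\log L}{\partial\theta\,\partial\alpha}\\[2ex] \dfrac{\partial^{2}\log L}{\partial\theta\,\partial\alpha} & \dfrac{\partial^{2}\log L}{\partial\alpha^{2}}\end{bmatrix}_{\substack{\hat{\theta}=\theta_0\\ \hat{\alpha}=\alpha_0}}\begin{bmatrix}\hat{\theta}-\theta_0\\ \hat{\alpha}-\alpha_0\end{bmatrix}=-\begin{bmatrix}\dfrac{\partial\log L}{\partial\theta}\\[2ex] \dfrac{\partial\log L}{\partial\alpha}\end{bmatrix}_{\substack{\hat{\theta}=\theta_0\\ \hat{\alpha}=\alpha_0}}$$
where $\theta_0$ and $\alpha_0$ are the initial values of θ and α, as given by the method of moments. These equations are solved iteratively until sufficiently close estimates of the parameters are obtained.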
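A compact sketch of the iterative solution is given below. It is an assumption-laden illustration rather than the authors' implementation: it uses a plain Newton iteration with the observed second derivatives listed above, solves the 2×2 system H·δ = −g by hand, and halves the step whenever an update would make a parameter non-positive. Convergence is not guaranteed from poor starting values.

```python
import math

def loglik(theta, alpha, data):
    """Log-likelihood of TPSD for the sample `data`."""
    n = len(data)
    return (n * (5 * math.log(theta) - math.log(alpha * theta**4 + 24))
            + sum(math.log(alpha + x**4) for x in data) - theta * sum(data))

def score(theta, alpha, data):
    """First derivatives of log L with respect to theta and alpha."""
    n, d = len(data), alpha * theta**4 + 24
    g_theta = 5 * n / theta - 4 * n * alpha * theta**3 / d - sum(data)
    g_alpha = -n * theta**4 / d + sum(1.0 / (alpha + x**4) for x in data)
    return g_theta, g_alpha

def hessian(theta, alpha, data):
    """Second derivatives (h_tt, h_ta, h_aa) of log L."""
    n, d = len(data), alpha * theta**4 + 24
    h_tt = -5 * n / theta**2 + (4 * n * alpha**2 * theta**6 - 288 * n * alpha * theta**2) / d**2
    h_ta = -96 * n * theta**3 / d**2
    h_aa = n * theta**8 / d**2 - sum(1.0 / (alpha + x**4)**2 for x in data)
    return h_tt, h_ta, h_aa

def newton_mle(data, theta0, alpha0, tol=1e-8, max_iter=100):
    """Newton iteration: solve H * delta = -g and update until the step is small."""
    theta, alpha = theta0, alpha0
    for _ in range(max_iter):
        g_t, g_a = score(theta, alpha, data)
        h_tt, h_ta, h_aa = hessian(theta, alpha, data)
        det = h_tt * h_aa - h_ta**2
        if det == 0:
            raise ValueError("singular Hessian")
        d_t = (-g_t * h_aa + g_a * h_ta) / det
        d_a = (-g_a * h_tt + g_t * h_ta) / det
        step = 1.0
        while theta + step * d_t <= 0 or alpha + step * d_a <= 0:
            step /= 2.0               # keep both parameters positive
        theta, alpha = theta + step * d_t, alpha + step * d_a
        if abs(step * d_t) < tol and abs(step * d_a) < tol:
            return theta, alpha
    raise RuntimeError("Newton iteration did not converge")
```

Starting values would typically come from the method of moments, as described in the text.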
A simulation study has been carried out to check the performance of the maximum likelihood estimates, taking sample sizes n = 20, 40, 60, 80 for θ = 0.5, 1.0, 1.5, 2.0 and α = 0.5 and 4. The acceptance-rejection method was used to generate random numbers for the simulation in R. The process was repeated 1,000 times to calculate the average bias error (ABE) and the mean square error (MSE) of the estimates of θ and α, which are presented in Tables 1 & 2 for α = 0.5 and α = 4, respectively. For TPSD, a generally decreasing trend has been observed in ABE and MSE as the sample size increases, which shows that the performance of the maximum likelihood estimators is quite good and consistent.
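The paper generates TPSD variates by acceptance-rejection in R. As an alternative illustration, the sketch below draws from TPSD through its mixture representation (exponential with probability p = αθ⁴/(αθ⁴+24), gamma (5, θ) otherwise). This swaps in a different sampling technique from the one used in the paper, and the function names are assumptions.

```python
import random

def tpsd_sample(n, theta, alpha, rng=None):
    """Draw n TPSD variates using the exponential/gamma mixture form."""
    rng = rng or random.Random()
    p = alpha * theta**4 / (alpha * theta**4 + 24)
    sample = []
    for _ in range(n):
        if rng.random() < p:
            sample.append(rng.expovariate(theta))            # Exp(theta), rate theta
        else:
            sample.append(rng.gammavariate(5.0, 1.0 / theta))  # shape 5, scale 1/theta
    return sample

def tpsd_mean(theta, alpha):
    """Theoretical mean of TPSD, used to sanity-check the sampler."""
    b = alpha * theta**4
    return (b + 120) / (theta * (b + 24))
```

With a seeded generator, for example `tpsd_sample(50000, 1.0, 0.5, random.Random(2024))`, the sample mean should lie close to `tpsd_mean(1.0, 0.5)`.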
| Sample size (n) | θ | ABE(θ) | MSE(θ) | ABE(α) | MSE(α) |
|---|---|---|---|---|---|
| 20 | 0.5 | 0.0323 | 0.02083 | 0.0645 | 0.7180 |
| | 1.0 | 0.0073 | 0.0010 | 0.1145 | 0.2621 |
| | 1.5 | -0.0177 | 0.0063 | 0.0645 | 0.0831 |
| | 2.0 | -0.0427 | 0.0365 | 0.0144 | 0.0041 |
| 40 | 0.5 | 0.0168 | 0.0113 | -0.0074 | 0.1210 |
| | 1.0 | 0.0043 | 0.0007 | 0.0175 | 0.0122 |
| | 1.5 | -0.0081 | 0.0026 | -0.0074 | 0.0022 |
| | 2.0 | -0.0206 | 0.0170 | -0.0324 | 0.0422 |
| 60 | 0.5 | 0.0098 | 0.0058 | -0.0011 | 0.0982 |
| | 1.0 | 0.0015 | 0.0001 | 0.0143 | 0.0154 |
| | 1.5 | -0.0067 | 0.0027 | -0.0011 | 0.0008 |
| | 2.0 | -0.0151 | 0.0136 | -0.0178 | 0.0191 |
| 80 | 0.5 | 0.0057 | 0.0026 | 0.0292 | 0.2932 |
| | 1.0 | -0.0004 | 0.0001 | 0.0417 | 0.1397 |
| | 1.5 | -0.0067 | 0.0035 | 0.0292 | 0.0686 |
| | 2.0 | -0.0129 | 0.01342 | 0.0167 | 0.0225 |
Table 1 ABE and MSE of parameters at fixed value α=0.5
| Sample size (n) | θ | ABE(θ) | MSE(θ) | ABE(α) | MSE(α) |
|---|---|---|---|---|---|
| 20 | 0.5 | 0.0156 | 0.0048 | 0.0387 | 0.5365 |
| | 1.0 | -0.0093 | 0.0017 | 0.0887 | 0.1576 |
| | 1.5 | -0.0343 | 0.0236 | 0.0387 | 0.0300 |
| | 2.0 | -0.0593 | 0.0704 | -0.0112 | 0.0025 |
| 40 | 0.5 | 0.0168 | 0.0113 | -0.0074 | 0.1210 |
| | 1.0 | 0.0043 | 0.0007 | 0.01750 | 0.0122 |
| | 1.5 | -0.0081 | 0.0026 | -0.0074 | 0.0022 |
| | 2.0 | -0.0206 | 0.0170 | -0.0324 | 0.0422 |
| 60 | 0.5 | 0.0100 | 0.0060 | -0.0064 | 0.0745 |
| | 1.0 | 0.0016 | 0.0001 | 0.0102 | 0.0063 |
| | 1.5 | -0.0066 | 0.0026 | -0.0064 | 0.0024 |
| | 2.0 | -0.0149 | 0.0134 | -0.0230 | 0.0319 |
| 80 | 0.5 | 0.0057 | 0.0026 | 0.0306 | 0.3063 |
| | 1.0 | -0.0004 | 0.0017 | 0.0431 | 0.1488 |
| | 1.5 | -0.0068 | 0.0036 | 0.0306 | 0.0750 |
| | 2.0 | -0.0129 | 0.0134 | 0.0181 | 0.0262 |
Table 2 ABE and MSE of parameters at fixed value of α=4
The goodness of fit of TPSD has been compared with that of the one-parameter Suja distribution (SD) and of two-parameter lifetime distributions, including the quasi Lindley distribution (QLD) of Shanker and Mishra,⁸ a two-parameter Lindley distribution (TPLD-I) of Shanker and Mishra,⁹ and a two-parameter Lindley distribution (TPLD-II) of Shanker et al.,¹⁰ for two real lifetime datasets relating to failure times. The applications of TPSD can also be extended to model the survival times of patients suffering from serious diseases in medical sciences. The pdf and the cdf of these two-parameter distributions are presented in Table 3.
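| Distributions | pdf | cdf |
|---|---|---|
| TPLD-I | $f(x;\theta,\alpha)=\dfrac{\theta^{2}}{\theta\alpha+1}(\alpha+x)e^{-\theta x};\ x>0,\ \theta>0,\ \theta\alpha>-1$ | $F(x;\theta,\alpha)=1-\left(1+\dfrac{\theta x}{\alpha\theta+1}\right)e^{-\theta x}$ |
| TPLD-II | $f(x;\theta,\alpha)=\dfrac{\theta^{2}}{\theta+\alpha}(1+\alpha x)e^{-\theta x};\ x>0,\ \theta>0,\ \alpha>0$ | $F(x;\theta,\alpha)=1-\left(1+\dfrac{\alpha\theta x}{\theta+\alpha}\right)e^{-\theta x}$ |
| QLD | $f(x;\theta,\alpha)=\dfrac{\theta}{\alpha+1}(\alpha+\theta x)e^{-\theta x}$ | $F(x;\theta,\alpha)=1-\left(1+\dfrac{\theta x}{\alpha+1}\right)e^{-\theta x}$ |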
Table 3 pdf and the cdf of two-parameter distributions
The two datasets considered for testing the goodness of fit of TPSD over other one parameter and two-parameter lifetime distributions are as follows:
Dataset 1: The positively skewed data relating to the accelerated life testing of items (n = 55) with a change in stress from 100 to 150 at time t = 15, available in Murthy et al. (2004).
0.032, 0.035, 0.104, 0.169, 0.196, 0.260, 0.326, 0.445, 0.449, 0.496, 0.543, 0.544, 0.577, 0.648, 0.666, 0.742, 0.757, 0.808, 0.857, 0.858, 0.882, 1.005, 1.025, 1.472, 1.916, 2.313, 2.457, 2.530, 2.543, 2.617, 2.835, 2.940, 3.002, 3.158, 3.430, 3.459, 3.502, 3.691, 3.861, 3.952, 4.396, 4.744, 5.346, 5.479, 5.716, 5.825, 5.847, 6.084, 6.127, 7.241, 7.560, 8.901, 9.000, 10.482, 11.133.
Dataset 2: The positively skewed failure time data (n = 40), available in Murthy et al. (2004).
0.13, 0.62, 0.75, 0.87, 1.56, 2.28, 3.15, 3.25, 3.55, 4.49, 4.50, 4.61, 4.79, 7.17, 7.31, 7.43, 7.84, 8.49, 8.94, 9.40, 9.61, 9.84, 10.58, 11.18, 11.84, 13.28, 14.47, 14.79, 15.54, 16.90, 17.25, 17.37, 18.69, 18.78, 19.88, 20.06, 20.10, 20.95, 21.72, 23.87.
The maximum likelihood estimates of the parameters, along with −2logL, AIC, the Kolmogorov-Smirnov (K-S) statistic and the corresponding p-value, for the considered distributions are presented in Tables 4 & 5 for datasets 1 and 2, respectively. The fitted plots of the distributions for the two datasets are shown in Figures 9 & 10 respectively. The goodness of fit in Tables 4 & 5 and the fitted plots in Figures 9 & 10 show that TPSD gives a much closer fit to dataset 1 (Table 4), while for dataset 2 (Table 5) TPLD-I gives a better fit than the other distributions in terms of the K-S statistic and p-value. Therefore, it can be concluded that TPSD and TPLD-I can be considered the best of the fitted distributions for these lifetime datasets.
| Distributions | ML estimate of θ | ML estimate of α | −2logL | AIC | K-S | p-value |
|---|---|---|---|---|---|---|
| TPSD | 0.9563 | 32.1684 | 226.65 | 230.65 | 0.086 | 0.774 |
| QLD | 0.3848 | 5.19455 | 231.44 | 235.44 | 0.135 | 0.244 |
| TPLD-I | 0.3907 | 11.6595 | 231.45 | 235.45 | 0.136 | 0.235 |
| TPLD-II | 0.383 | 0.07082 | 231.44 | 235.44 | 0.134 | 0.246 |
| SD | 1.4504 | …….. | 265.86 | 267.86 | 0.282 | 0.0002 |

Table 4 ML estimates of the parameters of the distributions and values of −2logL, AIC, K-S and p-value for dataset 1
| Distributions | ML estimate of θ | ML estimate of α | −2logL | AIC | K-S | p-value |
|---|---|---|---|---|---|---|
| TPSD | 0.4175 | 158.423 | 262.10 | 266.10 | 0.136 | 0.406 |
| QLD | 0.16453 | 0.3914 | 263.24 | 267.24 | 0.107 | 0.708 |
| TPLD-I | 0.16456 | 2.3745 | 263.25 | 267.25 | 0.106 | 0.711 |
| TPLD-II | 0.16453 | 0.42038 | 263.24 | 267.24 | 0.107 | 0.709 |
| SD | 0.4778 | …….. | 301.17 | 303.17 | 0.24 | 0.015 |

Table 5 ML estimates of the parameters of the distributions and values of −2logL, AIC, K-S and p-value for dataset 2
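The two goodness-of-fit summaries reported in the tables can be computed as follows. This is a generic sketch with assumed function names: AIC is taken as −2logL + 2k for a model with k estimated parameters, and the K-S statistic is the largest gap between the empirical distribution function of the data and a fitted cdf supplied by the caller.

```python
def aic(neg2loglik, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return neg2loglik + 2 * n_params

def ks_statistic(data, cdf):
    """One-sample Kolmogorov-Smirnov statistic for a fitted cdf (a callable)."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        F = cdf(x)
        # compare F with the empirical cdf just after and just before x
        d = max(d, i / n - F, F - (i - 1) / n)
    return d
```

For example, a two-parameter model with −2logL = 226.65 has AIC 226.65 + 4 = 230.65, and a one-parameter model with −2logL = 265.86 has AIC 267.86.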
In this paper, a two-parameter Suja distribution has been proposed by introducing an additional parameter into the one-parameter Suja distribution, to see its effect on goodness of fit relative to the Suja distribution and other two-parameter lifetime distributions. Its descriptive measures based on moments and its reliability properties have been discussed. The estimation of parameters using the method of moments and the method of maximum likelihood has been discussed. A simulation study has been presented to assess the performance of the maximum likelihood estimates. The goodness of fit of the proposed distribution has been presented with two real lifetime datasets.
The authors are grateful to the editor-in-chief and the anonymous reviewer for their comments, which improved both the quality and the presentation of the paper.
None.
None.
©2024 Shanker, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.