Research Article Volume 4 Issue 7
Department of Statistics, Eritrea Institute of Technology, Eritrea
Correspondence: Rama Shanker, Department of Statistics, Eritrea Institute of Technology, Asmara, Eritrea
Received: September 30, 2016 | Published: December 9, 2016
Citation: Shanker R. Garima distribution and its application to model behavioral science data. Biom Biostat Int J. 2016;4(7):275-281. DOI: 10.15406/bbij.2016.04.00116
In this paper a continuous distribution named “Garima distribution” has been suggested for modeling data from behavioral science. Its important properties, including shape, moments, skewness, kurtosis, hazard rate function, mean residual life function, stochastic ordering, mean deviations, order statistics, Bonferroni and Lorenz curves, entropy measure and stress-strength reliability, have been discussed. The conditions under which the Garima distribution is over-dispersed, equi-dispersed and under-dispersed are presented along with those of other one parameter continuous distributions. The estimation of its parameter has been discussed using maximum likelihood estimation and the method of moments. The application of the proposed distribution has been explained using a numerical example from behavioral science, and the fit has been compared with other one parameter continuous distributions.
Keywords: lifetime distribution, moments, hazard rate function, mean residual life function, mean deviations, order statistics, estimation of parameter, goodness of fit
Modeling and analyzing lifetime data are crucial in many applied sciences, including behavioral science, medicine, engineering, insurance and finance, amongst others. There are a number of continuous distributions for modeling lifetime data, such as the exponential, Lindley, gamma, lognormal and Weibull distributions and their generalizations. The exponential, Lindley and Weibull distributions are more popular than the gamma and lognormal distributions because the survival functions of the gamma and lognormal distributions cannot be expressed in closed form and require numerical integration. Although the exponential and Lindley distributions each have one parameter, the Lindley distribution has an advantage over the exponential distribution: the exponential distribution has a constant hazard rate, whereas the Lindley distribution has a monotonically increasing hazard rate.
Recently, Shanker [1–4] introduced new lifetime distributions, namely the Shanker, Akash, Aradhana and Sujatha distributions, for modeling lifetime data from biomedical sciences, engineering and behavioral sciences, and showed their superiority over the Lindley [5] and exponential distributions. The probability density function (p.d.f.) and the cumulative distribution function (c.d.f.) of the Sujatha, Aradhana, Akash, Shanker, Lindley and exponential distributions are presented in Table 1.
| Distribution | Pdf | Cdf |
|---|---|---|
| Sujatha | $f_6(x;\theta)=\frac{\theta^3}{\theta^2+\theta+2}(1+x+x^2)e^{-\theta x}$ | $F_6(x;\theta)=1-\left[1+\frac{\theta x(\theta x+\theta+2)}{\theta^2+\theta+2}\right]e^{-\theta x}$ |
| Aradhana | $f_5(x;\theta)=\frac{\theta^3}{\theta^2+2\theta+2}(1+x)^2e^{-\theta x}$ | $F_5(x;\theta)=1-\left[1+\frac{\theta x(\theta x+2\theta+2)}{\theta^2+2\theta+2}\right]e^{-\theta x}$ |
| Akash | $f_4(x;\theta)=\frac{\theta^3}{\theta^2+2}(1+x^2)e^{-\theta x}$ | $F_4(x;\theta)=1-\left[1+\frac{\theta x(\theta x+2)}{\theta^2+2}\right]e^{-\theta x}$ |
| Shanker | $f_3(x;\theta)=\frac{\theta^2}{\theta^2+1}(\theta+x)e^{-\theta x}$ | $F_3(x;\theta)=1-\frac{(\theta^2+1)+\theta x}{\theta^2+1}e^{-\theta x}$ |
| Lindley | $f_2(x;\theta)=\frac{\theta^2}{\theta+1}(1+x)e^{-\theta x}$ | $F_2(x;\theta)=1-\left[1+\frac{\theta x}{\theta+1}\right]e^{-\theta x}$ |
| Exponential | $f_1(x;\theta)=\theta e^{-\theta x}$ | $F_1(x;\theta)=1-e^{-\theta x}$ |
The probability density function (p.d.f.) of a new lifetime distribution can be introduced as
$$f_7(x;\theta)=\frac{\theta}{\theta+2}\left(1+\theta+\theta x\right)e^{-\theta x};\quad x>0,\ \theta>0 \tag{2.1}$$
We call this distribution the “Garima distribution”. It can be expressed as a mixture of exponential ($\theta$) and gamma ($2,\theta$) distributions with mixing proportion $p=\frac{\theta+1}{\theta+2}$. We have
$$f_7(x;\theta)=p\,g_1(x)+(1-p)\,g_2(x) \tag{2.2}$$
where $p=\frac{\theta+1}{\theta+2}$, $g_1(x)=\theta e^{-\theta x}$ and $g_2(x)=\theta^2 x\,e^{-\theta x}$.
The corresponding cumulative distribution function (c.d.f.) of (2.1) is given by
$$F_7(x;\theta)=1-\left[1+\frac{\theta x}{\theta+2}\right]e^{-\theta x};\quad x>0,\ \theta>0 \tag{2.3}$$
The graphs of the p.d.f. and the c.d.f. of the Garima distribution for different values of $\theta$ are shown in Figure 1.
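As a minimal sketch of how (2.1), (2.2) and (2.3) fit together, the following Python code (standard library only; the function names are our own, hypothetical choices) evaluates the p.d.f. and c.d.f. and draws samples through the mixture representation (2.2): with probability $p$ it returns an exponential ($\theta$) variate, otherwise a gamma ($2,\theta$) variate.

```python
import math
import random


def garima_pdf(x: float, theta: float) -> float:
    """Garima p.d.f. (2.1): theta/(theta+2) * (1 + theta + theta*x) * exp(-theta*x) for x > 0."""
    if x <= 0:
        return 0.0
    return theta / (theta + 2) * (1 + theta + theta * x) * math.exp(-theta * x)


def garima_cdf(x: float, theta: float) -> float:
    """Garima c.d.f. (2.3): 1 - [1 + theta*x/(theta+2)] * exp(-theta*x) for x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + theta * x / (theta + 2)) * math.exp(-theta * x)


def garima_sample(theta: float, rng: random.Random) -> float:
    """One draw via the mixture (2.2): Exp(theta) with prob. p, else Gamma(shape=2, rate=theta)."""
    p = (theta + 1) / (theta + 2)
    if rng.random() < p:
        return rng.expovariate(theta)  # expovariate takes the rate
    return rng.gammavariate(2.0, 1.0 / theta)  # gammavariate takes (shape, scale)


# Usage: compare the empirical c.d.f. of simulated draws with (2.3) at x = 1.
rng = random.Random(2016)
theta = 1.5
draws = [garima_sample(theta, rng) for _ in range(20_000)]
empirical = sum(d <= 1.0 for d in draws) / len(draws)
print(f"empirical F(1) = {empirical:.3f}, formula F(1) = {garima_cdf(1.0, theta):.3f}")
```

Sampling through the mixture avoids inverting (2.3), which has no closed-form inverse.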
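The $r$th moment about the origin of the Garima distribution (2.1) is obtained as
$$\mu_r'=\frac{r!\,(\theta+r+2)}{\theta^r(\theta+2)};\quad r=1,2,3,\ldots$$
and so the first four moments about the origin are
$$\mu_1'=\frac{\theta+3}{\theta(\theta+2)},\quad \mu_2'=\frac{2(\theta+4)}{\theta^2(\theta+2)},\quad \mu_3'=\frac{6(\theta+5)}{\theta^3(\theta+2)},\quad \mu_4'=\frac{24(\theta+6)}{\theta^4(\theta+2)}$$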
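The closed form for $\mu_r'$ follows from the mixture (2.2), because the $r$th moments of $g_1$ and $g_2$ are $r!/\theta^r$ and $(r+1)!/\theta^r$. A minimal numerical cross-check, assuming only the Python standard library, compares the formula with composite Simpson integration of $x^r f_7(x;\theta)$ over a truncated range $[0, 80/\theta]$; the truncation point and helper names are our own choices.

```python
import math


def garima_pdf(x: float, theta: float) -> float:
    return theta / (theta + 2) * (1 + theta + theta * x) * math.exp(-theta * x)


def raw_moment(r: int, theta: float) -> float:
    """Closed form: mu'_r = r! (theta + r + 2) / (theta^r (theta + 2))."""
    return math.factorial(r) * (theta + r + 2) / (theta ** r * (theta + 2))


def simpson(g, a: float, b: float, n: int = 20_000) -> float:
    """Composite Simpson's rule with n (even) sub-intervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def raw_moment_numeric(r: int, theta: float) -> float:
    upper = 80.0 / theta  # the neglected tail is negligible for small r
    return simpson(lambda x: x ** r * garima_pdf(x, theta), 0.0, upper)


for theta in (0.5, 1.0, 2.0):
    for r in range(1, 5):
        print(theta, r, raw_moment(r, theta), raw_moment_numeric(r, theta))
```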
Using the relationship between central moments and the moments about origin, the central moments of Garima distribution are obtained as
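$$\mu_2=\frac{\theta^2+6\theta+7}{\theta^2(\theta+2)^2}$$
$$\mu_3=\frac{2(\theta^3+9\theta^2+21\theta+15)}{\theta^3(\theta+2)^3}$$
$$\mu_4=\frac{3(3\theta^4+36\theta^3+134\theta^2+204\theta+111)}{\theta^4(\theta+2)^4}$$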
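Thus the coefficient of variation (C.V.), coefficient of skewness ($\sqrt{\beta_1}$), coefficient of kurtosis ($\beta_2$) and index of dispersion ($\gamma$) of the Garima distribution are obtained as
$$\text{C.V.}=\frac{\sigma}{\mu_1'}=\frac{\sqrt{\theta^2+6\theta+7}}{\theta+3}$$
$$\sqrt{\beta_1}=\frac{\mu_3}{\mu_2^{3/2}}=\frac{2(\theta^3+9\theta^2+21\theta+15)}{(\theta^2+6\theta+7)^{3/2}}$$
$$\beta_2=\frac{\mu_4}{\mu_2^2}=\frac{3(3\theta^4+36\theta^3+134\theta^2+204\theta+111)}{(\theta^2+6\theta+7)^2}$$
$$\gamma=\frac{\sigma^2}{\mu_1'}=\frac{\theta^2+6\theta+7}{\theta(\theta+2)(\theta+3)}$$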
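A short sketch that evaluates these four measures from the closed forms above; as an internal consistency check, the accompanying test recomputes them from the raw moments $\mu_1',\ldots,\mu_4'$ via the usual central-moment identities. The function name `garima_shape` is hypothetical.

```python
import math


def garima_shape(theta: float) -> dict:
    """C.V., skewness sqrt(beta_1), kurtosis beta_2 and dispersion index gamma."""
    q = theta ** 2 + 6 * theta + 7  # the common factor theta^2 + 6 theta + 7
    return {
        "cv": math.sqrt(q) / (theta + 3),
        "skewness": 2 * (theta ** 3 + 9 * theta ** 2 + 21 * theta + 15) / q ** 1.5,
        "kurtosis": 3 * (3 * theta ** 4 + 36 * theta ** 3 + 134 * theta ** 2
                         + 204 * theta + 111) / q ** 2,
        "dispersion": q / (theta * (theta + 2) * (theta + 3)),
    }


for theta in (0.5, 1.0, 2.0, 5.0):
    print(theta, garima_shape(theta))
```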
The conditions under which the Garima distribution is over-dispersed ($\mu<\sigma^2$), equi-dispersed ($\mu=\sigma^2$) and under-dispersed ($\mu>\sigma^2$) are presented in Table 2, along with those for other lifetime distributions.
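| Lifetime distribution | Over-dispersion ($\mu<\sigma^2$) | Equi-dispersion ($\mu=\sigma^2$) | Under-dispersion ($\mu>\sigma^2$) |
|---|---|---|---|
| Garima | $\theta<1.164247938$ | $\theta=1.164247938$ | $\theta>1.164247938$ |
| Sujatha | $\theta<1.364271174$ | $\theta=1.364271174$ | $\theta>1.364271174$ |
| Aradhana | $\theta<1.283826505$ | $\theta=1.283826505$ | $\theta>1.283826505$ |
| Akash | $\theta<1.515400063$ | $\theta=1.515400063$ | $\theta>1.515400063$ |
| Shanker | $\theta<1.171535555$ | $\theta=1.171535555$ | $\theta>1.171535555$ |
| Lindley | $\theta<1.170086487$ | $\theta=1.170086487$ | $\theta>1.170086487$ |
| Exponential | $\theta<1$ | $\theta=1$ | $\theta>1$ |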
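For the Garima distribution the threshold in Table 2 can be traced back to the index of dispersion: setting $\gamma=1$ gives $\theta^2+6\theta+7=\theta(\theta+2)(\theta+3)$, i.e. $\theta^3+4\theta^2-7=0$. Since the left side is increasing for $\theta>0$, the distribution is over-dispersed exactly when $\theta$ lies below its unique positive root. The sketch below locates that root by bisection; the bracket $[0.5, 3]$ and the helper names are our own choices.

```python
def dispersion_index(theta: float) -> float:
    """gamma = sigma^2 / mu for the Garima distribution."""
    return (theta ** 2 + 6 * theta + 7) / (theta * (theta + 2) * (theta + 3))


def equidispersion_root(lo: float = 0.5, hi: float = 3.0, tol: float = 1e-13) -> float:
    """Bisection on g(theta) = theta^3 + 4 theta^2 - 7, which is increasing for theta > 0."""

    def g(t: float) -> float:
        return t ** 3 + 4 * t ** 2 - 7

    if not g(lo) < 0 < g(hi):
        raise ValueError("bracket must straddle the root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


root = equidispersion_root()
print(f"equi-dispersion at theta = {root:.9f}")
```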
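The moment generating function $M_X(t)$, characteristic function $\varphi_X(t)$ and cumulant generating function $K_X(t)$ of the Garima distribution (2.1) are given by
$$M_X(t)=\left(1-\frac{(\theta+1)t}{\theta^2+2\theta}\right)\left(1-\frac{t}{\theta}\right)^{-2},\quad t<\theta$$
$$\varphi_X(t)=\left(1-\frac{(\theta+1)it}{\theta^2+2\theta}\right)\left(1-\frac{it}{\theta}\right)^{-2},\quad i=\sqrt{-1}$$
$$K_X(t)=\log\left(1-\frac{(\theta+1)it}{\theta^2+2\theta}\right)-2\log\left(1-\frac{it}{\theta}\right)$$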
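Using the expansion $\log(1-x)=-\sum_{r=1}^{\infty}\frac{x^r}{r}$, we get
$$\begin{aligned}K_X(t)&=-\sum_{r=1}^{\infty}\left(\frac{\theta+1}{\theta^2+2\theta}\right)^r\frac{(it)^r}{r}+2\sum_{r=1}^{\infty}\left(\frac{it}{\theta}\right)^r\frac{1}{r}\\&=2\sum_{r=1}^{\infty}\frac{1}{\theta^r}\frac{(it)^r}{r}-\sum_{r=1}^{\infty}\left(\frac{\theta+1}{\theta^2+2\theta}\right)^r\frac{(it)^r}{r}\\&=2\sum_{r=1}^{\infty}\frac{(r-1)!}{\theta^r}\frac{(it)^r}{r!}-\sum_{r=1}^{\infty}\left(\frac{\theta+1}{\theta^2+2\theta}\right)^r(r-1)!\,\frac{(it)^r}{r!}\end{aligned}$$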
Thus the $r$th cumulant of the Garima distribution is given by
$$K_r=\text{coefficient of }\frac{(it)^r}{r!}\text{ in }K_X(t)=\frac{2(r-1)!}{\theta^r}-\frac{(r-1)!\,(\theta+1)^r}{(\theta^2+2\theta)^r};\quad r=1,2,3,\ldots$$
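This gives
$$\mu_1'=K_1=\frac{\theta+3}{\theta(\theta+2)}$$
$$\mu_2=K_2=\frac{\theta^2+6\theta+7}{\theta^2(\theta+2)^2}$$
$$\mu_3=K_3=\frac{2(\theta^3+9\theta^2+21\theta+15)}{\theta^3(\theta+2)^3}$$
$$\mu_4=K_4+3K_2^2=\frac{3(3\theta^4+36\theta^3+134\theta^2+204\theta+111)}{\theta^4(\theta+2)^4}$$
which are the same as those obtained earlier.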
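The cumulant formula is easy to check mechanically. The following sketch (hypothetical helper names) evaluates $K_r$ and confirms that $K_1$, $K_2$, $K_3$ and $K_4+3K_2^2$ reproduce the closed-form mean and central moments.

```python
import math


def cumulant(r: int, theta: float) -> float:
    """K_r = 2 (r-1)! / theta^r - (r-1)! ((theta+1)/(theta^2 + 2 theta))^r."""
    a = (theta + 1) / (theta ** 2 + 2 * theta)
    return math.factorial(r - 1) * (2 / theta ** r - a ** r)


def central_moments(theta: float):
    """Closed-form mean and central moments mu_2, mu_3, mu_4 of the Garima distribution."""
    t = theta
    mean = (t + 3) / (t * (t + 2))
    mu2 = (t ** 2 + 6 * t + 7) / (t ** 2 * (t + 2) ** 2)
    mu3 = 2 * (t ** 3 + 9 * t ** 2 + 21 * t + 15) / (t ** 3 * (t + 2) ** 3)
    mu4 = 3 * (3 * t ** 4 + 36 * t ** 3 + 134 * t ** 2 + 204 * t + 111) / (t ** 4 * (t + 2) ** 4)
    return mean, mu2, mu3, mu4


for theta in (0.5, 1.0, 3.0):
    k1, k2, k3, k4 = (cumulant(r, theta) for r in range(1, 5))
    print(theta, (k1, k2, k3, k4 + 3 * k2 ** 2), central_moments(theta))
```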
Let $X$ be a continuous random variable with p.d.f. $f(x)$ and c.d.f. $F(x)$. The hazard rate function (also known as the failure rate function) and the mean residual life function of $X$ are respectively defined as
$$h(x)=\lim_{\Delta x\to 0}\frac{P(X<x+\Delta x\mid X>x)}{\Delta x}=\frac{f(x)}{1-F(x)} \tag{5.1}$$
and
$$m(x)=E[X-x\mid X>x]=\frac{1}{1-F(x)}\int_x^{\infty}\left[1-F(t)\right]dt \tag{5.2}$$
The hazard rate function $h(x)$ and the mean residual life function $m(x)$ of the Garima distribution are given by
$$h(x)=\frac{\theta(1+\theta+\theta x)}{\theta x+\theta+2} \tag{5.3}$$
and
$$m(x)=\frac{\theta x+\theta+3}{\theta(\theta x+\theta+2)} \tag{5.4}$$
It can be easily verified that $h(0)=\frac{\theta(\theta+1)}{\theta+2}=f_7(0;\theta)$ and $m(0)=\frac{\theta+3}{\theta(\theta+2)}=\mu_1'$. Writing $h(x)=\theta-\frac{\theta}{\theta x+\theta+2}$ and $m(x)=\frac{1}{\theta}\left(1+\frac{1}{\theta x+\theta+2}\right)$ shows that $h(x)$ is an increasing function of $x$ and $\theta$, whereas $m(x)$ is a decreasing function of $x$ and $\theta$, as the graphs also show. The graphs of the hazard rate function and mean residual life function of the Garima distribution are shown in Figures 2 & 3.
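A small sketch evaluating (5.3) and (5.4). The test compares $h(x)$ with $f_7/(1-F_7)$, checks $h(0)=f_7(0;\theta)$ and $m(0)=\mu_1'$, confirms the monotonicity on a grid, and compares $m(x)$ with a direct Simpson integration of the survival function truncated at $x+80/\theta$ (an arbitrary cut-off of ours). Function names are hypothetical.

```python
import math


def survival(x: float, theta: float) -> float:
    """1 - F_7(x; theta) = [1 + theta x / (theta + 2)] exp(-theta x)."""
    return (1 + theta * x / (theta + 2)) * math.exp(-theta * x)


def hazard(x: float, theta: float) -> float:
    """(5.3): h(x) = theta (1 + theta + theta x) / (theta x + theta + 2)."""
    return theta * (1 + theta + theta * x) / (theta * x + theta + 2)


def mean_residual_life(x: float, theta: float) -> float:
    """(5.4): m(x) = (theta x + theta + 3) / (theta (theta x + theta + 2))."""
    return (theta * x + theta + 3) / (theta * (theta * x + theta + 2))


def mrl_numeric(x: float, theta: float, n: int = 20_000) -> float:
    """Definition (5.2) by composite Simpson's rule on [x, x + 80/theta]."""
    a, b = x, x + 80.0 / theta
    h = (b - a) / n
    total = survival(a, theta) + survival(b, theta)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * survival(a + i * h, theta)
    return total * h / 3 / survival(x, theta)


theta = 1.2
for x in (0.0, 1.0, 5.0):
    print(x, hazard(x, theta), mean_residual_life(x, theta), mrl_numeric(x, theta))
```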
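Stochastic ordering of positive continuous random variables is an important tool for judging their comparative behavior. A random variable $X$ is said to be smaller than a random variable $Y$ in the
(i) stochastic order ($X\le_{st}Y$) if $F_X(x)\ge F_Y(x)$ for all $x$;
(ii) hazard rate order ($X\le_{hr}Y$) if $h_X(x)\ge h_Y(x)$ for all $x$;
(iii) mean residual life order ($X\le_{mrl}Y$) if $m_X(x)\le m_Y(x)$ for all $x$;
(iv) likelihood ratio order ($X\le_{lr}Y$) if $\frac{f_X(x)}{f_Y(x)}$ decreases in $x$.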
The following results due to Shaked & Shanthikumar [6] are well known for establishing stochastic ordering of distributions:
$$X\le_{lr}Y\;\Rightarrow\;X\le_{hr}Y\;\Rightarrow\;X\le_{mrl}Y \tag{6.1}$$
$$X\le_{hr}Y\;\Rightarrow\;X\le_{st}Y$$
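The Garima distribution is ordered with respect to the strongest ‘likelihood ratio’ ordering, as shown in the following theorem.
Theorem: Let $X\sim$ Garima distribution $(\theta_1)$ and $Y\sim$ Garima distribution $(\theta_2)$. If $\theta_1>\theta_2$, then $X\le_{lr}Y$ and hence $X\le_{hr}Y$, $X\le_{mrl}Y$ and $X\le_{st}Y$.
Proof: We have
$$\frac{f_X(x)}{f_Y(x)}=\frac{\theta_1(\theta_2+2)}{\theta_2(\theta_1+2)}\left(\frac{1+\theta_1+\theta_1x}{1+\theta_2+\theta_2x}\right)e^{-(\theta_1-\theta_2)x};\quad x>0$$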
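Now
$$\log\frac{f_X(x)}{f_Y(x)}=\log\left[\frac{\theta_1(\theta_2+2)}{\theta_2(\theta_1+2)}\right]+\log\left(\frac{1+\theta_1+\theta_1x}{1+\theta_2+\theta_2x}\right)-(\theta_1-\theta_2)x$$
This gives
$$\frac{d}{dx}\log\frac{f_X(x)}{f_Y(x)}=\frac{\theta_1-\theta_2}{(1+\theta_1+\theta_1x)(1+\theta_2+\theta_2x)}-(\theta_1-\theta_2)=(\theta_1-\theta_2)\left[\frac{1}{(1+\theta_1+\theta_1x)(1+\theta_2+\theta_2x)}-1\right]$$
Since $(1+\theta_1+\theta_1x)(1+\theta_2+\theta_2x)>1$ for $x>0$, the bracket is negative. Thus for $\theta_1>\theta_2$, $\frac{d}{dx}\log\frac{f_X(x)}{f_Y(x)}<0$. This means that $X\le_{lr}Y$ and hence $X\le_{hr}Y$, $X\le_{mrl}Y$ and $X\le_{st}Y$.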
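As a numerical illustration of the theorem (not a substitute for the proof), the sketch below evaluates the derivative of $\log f_X/f_Y$ in its factored form and compares it with a central finite difference of the log ratio itself; the parameter values and the step size are arbitrary choices of ours.

```python
import math


def log_pdf(x: float, theta: float) -> float:
    """log f_7(x; theta) for x > 0."""
    return math.log(theta / (theta + 2)) + math.log(1 + theta + theta * x) - theta * x


def dlog_ratio(x: float, t1: float, t2: float) -> float:
    """d/dx log(f_X/f_Y) = (t1 - t2) [1 / ((1 + t1 + t1 x)(1 + t2 + t2 x)) - 1]."""
    prod = (1 + t1 + t1 * x) * (1 + t2 + t2 * x)
    return (t1 - t2) * (1 / prod - 1)


t1, t2 = 2.0, 0.8  # theta_1 > theta_2, so X should be smaller than Y in the lr order
step = 1e-5
for x in (0.1, 1.0, 4.0, 10.0):
    fd = ((log_pdf(x + step, t1) - log_pdf(x + step, t2))
          - (log_pdf(x - step, t1) - log_pdf(x - step, t2))) / (2 * step)
    print(x, dlog_ratio(x, t1, t2), fd)
```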
The amount of scatter in a population is measured to some extent by the totality of deviations, usually from the mean and the median. These are known as the mean deviation about the mean and the mean deviation about the median, defined by
$$\delta_1(X)=\int_0^{\infty}|x-\mu|\,f(x)\,dx\quad\text{and}\quad\delta_2(X)=\int_0^{\infty}|x-M|\,f(x)\,dx,$$
respectively, where $\mu=E(X)$ and $M=\text{Median}(X)$. The measures $\delta_1(X)$ and $\delta_2(X)$ can be calculated using the relationships
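$$\begin{aligned}\delta_1(X)&=\int_0^{\mu}(\mu-x)f(x)\,dx+\int_{\mu}^{\infty}(x-\mu)f(x)\,dx\\&=\mu F(\mu)-\int_0^{\mu}x f(x)\,dx-\mu[1-F(\mu)]+\int_{\mu}^{\infty}x f(x)\,dx\\&=2\mu F(\mu)-2\mu+2\int_{\mu}^{\infty}x f(x)\,dx\\&=2\mu F(\mu)-2\int_0^{\mu}x f(x)\,dx\end{aligned}\tag{7.1}$$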
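and, using $F(M)=\frac{1}{2}$,
$$\begin{aligned}\delta_2(X)&=\int_0^{M}(M-x)f(x)\,dx+\int_M^{\infty}(x-M)f(x)\,dx\\&=M F(M)-\int_0^{M}x f(x)\,dx-M[1-F(M)]+\int_M^{\infty}x f(x)\,dx\\&=-\mu+2\int_M^{\infty}x f(x)\,dx\\&=\mu-2\int_0^{M}x f(x)\,dx\end{aligned}\tag{7.2}$$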
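Using p.d.f. (2.1) and the expression for the mean of the Garima distribution, we get
$$\int_0^{\mu}x f_7(x;\theta)\,dx=\mu-\frac{\{\theta^2\mu^2+(\theta^2+3\theta)\mu+(\theta+3)\}e^{-\theta\mu}}{\theta(\theta+2)} \tag{7.3}$$
$$\int_0^{M}x f_7(x;\theta)\,dx=\mu-\frac{\{\theta^2M^2+(\theta^2+3\theta)M+(\theta+3)\}e^{-\theta M}}{\theta(\theta+2)} \tag{7.4}$$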
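Using expressions (7.1), (7.2), (7.3) and (7.4), the mean deviation about the mean, $\delta_1(X)$, and the mean deviation about the median, $\delta_2(X)$, of the Garima distribution are obtained as
$$\delta_1(X)=\frac{2(\theta\mu+\theta+3)e^{-\theta\mu}}{\theta(\theta+2)} \tag{7.5}$$
$$\delta_2(X)=\frac{2\{\theta^2M^2+(\theta^2+3\theta)M+(\theta+3)\}e^{-\theta M}}{\theta(\theta+2)}-\mu \tag{7.6}$$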
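The closed forms (7.5) and (7.6) can be checked against direct numerical integration of $|x-\mu|f_7(x;\theta)$ and $|x-M|f_7(x;\theta)$. In this sketch the median $M$, which has no closed form, is found by bisection on $F_7(M;\theta)=1/2$, and each integral is split at its kink so that Simpson's rule stays accurate; these numerical choices and all helper names are ours.

```python
import math


def pdf(x, theta):
    return theta / (theta + 2) * (1 + theta + theta * x) * math.exp(-theta * x)


def cdf(x, theta):
    return 1 - (1 + theta * x / (theta + 2)) * math.exp(-theta * x)


def median(theta, tol=1e-13):
    """Solve F_7(M) = 1/2 by bisection (F_7 is continuous and increasing)."""
    lo, hi = 0.0, 1.0
    while cdf(hi, theta) < 0.5:
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf(mid, theta) < 0.5 else (lo, mid)
    return 0.5 * (lo + hi)


def mean_deviations(theta):
    """Closed forms (7.5) and (7.6)."""
    mu = (theta + 3) / (theta * (theta + 2))
    m = median(theta)
    c = theta * (theta + 2)
    d1 = 2 * (theta * mu + theta + 3) * math.exp(-theta * mu) / c
    tail = theta ** 2 * m ** 2 + (theta ** 2 + 3 * theta) * m + theta + 3
    d2 = 2 * tail * math.exp(-theta * m) / c - mu
    return d1, d2


def simpson(g, a, b, n=20_000):
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3


def mean_abs_dev_numeric(centre, theta):
    """Integral of |x - centre| f_7(x) dx, split at the kink x = centre."""
    g = lambda x: abs(x - centre) * pdf(x, theta)
    return simpson(g, 0.0, centre) + simpson(g, centre, centre + 80.0 / theta)


print(mean_deviations(1.0))
```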
Let $X_1,X_2,\ldots,X_n$ be a random sample of size $n$ from the Garima distribution (2.1). Let $X_{(1)}<X_{(2)}<\cdots<X_{(n)}$ denote the corresponding order statistics. The p.d.f. and the c.d.f. of the $k$th order statistic, say $Y=X_{(k)}$, are given by
$$\begin{aligned}f_Y(y)&=\frac{n!}{(k-1)!\,(n-k)!}F^{k-1}(y)\{1-F(y)\}^{n-k}f(y)\\&=\frac{n!}{(k-1)!\,(n-k)!}\sum_{l=0}^{n-k}\binom{n-k}{l}(-1)^lF^{k+l-1}(y)f(y)\end{aligned}$$
and
$$\begin{aligned}F_Y(y)&=\sum_{j=k}^{n}\binom{n}{j}F^j(y)\{1-F(y)\}^{n-j}\\&=\sum_{j=k}^{n}\sum_{l=0}^{n-j}\binom{n}{j}\binom{n-j}{l}(-1)^lF^{j+l}(y),\end{aligned}$$
respectively, for $k=1,2,3,\ldots,n$.
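Thus, the p.d.f. and the c.d.f. of the $k$th order statistic of the Garima distribution are given by
$$f_Y(y)=\frac{n!\,\theta(1+\theta+\theta y)e^{-\theta y}}{(\theta+2)(k-1)!\,(n-k)!}\sum_{l=0}^{n-k}\binom{n-k}{l}(-1)^l\left[1-\frac{\theta y+(\theta+2)}{\theta+2}e^{-\theta y}\right]^{k+l-1}$$
and
$$F_Y(y)=\sum_{j=k}^{n}\sum_{l=0}^{n-j}\binom{n}{j}\binom{n-j}{l}(-1)^l\left[1-\frac{\theta y+(\theta+2)}{\theta+2}e^{-\theta y}\right]^{j+l}$$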
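A minimal sketch of the $k$th order statistic formulas, assuming $n$ is small enough that the alternating binomial sums stay numerically harmless (for large $n$ the compact form $F^{k-1}(1-F)^{n-k}f$ is the safer one to evaluate). The test checks the expanded sum against that compact form, that the c.d.f. tends to 1, and that the p.d.f. is its derivative. Helper names are hypothetical.

```python
import math


def pdf(y, theta):
    return theta / (theta + 2) * (1 + theta + theta * y) * math.exp(-theta * y)


def cdf(y, theta):
    return 1 - (theta * y + theta + 2) / (theta + 2) * math.exp(-theta * y)


def order_stat_pdf(y, k, n, theta):
    """p.d.f. of X_(k) via the alternating binomial expansion."""
    F = cdf(y, theta)
    const = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    s = sum(math.comb(n - k, l) * (-1) ** l * F ** (k + l - 1) for l in range(n - k + 1))
    return const * s * pdf(y, theta)


def order_stat_cdf(y, k, n, theta):
    """c.d.f. of X_(k) via the double alternating sum."""
    F = cdf(y, theta)
    return sum(
        math.comb(n, j) * math.comb(n - j, l) * (-1) ** l * F ** (j + l)
        for j in range(k, n + 1)
        for l in range(n - j + 1)
    )


# Usage: density of the sample median (k = 3) of n = 5 draws with theta = 1.3.
print([round(order_stat_pdf(y, 3, 5, 1.3), 4) for y in (0.5, 1.0, 2.0)])
```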
The Bonferroni and Lorenz curves [7] and the Bonferroni and Gini indices have applications not only in economics, to study income and poverty, but also in other fields such as reliability, demography, insurance and medicine. The Bonferroni and Lorenz curves are defined as
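$$B(p)=\frac{1}{p\mu}\int_0^{q}x f(x)\,dx=\frac{1}{p\mu}\left[\int_0^{\infty}x f(x)\,dx-\int_q^{\infty}x f(x)\,dx\right]=\frac{1}{p\mu}\left[\mu-\int_q^{\infty}x f(x)\,dx\right] \tag{9.1}$$
and
$$L(p)=\frac{1}{\mu}\int_0^{q}x f(x)\,dx=\frac{1}{\mu}\left[\int_0^{\infty}x f(x)\,dx-\int_q^{\infty}x f(x)\,dx\right]=\frac{1}{\mu}\left[\mu-\int_q^{\infty}x f(x)\,dx\right] \tag{9.2}$$
respectively, or equivalently
$$B(p)=\frac{1}{p\mu}\int_0^{p}F^{-1}(x)\,dx \tag{9.3}$$
and
$$L(p)=\frac{1}{\mu}\int_0^{p}F^{-1}(x)\,dx \tag{9.4}$$
respectively, where $\mu=E(X)$ and $q=F^{-1}(p)$.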
The Bonferroni and Gini indices are thus defined as
$$B=1-\int_0^1 B(p)\,dp \tag{9.5}$$
and
$$G=1-2\int_0^1 L(p)\,dp \tag{9.6}$$
respectively.
Using p.d.f. (2.1), we get
$$\int_q^{\infty}x f_7(x;\theta)\,dx=\frac{\{\theta^2q^2+(\theta^2+3\theta)q+(\theta+3)\}e^{-\theta q}}{\theta(\theta+2)} \tag{9.7}$$
Now using equation (9.7) in (9.1) and (9.2), we get
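$$B(p)=\frac{1}{p}\left[1-\frac{\{\theta^2q^2+(\theta^2+3\theta)q+(\theta+3)\}e^{-\theta q}}{\theta+3}\right] \tag{9.8}$$
and
$$L(p)=1-\frac{\{\theta^2q^2+(\theta^2+3\theta)q+(\theta+3)\}e^{-\theta q}}{\theta+3} \tag{9.9}$$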
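Now using equations (9.8) and (9.9) in (9.5) and (9.6), the Bonferroni and Gini indices of the Garima distribution (2.1) are obtained as
$$B=1-\int_0^1\frac{1}{p}\left[1-\frac{\{\theta^2q^2+(\theta^2+3\theta)q+(\theta+3)\}e^{-\theta q}}{\theta+3}\right]dp \tag{9.10}$$
$$G=-1+\frac{2}{\theta+3}\int_0^1\{\theta^2q^2+(\theta^2+3\theta)q+(\theta+3)\}e^{-\theta q}\,dp \tag{9.11}$$
where $q=F_7^{-1}(p)$ varies with $p$.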
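Because $q=F_7^{-1}(p)$ has no closed form, (9.9) to (9.11) are most easily evaluated numerically. The sketch below is our own construction, not the paper's: it changes variables from $p$ to $q$, so that $\int_0^1 L(p)\,dp=\int_0^{\infty}L(F_7(q))\,f_7(q;\theta)\,dq$, and cross-checks the Gini index against the standard identity $G=1-\frac{1}{\mu}\int_0^{\infty}[1-F(x)]^2\,dx$ for non-negative random variables. The truncation at $80/\theta$ is an assumption chosen so the tail is negligible.

```python
import math


def pdf(x, theta):
    return theta / (theta + 2) * (1 + theta + theta * x) * math.exp(-theta * x)


def survival(x, theta):
    return (1 + theta * x / (theta + 2)) * math.exp(-theta * x)


def lorenz_at_q(q, theta):
    """L(p) from (9.9), written in terms of q = F^{-1}(p)."""
    tail = theta ** 2 * q ** 2 + (theta ** 2 + 3 * theta) * q + theta + 3
    return 1 - tail * math.exp(-theta * q) / (theta + 3)


def simpson(g, a, b, n=20_000):
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3


def gini(theta):
    """G = 1 - 2 * integral_0^1 L(p) dp, using dp = f(q) dq."""
    upper = 80.0 / theta
    area = simpson(lambda q: lorenz_at_q(q, theta) * pdf(q, theta), 0.0, upper)
    return 1 - 2 * area


def gini_check(theta):
    """Independent route: G = 1 - (1/mu) * integral_0^inf S(x)^2 dx."""
    mu = (theta + 3) / (theta * (theta + 2))
    upper = 80.0 / theta
    return 1 - simpson(lambda x: survival(x, theta) ** 2, 0.0, upper) / mu


for theta in (0.3, 1.0, 3.0):
    print(theta, gini(theta), gini_check(theta))
```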
Entropy of a random variable $X$ is a measure of variation of uncertainty. A popular entropy measure is Renyi entropy [8]. If $X$ is a continuous random variable having probability density function $f(\cdot)$, then Renyi entropy is defined as
$$T_R(\gamma)=\frac{1}{1-\gamma}\log\left\{\int f^{\gamma}(x)\,dx\right\}$$
where $\gamma>0$ and $\gamma\ne1$.
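Thus, the Renyi entropy for the Garima distribution (2.1) is obtained as
$$\begin{aligned}T_R(\gamma)&=\frac{1}{1-\gamma}\log\left[\int_0^{\infty}\frac{\theta^{\gamma}}{(\theta+2)^{\gamma}}(1+\theta+\theta x)^{\gamma}e^{-\theta\gamma x}\,dx\right]\\&=\frac{1}{1-\gamma}\log\left[\int_0^{\infty}\frac{\theta^{\gamma}(1+\theta)^{\gamma}}{(\theta+2)^{\gamma}}\left(1+\frac{\theta}{\theta+1}x\right)^{\gamma}e^{-\theta\gamma x}\,dx\right]\\&=\frac{1}{1-\gamma}\log\left[\int_0^{\infty}\frac{\theta^{\gamma}(1+\theta)^{\gamma}}{(\theta+2)^{\gamma}}\sum_{j=0}^{\infty}\binom{\gamma}{j}\left(\frac{\theta}{\theta+1}x\right)^{j}e^{-\theta\gamma x}\,dx\right]\\&=\frac{1}{1-\gamma}\log\left[\sum_{j=0}^{\infty}\binom{\gamma}{j}\frac{\theta^{\gamma+j}(1+\theta)^{\gamma-j}}{(\theta+2)^{\gamma}}\int_0^{\infty}e^{-\theta\gamma x}x^{j+1-1}\,dx\right]\\&=\frac{1}{1-\gamma}\log\left[\sum_{j=0}^{\infty}\binom{\gamma}{j}\frac{\theta^{\gamma+j}(1+\theta)^{\gamma-j}}{(\theta+2)^{\gamma}}\frac{\Gamma(j+1)}{(\theta\gamma)^{j+1}}\right]\\&=\frac{1}{1-\gamma}\log\left[\sum_{j=0}^{\infty}\binom{\gamma}{j}\frac{\theta^{\gamma-1}(1+\theta)^{\gamma-j}}{(\theta+2)^{\gamma}}\frac{\Gamma(j+1)}{\gamma^{j+1}}\right]\end{aligned}$$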
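A note on using the final sum: the binomial series for $(1+u)^{\gamma}$ converges only for $|u|<1$, and the ratio of consecutive terms of the last line is $(\gamma-j)/((1+\theta)\gamma)$, so for non-integer $\gamma$ the terms eventually grow without bound. For integer $\gamma\ge2$ the sum stops at $j=\gamma$ and is exact. The sketch below evaluates the series only in that integer case and compares it with direct Simpson integration of $f_7^{\gamma}$; the function names and truncation point are our own choices.

```python
import math


def gen_binom(g: float, j: int) -> float:
    """Generalised binomial coefficient C(g, j) = g (g-1) ... (g-j+1) / j!."""
    out = 1.0
    for i in range(j):
        out *= (g - i) / (i + 1)
    return out


def renyi_series(theta: float, g: int) -> float:
    """Final line of the Renyi derivation; exact for integer g >= 2 (sum ends at j = g)."""
    total = sum(
        gen_binom(g, j) * theta ** (g - 1) * (1 + theta) ** (g - j)
        * math.gamma(j + 1) / ((theta + 2) ** g * g ** (j + 1))
        for j in range(g + 1)
    )
    return math.log(total) / (1 - g)


def renyi_numeric(theta: float, g: float, n: int = 20_000) -> float:
    """Definition T_R(g) = log(integral of f^g) / (1 - g), by composite Simpson's rule."""
    f = lambda x: (theta / (theta + 2) * (1 + theta + theta * x) * math.exp(-theta * x)) ** g
    a, b = 0.0, 80.0 / theta
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return math.log(s * h / 3) / (1 - g)


for g in (2, 3, 4):
    print(g, renyi_series(1.0, g), renyi_numeric(1.0, g))
```

For non-integer $\gamma$, `renyi_numeric` can be used directly, since it does not rely on the series.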
The stress-strength reliability describes the life of a component that has a random strength $X$ and is subjected to a random stress $Y$. When the stress applied to it exceeds the strength, the component fails instantly, and the component functions satisfactorily as long as $X>Y$. Therefore, $R=P(Y<X)$ is a measure of component reliability, and in the statistical literature it is known as the stress-strength parameter. It has wide applications in almost all areas of knowledge, especially in engineering, such as structures, deterioration of rocket motors, static fatigue of ceramic components and aging of concrete pressure vessels.
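Let $X$ and $Y$ be independent strength and stress random variables having Garima distribution (2.1) with parameters $\theta_1$ and $\theta_2$, respectively. Then the stress-strength reliability $R$ of the Garima distribution can be obtained as
$$\begin{aligned}R&=P(Y<X)=\int_0^{\infty}P(Y<X\mid X=x)f_X(x)\,dx\\&=\int_0^{\infty}f_7(x;\theta_1)F_7(x;\theta_2)\,dx\\&=1-\frac{\theta_1\left[(\theta_1+\theta_2)^2(\theta_1\theta_2+2\theta_1+\theta_2+2)+(\theta_1+\theta_2)(2\theta_1\theta_2+2\theta_1+\theta_2)+2\theta_1\theta_2\right]}{(\theta_1+2)(\theta_2+2)(\theta_1+\theta_2)^3}\end{aligned}$$
As expected, this gives $R=\frac{1}{2}$ when $\theta_1=\theta_2$.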
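A sketch that evaluates the closed form for $R$ and checks it against Simpson integration of $f_7(x;\theta_1)F_7(x;\theta_2)$, together with the sanity check $R=1/2$ when $\theta_1=\theta_2$ (identically distributed $X$ and $Y$). Helper names and the truncation at $80/\theta_1$ are our own choices.

```python
import math


def pdf(x, theta):
    return theta / (theta + 2) * (1 + theta + theta * x) * math.exp(-theta * x)


def cdf(x, theta):
    return 1 - (1 + theta * x / (theta + 2)) * math.exp(-theta * x)


def reliability(t1: float, t2: float) -> float:
    """Closed-form R = P(Y < X), X ~ Garima(t1) (strength), Y ~ Garima(t2) (stress)."""
    s = t1 + t2
    bracket = (s ** 2 * (t1 * t2 + 2 * t1 + t2 + 2)
               + s * (2 * t1 * t2 + 2 * t1 + t2)
               + 2 * t1 * t2)
    return 1 - t1 * bracket / ((t1 + 2) * (t2 + 2) * s ** 3)


def reliability_numeric(t1: float, t2: float, n: int = 20_000) -> float:
    """Integral of f_7(x; t1) F_7(x; t2); the integrand is bounded by f_7(x; t1)."""
    g = lambda x: pdf(x, t1) * cdf(x, t2)
    a, b = 0.0, 80.0 / t1
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3


for t1, t2 in ((0.5, 1.5), (2.0, 0.7)):
    print(t1, t2, reliability(t1, t2), reliability_numeric(t1, t2))
```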
Maximum likelihood estimates (MLE)
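Let $(x_1,x_2,x_3,\ldots,x_n)$ be a random sample from the Garima distribution (2.1). The likelihood function $L$ of (2.1) is given by
$$L=\left(\frac{\theta}{\theta+2}\right)^n\prod_{i=1}^{n}(1+\theta+\theta x_i)\,e^{-n\theta\bar{x}}$$
The natural log likelihood function is thus obtained as
$$\ln L=n\ln\left(\frac{\theta}{\theta+2}\right)+\sum_{i=1}^{n}\ln(1+\theta+\theta x_i)-n\theta\bar{x}$$
Now
$$\frac{d\ln L}{d\theta}=\frac{2n}{\theta^2+2\theta}+\sum_{i=1}^{n}\frac{1+x_i}{1+\theta+\theta x_i}-n\bar{x}=0$$
where $\bar{x}$ is the sample mean.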
The maximum likelihood estimate $\hat{\theta}$ of $\theta$ is the solution of the equation $\frac{d\ln L}{d\theta}=0$, and so it can be obtained by solving the non-linear equation
$$\sum_{i=1}^{n}\frac{1+x_i}{1+\theta+\theta x_i}+\frac{2n}{\theta^2+2\theta}-n\bar{x}=0 \tag{12.1.1}$$
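Equation (12.1.1) has no closed-form solution. One convenient observation (ours, easily verified by differentiating) is that the score is strictly decreasing in $\theta$, tends to $+\infty$ as $\theta\to0^+$ and to $-n\bar{x}<0$ as $\theta\to\infty$, so it has exactly one root, which bisection will find. The sketch below uses this; the sample values are made up purely for illustration and the function names are hypothetical.

```python
import math


def score(theta: float, xs: list) -> float:
    """d ln L / d theta for the Garima likelihood, i.e. the left side of (12.1.1)."""
    n = len(xs)
    return (2 * n / (theta ** 2 + 2 * theta)
            + math.fsum((1 + x) / (1 + theta + theta * x) for x in xs)
            - math.fsum(xs))  # n * xbar


def log_likelihood(theta: float, xs: list) -> float:
    n = len(xs)
    return (n * math.log(theta / (theta + 2))
            + math.fsum(math.log(1 + theta + theta * x) for x in xs)
            - theta * math.fsum(xs))


def garima_mle(xs: list, tol: float = 1e-12) -> float:
    """Solve (12.1.1) by bisection; assumes the root exceeds 1e-8 (i.e. xbar is not huge)."""
    lo, hi = 1e-8, 1.0
    while score(hi, xs) > 0:  # expand until the bracket contains the root
        hi *= 2
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if score(mid, xs) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sample = [0.4, 1.1, 2.3, 0.7, 3.5, 1.9, 0.2, 2.8]  # illustrative values only
theta_hat = garima_mle(sample)
print(f"theta_hat = {theta_hat:.6f}")
```

Bisection is slower than Newton-Raphson but cannot diverge here, because the bracket always contains the unique root.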
Method of moment estimates (MOME)
Equating the population mean of the Garima distribution to the corresponding sample mean, the method of moment estimate (MOME) $\tilde{\theta}$ of $\theta$ can be obtained as
$$\tilde{\theta}=\frac{(1-2\bar{x})+\sqrt{4\bar{x}^2+8\bar{x}+1}}{2\bar{x}};\quad \bar{x}>0 \tag{12.2.1}$$
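For completeness, (12.2.1) is the positive root of the quadratic $\bar{x}\theta^2+(2\bar{x}-1)\theta-3=0$, obtained by setting $\frac{\theta+3}{\theta(\theta+2)}=\bar{x}$. The sketch below evaluates it and checks that feeding in the exact population mean recovers $\theta$; the function names are hypothetical.

```python
import math


def garima_mome(xbar: float) -> float:
    """Method-of-moments estimate (12.2.1); requires xbar > 0."""
    if xbar <= 0:
        raise ValueError("sample mean must be positive")
    return ((1 - 2 * xbar) + math.sqrt(4 * xbar ** 2 + 8 * xbar + 1)) / (2 * xbar)


def garima_mean(theta: float) -> float:
    """Population mean mu_1' = (theta + 3) / (theta (theta + 2))."""
    return (theta + 3) / (theta * (theta + 2))


for theta in (0.05, 0.5, 1.0, 4.0):
    print(theta, garima_mome(garima_mean(theta)))
```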
In this section the goodness of fit of the Garima distribution is discussed with an example from behavioral science. The data, collected by Balakrishnan N et al. [9], relate to behavioral sciences. The scale “General Rating of Affective Symptoms for Preschoolers (GRASP)” measures behavioral and emotional problems of children, who can be classified as having a depressive condition or not according to this scale. A study conducted by the authors in a city in the south of Chile collected real data corresponding to the GRASP scores of children, listed below with frequencies in parentheses:
19(6), 20(15), 21(14), 22(9), 23(12), 24(10), 25(6), 26(9), 27(8), 28(5), 29(6), 30(4), 31(3), 32(4), 33, 34, 35(4), 36(2), 37(2), 39, 42, 44
In order to compare the distributions, $-2\ln L$, AIC (Akaike Information Criterion), AICC (Akaike Information Criterion Corrected), BIC (Bayesian Information Criterion) and the K-S statistic (Kolmogorov-Smirnov statistic) have been computed for the above data set and are presented in Table 3. The formulae for computing AIC, AICC and BIC are as follows:
$$\text{AIC}=-2\ln L+2k,\quad \text{AICC}=\text{AIC}+\frac{2k(k+1)}{n-k-1},\quad \text{BIC}=-2\ln L+k\ln n$$
where $k$ is the number of parameters and $n$ is the sample size.
The best distribution is the one with the lowest values of $-2\ln L$, AIC, AICC and BIC.
It can be seen from Table 3 that the Garima distribution gives a better fit than the Aradhana, Sujatha, Akash, Shanker, Lindley and exponential distributions, and thus the Garima distribution should be preferred over these distributions for modeling behavioral science data.
| Model | ML estimate | $-2\ln L$ | AIC | AICC | BIC |
|---|---|---|---|---|---|
| Garima | 0.05317 | 188.32 | 190.32 | 190.35 | 193.23 |
| Aradhana | 0.11557 | 989.49 | 991.49 | 991.52 | 994.40 |
| Sujatha | 0.11745 | 985.69 | 987.69 | 987.72 | 990.60 |
| Akash | 0.11961 | 981.28 | 983.28 | 983.31 | 986.18 |
| Shanker | 0.07974 | 1033.10 | 1035.10 | 1035.13 | 1037.99 |
| Lindley | 0.07725 | 1041.64 | 1043.64 | 1043.68 | 1046.54 |
| Exponential | 0.04006 | 1130.26 | 1132.26 | 1132.29 | 1135.16 |

Table 3 MLEs, $-2\ln L$, AIC, AICC and BIC of Garima, Aradhana, Sujatha [4], Akash [2], Shanker [1], Lindley [5] and exponential distributions
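A small helper for the three criteria, taking $-2\ln L$, the number of parameters $k$ and the sample size $n$ (all the models compared here have $k=1$). Applied to the Garima row of Table 3, AIC $=188.32+2=190.32$, as tabulated. The sample size in the usage line is a placeholder, not the size of the GRASP data, and the function name is hypothetical.

```python
import math


def information_criteria(neg2_log_lik: float, k: int, n: int) -> dict:
    """AIC, AICC and BIC as defined in the text."""
    aic = neg2_log_lik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = neg2_log_lik + k * math.log(n)
    return {"AIC": aic, "AICC": aicc, "BIC": bic}


# Usage with the Garima row of Table 3; n = 100 is only a placeholder sample size.
print(information_criteria(188.32, k=1, n=100))
```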
A one parameter lifetime distribution named “Garima distribution” has been proposed and studied. Its mathematical properties, including shape, moments, skewness, kurtosis, hazard rate function, mean residual life function, stochastic ordering, mean deviations, order statistics, Bonferroni and Lorenz curves, Renyi entropy and stress-strength reliability, have been discussed. The conditions under which the Garima distribution is over-dispersed, equi-dispersed and under-dispersed are presented along with the corresponding conditions for the Sujatha, Aradhana, Akash, Shanker, Lindley and exponential distributions. The method of moments and the method of maximum likelihood have also been discussed for estimating its parameter. Finally, a numerical example from behavioral science has been considered to assess the goodness of fit of the Garima distribution, and the fit has been compared with the Sujatha, Aradhana, Akash, Shanker, Lindley and exponential distributions. The goodness of fit of the Garima distribution shows that it is an important model for behavioral science data.
NOTE: The paper is named in loving memory of my niece Garima Satypriya, daughter of my respected brother Prof. Uma Shanker, Department of Mathematics, K.K College of Engineering & Management, Biharsharif, Nalanda, India.
None.
None.
©2016 Shanker. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.