eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Research Article Volume 4 Issue 7

Garima distribution and its application to model behavioral science data

Rama Shanker

Department of Statistics, Eritrea Institute of Technology, Eritrea

Correspondence: Rama Shanker, Department of Statistics, Eritrea Institute of Technology, Asmara, Eritrea

Received: September 30, 2016 | Published: December 9, 2016

Citation: Shanker R. Garima distribution and its application to model behavioral science data. Biom Biostat Int J. 2016;4(7):275-281. DOI: 10.15406/bbij.2016.04.00116


Abstract

In this paper, a continuous distribution named the “Garima distribution” is proposed for modeling data from behavioral science. Its important properties, including shape, moments, skewness, kurtosis, hazard rate function, mean residual life function, stochastic ordering, mean deviations, order statistics, Bonferroni and Lorenz curves, entropy measure, and stress-strength reliability, are discussed. The conditions under which the Garima distribution is over-dispersed, equi-dispersed, and under-dispersed are presented along with those for other one-parameter continuous distributions. Estimation of its parameter is discussed using maximum likelihood estimation and the method of moments. The application of the proposed distribution is illustrated with a numerical example from behavioral science, and its fit is compared with that of other one-parameter continuous distributions.

Keywords: lifetime distribution, moments, hazard rate function, mean residual life function, mean deviations, order statistics, estimation of parameter, goodness of fit

Introduction

Modeling and analyzing lifetime data are crucial in many applied sciences, including behavioral science, medicine, engineering, insurance and finance, amongst others. There are a number of continuous distributions for modeling lifetime data, such as the exponential, Lindley, gamma, lognormal, and Weibull distributions and their generalizations. The exponential, Lindley and Weibull distributions are more popular than the gamma and lognormal distributions because the survival functions of the gamma and lognormal distributions cannot be expressed in closed form and both require numerical integration. Though the exponential and Lindley distributions each have one parameter, the Lindley distribution has one advantage over the exponential distribution: the exponential distribution has a constant hazard rate, whereas the Lindley distribution has a monotonically increasing hazard rate.

Recently, Shanker [1-4] introduced new lifetime distributions, namely the Shanker, Akash, Aradhana, and Sujatha distributions, for modeling lifetime data from biomedical sciences, engineering and behavioral sciences, and showed their superiority over the Lindley [5] and exponential distributions. The probability density function (p.d.f.) and the cumulative distribution function (c.d.f.) of the Sujatha, Aradhana, Akash, Shanker, Lindley and exponential distributions are presented in Table 1.

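| Distribution | Pdf | Cdf |
|---|---|---|
| Sujatha | $f_6(x;\theta)=\dfrac{\theta^3}{\theta^2+\theta+2}\,(1+x+x^2)\,e^{-\theta x}$ | $F_6(x;\theta)=1-\left[1+\dfrac{\theta x(\theta x+\theta+2)}{\theta^2+\theta+2}\right]e^{-\theta x}$ |
| Aradhana | $f_5(x;\theta)=\dfrac{\theta^3}{\theta^2+2\theta+2}\,(1+x)^2\,e^{-\theta x}$ | $F_5(x;\theta)=1-\left[1+\dfrac{\theta x(\theta x+2\theta+2)}{\theta^2+2\theta+2}\right]e^{-\theta x}$ |
| Akash | $f_4(x;\theta)=\dfrac{\theta^3}{\theta^2+2}\,(1+x^2)\,e^{-\theta x}$ | $F_4(x;\theta)=1-\left[1+\dfrac{\theta x(\theta x+2)}{\theta^2+2}\right]e^{-\theta x}$ |
| Shanker | $f_3(x;\theta)=\dfrac{\theta^2}{\theta^2+1}\,(\theta+x)\,e^{-\theta x}$ | $F_3(x;\theta)=1-\dfrac{(\theta^2+1)+\theta x}{\theta^2+1}\,e^{-\theta x}$ |
| Lindley | $f_2(x;\theta)=\dfrac{\theta^2}{\theta+1}\,(1+x)\,e^{-\theta x}$ | $F_2(x;\theta)=1-\left[1+\dfrac{\theta x}{\theta+1}\right]e^{-\theta x}$ |
| Exponential | $f_1(x;\theta)=\theta e^{-\theta x}$ | $F_1(x;\theta)=1-e^{-\theta x}$ |

Table 1 Pdf and cdf of the Sujatha [4], Aradhana [3], Akash [2], Shanker [1], Lindley [5] and exponential distributions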

A new lifetime distribution

The probability density function (p.d.f.) of a new lifetime distribution can be introduced as

$$f_7(x;\theta)=\frac{\theta}{\theta+2}\,(1+\theta+\theta x)\,e^{-\theta x};\quad x>0,\ \theta>0 \tag{2.1}$$

We call this distribution the “Garima distribution”. It can easily be expressed as a mixture of an exponential($\theta$) distribution and a gamma($2,\theta$) distribution with mixing proportion $\frac{\theta+1}{\theta+2}$. We have

$$f_7(x;\theta)=p\,g_1(x)+(1-p)\,g_2(x) \tag{2.2}$$

where $p=\frac{\theta+1}{\theta+2}$, $g_1(x)=\theta e^{-\theta x}$, and $g_2(x)=\theta^2 x\,e^{-\theta x}$.

The corresponding cumulative distribution function (c.d.f.) of (2.1) is given by

$$F_7(x;\theta)=1-\left[1+\frac{\theta x}{\theta+2}\right]e^{-\theta x};\quad x>0,\ \theta>0 \tag{2.3}$$

The graphs of the p.d.f. and the c.d.f. of the Garima distribution for different values of $\theta$ are shown in Figure 1.

  • Figure 1 Graphs of the pdf and cdf of Garima distribution for various values of the parameter θ.
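As a minimal sketch of the definitions (2.1) to (2.3), the following Python code evaluates the density and distribution function and draws random values through the mixture representation. The function names `garima_pdf`, `garima_cdf` and `garima_sample` are illustrative choices, not part of the original paper, and the sampler assumes that a gamma variate with shape 2 and rate θ can be generated as the sum of two independent exponential(θ) variates.

```python
import math
import random


def garima_pdf(x, theta):
    """Density (2.1): theta/(theta+2) * (1 + theta + theta*x) * exp(-theta*x)."""
    return theta / (theta + 2.0) * (1.0 + theta + theta * x) * math.exp(-theta * x)


def garima_cdf(x, theta):
    """Distribution function (2.3)."""
    return 1.0 - (1.0 + theta * x / (theta + 2.0)) * math.exp(-theta * x)


def garima_sample(theta, rng=random):
    """Draw one value via the mixture (2.2)."""
    p = (theta + 1.0) / (theta + 2.0)
    if rng.random() < p:
        return rng.expovariate(theta)  # exponential(theta) component
    # gamma(2, theta) component: sum of two independent exponential(theta) draws
    return rng.expovariate(theta) + rng.expovariate(theta)


theta = 1.5
x = 0.8
p = (theta + 1.0) / (theta + 2.0)
# Right-hand side of (2.2), evaluated term by term
mixture = p * theta * math.exp(-theta * x) + (1.0 - p) * theta**2 * x * math.exp(-theta * x)
print(abs(mixture - garima_pdf(x, theta)) < 1e-12)
```

The final line only checks that the mixture form (2.2) and the closed form (2.1) agree at one point; the values of `theta` and `x` are arbitrary.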

    Moments and related measures

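    The $r$th moment about the origin of the Garima distribution (2.1) is obtained as

    $$\mu_r'=\frac{r!\,(\theta+r+2)}{\theta^r(\theta+2)};\quad r=1,2,3,\ldots$$

    and so the first four moments about the origin are

    $$\mu_1'=\frac{\theta+3}{\theta(\theta+2)},\quad \mu_2'=\frac{2(\theta+4)}{\theta^2(\theta+2)},\quad \mu_3'=\frac{6(\theta+5)}{\theta^3(\theta+2)},\quad \mu_4'=\frac{24(\theta+6)}{\theta^4(\theta+2)}$$

    Using the relationship between central moments and the moments about the origin, the central moments of the Garima distribution are obtained as

    $$\mu_2=\frac{\theta^2+6\theta+7}{\theta^2(\theta+2)^2}$$

    $$\mu_3=\frac{2(\theta^3+9\theta^2+21\theta+15)}{\theta^3(\theta+2)^3}$$

    $$\mu_4=\frac{3(3\theta^4+36\theta^3+134\theta^2+204\theta+111)}{\theta^4(\theta+2)^4}$$

    Thus the coefficient of variation (C.V), coefficient of skewness $(\sqrt{\beta_1})$, coefficient of kurtosis $(\beta_2)$ and index of dispersion $(\gamma)$ of the Garima distribution are obtained as

    $$C.V=\frac{\sigma}{\mu_1'}=\frac{\sqrt{\theta^2+6\theta+7}}{\theta+3}$$

    $$\sqrt{\beta_1}=\frac{\mu_3}{\mu_2^{3/2}}=\frac{2(\theta^3+9\theta^2+21\theta+15)}{(\theta^2+6\theta+7)^{3/2}}$$

    $$\beta_2=\frac{\mu_4}{\mu_2^{2}}=\frac{3(3\theta^4+36\theta^3+134\theta^2+204\theta+111)}{(\theta^2+6\theta+7)^{2}}$$

    $$\gamma=\frac{\sigma^2}{\mu_1'}=\frac{\theta^2+6\theta+7}{\theta(\theta+2)(\theta+3)}$$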

    The conditions under which the Garima distribution is over-dispersed $(\mu<\sigma^2)$, equi-dispersed $(\mu=\sigma^2)$ and under-dispersed $(\mu>\sigma^2)$ are presented in Table 2, along with those for other lifetime distributions.

    | Lifetime distribution | Over-dispersion $(\mu<\sigma^2)$ | Equi-dispersion $(\mu=\sigma^2)$ | Under-dispersion $(\mu>\sigma^2)$ |
    |---|---|---|---|
    | Garima | $\theta<1.164247938$ | $\theta=1.164247938$ | $\theta>1.164247938$ |
    | Sujatha | $\theta<1.364271174$ | $\theta=1.364271174$ | $\theta>1.364271174$ |
    | Aradhana | $\theta<1.283826505$ | $\theta=1.283826505$ | $\theta>1.283826505$ |
    | Akash | $\theta<1.515400063$ | $\theta=1.515400063$ | $\theta>1.515400063$ |
    | Shanker | $\theta<1.171535555$ | $\theta=1.171535555$ | $\theta>1.171535555$ |
    | Lindley | $\theta<1.170086487$ | $\theta=1.170086487$ | $\theta>1.170086487$ |
    | Exponential | $\theta<1$ | $\theta=1$ | $\theta>1$ |

    Table 2 Over-dispersion, equi-dispersion and under-dispersion of the Garima, Sujatha [4], Aradhana [3], Akash [2], Shanker [1], Lindley [5], and exponential distributions for varying values of their parameter $\theta$
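The Garima threshold in Table 2 can be reproduced numerically. The sketch below, with illustrative function names, evaluates the index of dispersion γ = σ²/μ given above and locates the point where γ = 1 by bisection. The bracket [0.5, 3] is an assumption chosen so that γ is above 1 at the left end and below 1 at the right end.

```python
def index_of_dispersion(theta):
    """gamma = sigma^2 / mean = (theta^2 + 6*theta + 7) / (theta*(theta+2)*(theta+3))."""
    return (theta**2 + 6.0 * theta + 7.0) / (theta * (theta + 2.0) * (theta + 3.0))


def equidispersion_point(lo=0.5, hi=3.0, tol=1e-12):
    """Bisection for gamma(theta) = 1; assumes gamma(lo) > 1 > gamma(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if index_of_dispersion(mid) > 1.0:
            lo = mid  # still over-dispersed, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


theta_star = equidispersion_point()
print(theta_star)  # should agree with the Garima row of Table 2
```

Setting γ = 1 is equivalent to the cubic θ³ + 4θ² − 7 = 0, so any root finder could be used instead of bisection.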

    Generating functions

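    The moment generating function $M_X(t)$, characteristic function $\varphi_X(t)$, and cumulant generating function $K_X(t)$ of the Garima distribution (2.1) are given by

    $$M_X(t)=\left(1-\frac{(\theta+1)\,t}{\theta^2+2\theta}\right)\left(1-\frac{t}{\theta}\right)^{-2},\quad \left|\frac{t}{\theta}\right|<1$$

    $$\varphi_X(t)=\left(1-\frac{(\theta+1)\,it}{\theta^2+2\theta}\right)\left(1-\frac{it}{\theta}\right)^{-2},\quad i=\sqrt{-1}$$

    $$K_X(t)=\log\left(1-\frac{(\theta+1)\,it}{\theta^2+2\theta}\right)-2\log\left(1-\frac{it}{\theta}\right)$$

    Using the expansion $\log(1-x)=-\sum_{r=1}^{\infty}\frac{x^r}{r}$, we get

    $$K_X(t)=-\sum_{r=1}^{\infty}\left(\frac{\theta+1}{\theta^2+2\theta}\right)^{r}\frac{(it)^r}{r}+2\sum_{r=1}^{\infty}\left(\frac{it}{\theta}\right)^{r}\frac{1}{r}$$

    $$=2\sum_{r=1}^{\infty}\frac{1}{\theta^r}\frac{(it)^r}{r}-\sum_{r=1}^{\infty}\left(\frac{\theta+1}{\theta^2+2\theta}\right)^{r}\frac{(it)^r}{r}$$

    $$=2\sum_{r=1}^{\infty}\frac{(r-1)!}{\theta^r}\frac{(it)^r}{r!}-\sum_{r=1}^{\infty}\left(\frac{\theta+1}{\theta^2+2\theta}\right)^{r}(r-1)!\,\frac{(it)^r}{r!}$$

    Thus the $r$th cumulant of the Garima distribution is given by

    $K_r$ = coefficient of $\frac{(it)^r}{r!}$ in $K_X(t)$

    $$=\frac{2(r-1)!}{\theta^r}-\frac{(r-1)!\,(\theta+1)^r}{(\theta^2+2\theta)^r};\quad r=1,2,3,\ldots$$

    This gives

    $$\mu_1'=K_1=\frac{\theta+3}{\theta(\theta+2)}$$

    $$\mu_2=K_2=\frac{\theta^2+6\theta+7}{\theta^2(\theta+2)^2}$$

    $$\mu_3=K_3=\frac{2(\theta^3+9\theta^2+21\theta+15)}{\theta^3(\theta+2)^3}$$

    $$\mu_4=K_4+3K_2^2=\frac{3(3\theta^4+36\theta^3+134\theta^2+204\theta+111)}{\theta^4(\theta+2)^4}$$

    which are the same as obtained earlier.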
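The agreement between the cumulants and the central moments can be checked numerically. The following sketch codes the cumulant formula K_r = 2(r−1)!/θ^r − (r−1)!(θ+1)^r/(θ²+2θ)^r and compares K_4 + 3K_2² with the closed form for μ_4. The function name `garima_cumulant` and the test value θ = 2 are illustrative assumptions.

```python
import math


def garima_cumulant(r, theta):
    """r-th cumulant: 2(r-1)!/theta^r - (r-1)!(theta+1)^r/(theta^2+2*theta)^r."""
    f = math.factorial(r - 1)
    return 2.0 * f / theta**r - f * (theta + 1.0) ** r / (theta**2 + 2.0 * theta) ** r


theta = 2.0
k1, k2, k3, k4 = (garima_cumulant(r, theta) for r in range(1, 5))

# Fourth central moment from cumulants, and from the closed form given earlier
mu4_from_cumulants = k4 + 3.0 * k2**2
mu4_closed = (3.0 * (3 * theta**4 + 36 * theta**3 + 134 * theta**2 + 204 * theta + 111)
              / (theta**4 * (theta + 2.0) ** 4))
print(abs(mu4_from_cumulants - mu4_closed) < 1e-12)
```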

    Hazard rate function and mean residual life function

    Let $X$ be a continuous random variable with pdf $f(x)$ and cdf $F(x)$. The hazard rate function (also known as the failure rate function) and the mean residual life function of $X$ are respectively defined as

    $$h(x)=\lim_{\Delta x\to 0}\frac{P(X<x+\Delta x\mid X>x)}{\Delta x}=\frac{f(x)}{1-F(x)} \tag{5.1}$$

    and

    $$m(x)=E[X-x\mid X>x]=\frac{1}{1-F(x)}\int_x^{\infty}\left[1-F(t)\right]dt \tag{5.2}$$

    Substituting $f_7(x;\theta)$ and $F_7(x;\theta)$, the hazard rate function $h(x)$ and the mean residual life function $m(x)$ of the Garima distribution are given by

    $$h(x)=\frac{\theta(1+\theta+\theta x)}{\theta x+(\theta+2)} \tag{5.3}$$

    and

    $$m(x)=\frac{\theta x+\theta+3}{\theta(\theta x+\theta+2)} \tag{5.4}$$

    It can be easily verified that $h(0)=\frac{\theta(\theta+1)}{\theta+2}=f(0)$ and $m(0)=\frac{\theta+3}{\theta(\theta+2)}=\mu_1'$. The graphs of $h(x)$ and $m(x)$ also show that $h(x)$ is an increasing function of $x$ and $\theta$, whereas $m(x)$ is a decreasing function of $x$ and $\theta$. The graphs of the hazard rate function and the mean residual life function of the Garima distribution are shown in Figures 2 and 3.

  • Figure 2 Graph of hazard rate function of Garima distribution for different values of parameter θ.

  • Figure 3 Graph of mean residual life function of Garima distribution for different values of parameter θ.
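The behavior of the two functions can be explored with a short sketch. Here the hazard rate is computed directly from its definition as the ratio of the density (2.1) to the survival function 1 − F_7, and the mean residual life uses the closed form (5.4). Function names, the value θ = 0.8 and the grid are illustrative assumptions.

```python
import math


def garima_pdf(x, theta):
    """Density (2.1)."""
    return theta / (theta + 2.0) * (1.0 + theta + theta * x) * math.exp(-theta * x)


def garima_sf(x, theta):
    """Survival function 1 - F(x), from (2.3)."""
    return (1.0 + theta * x / (theta + 2.0)) * math.exp(-theta * x)


def hazard(x, theta):
    """Hazard rate as the ratio f(x) / (1 - F(x)), definition (5.1)."""
    return garima_pdf(x, theta) / garima_sf(x, theta)


def mean_residual_life(x, theta):
    """Closed form (5.4)."""
    return (theta * x + theta + 3.0) / (theta * (theta * x + theta + 2.0))


theta = 0.8
grid = [0.25 * i for i in range(41)]  # x from 0 to 10
h_vals = [hazard(x, theta) for x in grid]
m_vals = [mean_residual_life(x, theta) for x in grid]
print(all(a < b for a, b in zip(h_vals, h_vals[1:])))  # is the hazard rising on the grid?
print(all(a > b for a, b in zip(m_vals, m_vals[1:])))  # is the MRL falling on the grid?
```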

    Stochastic orderings

    Stochastic ordering of positive continuous random variables is an important tool for judging their comparative behavior. A random variable X is said to be smaller than a random variable Y in the

    1. stochastic order $(X\le_{st}Y)$ if $F_X(x)\ge F_Y(x)$ for all $x$
    2. hazard rate order $(X\le_{hr}Y)$ if $h_X(x)\ge h_Y(x)$ for all $x$
    3. mean residual life order $(X\le_{mrl}Y)$ if $m_X(x)\le m_Y(x)$ for all $x$
    4. likelihood ratio order $(X\le_{lr}Y)$ if $\frac{f_X(x)}{f_Y(x)}$ decreases in $x$.

    The following results due to Shaked & Shanthikumar [6] are well known for establishing stochastic ordering of distributions:

    $$X\le_{lr}Y\;\Rightarrow\;X\le_{hr}Y\;\Rightarrow\;X\le_{mrl}Y,\qquad X\le_{hr}Y\;\Rightarrow\;X\le_{st}Y \tag{6.1}$$

    The Garima distribution is ordered with respect to the strongest ‘likelihood ratio’ ordering as shown in the following theorem:

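    Theorem: Let $X\sim$ Garima distribution $(\theta_1)$ and $Y\sim$ Garima distribution $(\theta_2)$. If $\theta_1>\theta_2$, then $X\le_{lr}Y$ and hence $X\le_{hr}Y$, $X\le_{mrl}Y$ and $X\le_{st}Y$.

    Proof: We have

    $$\frac{f_X(x)}{f_Y(x)}=\frac{\theta_1(\theta_2+2)}{\theta_2(\theta_1+2)}\left(\frac{1+\theta_1+\theta_1 x}{1+\theta_2+\theta_2 x}\right)e^{-(\theta_1-\theta_2)x};\quad x>0$$

    Now

    $$\log\frac{f_X(x)}{f_Y(x)}=\log\left[\frac{\theta_1(\theta_2+2)}{\theta_2(\theta_1+2)}\right]+\log\left(\frac{1+\theta_1+\theta_1 x}{1+\theta_2+\theta_2 x}\right)-(\theta_1-\theta_2)x$$

    This gives

    $$\frac{d}{dx}\log\frac{f_X(x)}{f_Y(x)}=\frac{\theta_1-\theta_2}{(1+\theta_1+\theta_1 x)(1+\theta_2+\theta_2 x)}-(\theta_1-\theta_2)$$

    Since $(1+\theta_1+\theta_1 x)(1+\theta_2+\theta_2 x)>1$ for $x>0$, the first term is smaller than $\theta_1-\theta_2$ whenever $\theta_1>\theta_2$. Thus for $\theta_1>\theta_2$, $\frac{d}{dx}\log\frac{f_X(x)}{f_Y(x)}<0$. This means that $X\le_{lr}Y$ and hence $X\le_{hr}Y$, $X\le_{mrl}Y$ and $X\le_{st}Y$.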
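The sign argument in the proof can be illustrated numerically. This sketch evaluates the log density ratio and its derivative for one pair of parameters with θ1 > θ2 and checks that the derivative stays negative on a grid. The parameter values and function names are illustrative assumptions, and a grid check is of course not a substitute for the proof.

```python
import math


def log_density_ratio(x, theta1, theta2):
    """log of f_X(x)/f_Y(x) for X ~ Garima(theta1) and Y ~ Garima(theta2)."""
    const = math.log(theta1 * (theta2 + 2.0) / (theta2 * (theta1 + 2.0)))
    return (const
            + math.log((1.0 + theta1 + theta1 * x) / (1.0 + theta2 + theta2 * x))
            - (theta1 - theta2) * x)


def slope(x, theta1, theta2):
    """Derivative of the log ratio with respect to x, as in the proof."""
    d = theta1 - theta2
    return d / ((1.0 + theta1 + theta1 * x) * (1.0 + theta2 + theta2 * x)) - d


theta1, theta2 = 2.0, 0.5  # theta1 > theta2
grid = [0.1 * i for i in range(1, 101)]
print(all(slope(x, theta1, theta2) < 0.0 for x in grid))
```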

    Mean deviations

    The amount of scatter in a population is measured to some extent by the totality of deviations usually from mean and median. These are known as the mean deviation about the mean and the mean deviation about the median defined by

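    $\delta_1(X)=\int_0^{\infty}|x-\mu|\,f(x)\,dx$ and $\delta_2(X)=\int_0^{\infty}|x-M|\,f(x)\,dx$, respectively, where $\mu=E(X)$ and $M=\text{Median}(X)$. The measures $\delta_1(X)$ and $\delta_2(X)$ can be calculated using the relationships

    $$\delta_1(X)=\int_0^{\mu}(\mu-x)f(x)\,dx+\int_{\mu}^{\infty}(x-\mu)f(x)\,dx$$

    $$=\mu F(\mu)-\int_0^{\mu}x f(x)\,dx-\mu\left[1-F(\mu)\right]+\int_{\mu}^{\infty}x f(x)\,dx$$

    $$=2\mu F(\mu)-2\mu+2\int_{\mu}^{\infty}x f(x)\,dx$$

    $$=2\mu F(\mu)-2\int_0^{\mu}x f(x)\,dx \tag{7.1}$$

    and

    $$\delta_2(X)=\int_0^{M}(M-x)f(x)\,dx+\int_{M}^{\infty}(x-M)f(x)\,dx$$

    $$=M F(M)-\int_0^{M}x f(x)\,dx-M\left[1-F(M)\right]+\int_{M}^{\infty}x f(x)\,dx$$

    $$=-\mu+2\int_{M}^{\infty}x f(x)\,dx$$

    $$=\mu-2\int_0^{M}x f(x)\,dx \tag{7.2}$$

    Using p.d.f. (2.1) and the expression for the mean of the Garima distribution, we get

    $$\int_0^{\mu}x f_7(x;\theta)\,dx=\mu-\frac{\left\{\theta^2\mu^2+(\theta^2+3\theta)\mu+(\theta+3)\right\}e^{-\theta\mu}}{\theta(\theta+2)} \tag{7.3}$$

    $$\int_0^{M}x f_7(x;\theta)\,dx=\mu-\frac{\left\{\theta^2M^2+(\theta^2+3\theta)M+(\theta+3)\right\}e^{-\theta M}}{\theta(\theta+2)} \tag{7.4}$$

    Using expressions (7.1), (7.2), (7.3), and (7.4), the mean deviation about the mean, $\delta_1(X)$, and the mean deviation about the median, $\delta_2(X)$, of the Garima distribution are obtained as

    $$\delta_1(X)=\frac{2(\theta\mu+\theta+3)\,e^{-\theta\mu}}{\theta(\theta+2)} \tag{7.5}$$

    $$\delta_2(X)=\frac{2\left\{\theta^2M^2+(\theta^2+3\theta)M+(\theta+3)\right\}e^{-\theta M}}{\theta(\theta+2)}-\mu \tag{7.6}$$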
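A sketch of how the two mean deviations might be evaluated in practice is given below. It builds δ1 and δ2 from the relationships (7.1) and (7.2) together with the partial-moment expressions (7.3) and (7.4). Since the median has no closed form here, it is found by bisection on the c.d.f. (2.3), which is an implementation choice of this sketch and not part of the paper. All function names are illustrative.

```python
import math


def garima_pdf(x, theta):
    """Density (2.1)."""
    return theta / (theta + 2.0) * (1.0 + theta + theta * x) * math.exp(-theta * x)


def garima_cdf(x, theta):
    """Distribution function (2.3)."""
    return 1.0 - (1.0 + theta * x / (theta + 2.0)) * math.exp(-theta * x)


def upper_partial_mean(q, theta):
    """Integral of x*f(x) from q to infinity (the braced term in (7.3) and (7.4))."""
    poly = theta**2 * q**2 + (theta**2 + 3.0 * theta) * q + theta + 3.0
    return poly * math.exp(-theta * q) / (theta * (theta + 2.0))


def garima_median(theta, tol=1e-12):
    """Median by bisection on the c.d.f. (assumed approach; no closed form is used)."""
    lo, hi = 0.0, 1.0
    while garima_cdf(hi, theta) < 0.5:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if garima_cdf(mid, theta) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_deviations(theta):
    """Return (delta1, delta2) via (7.1)-(7.4)."""
    mu = (theta + 3.0) / (theta * (theta + 2.0))
    med = garima_median(theta)
    lower_mu = mu - upper_partial_mean(mu, theta)    # (7.3)
    lower_med = mu - upper_partial_mean(med, theta)  # (7.4)
    delta1 = 2.0 * mu * garima_cdf(mu, theta) - 2.0 * lower_mu  # (7.1)
    delta2 = mu - 2.0 * lower_med                                # (7.2)
    return delta1, delta2


print(mean_deviations(1.0))
```

As a general property, the mean deviation about the median should not exceed the mean deviation about the mean, which gives a quick sanity check on the output.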

    Order statistics

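    Let $X_1,X_2,\ldots,X_n$ be a random sample of size $n$ from the Garima distribution (2.1). Let $X_{(1)}<X_{(2)}<\cdots<X_{(n)}$ denote the corresponding order statistics. The p.d.f. and the c.d.f. of the $k$th order statistic, say $Y=X_{(k)}$, are given by

    $$f_Y(y)=\frac{n!}{(k-1)!\,(n-k)!}\,F^{k-1}(y)\left\{1-F(y)\right\}^{n-k}f(y)$$

    $$=\frac{n!}{(k-1)!\,(n-k)!}\sum_{l=0}^{n-k}\binom{n-k}{l}(-1)^l F^{k+l-1}(y)\,f(y)$$

    and

    $$F_Y(y)=\sum_{j=k}^{n}\binom{n}{j}F^{j}(y)\left\{1-F(y)\right\}^{n-j}$$

    $$=\sum_{j=k}^{n}\sum_{l=0}^{n-j}\binom{n}{j}\binom{n-j}{l}(-1)^l F^{j+l}(y),$$

    respectively, for $k=1,2,3,\ldots,n$.

    Thus, the p.d.f. and the c.d.f. of the $k$th order statistic of the Garima distribution are given by

    $$f_Y(y)=\frac{n!\,\theta(1+\theta+\theta y)\,e^{-\theta y}}{(\theta+2)(k-1)!\,(n-k)!}\sum_{l=0}^{n-k}\binom{n-k}{l}(-1)^l\left[1-\frac{\theta y+(\theta+2)}{\theta+2}\,e^{-\theta y}\right]^{k+l-1}$$

    and

    $$F_Y(y)=\sum_{j=k}^{n}\sum_{l=0}^{n-j}\binom{n}{j}\binom{n-j}{l}(-1)^l\left[1-\frac{\theta y+(\theta+2)}{\theta+2}\,e^{-\theta y}\right]^{j+l}$$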
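The two equivalent forms of the order-statistic density can be compared with a small sketch. One function uses the product form with F^{k−1}(1−F)^{n−k}, the other the binomial expansion with alternating signs. Function names and the sample values n = 5, k = 2, θ = 1.2 are illustrative assumptions.

```python
from math import comb, exp, factorial


def garima_pdf(x, theta):
    """Density (2.1)."""
    return theta / (theta + 2.0) * (1.0 + theta + theta * x) * exp(-theta * x)


def garima_cdf(x, theta):
    """Distribution function (2.3)."""
    return 1.0 - (1.0 + theta * x / (theta + 2.0)) * exp(-theta * x)


def order_stat_pdf(y, k, n, theta):
    """Density of the k-th order statistic, product form."""
    c = factorial(n) / (factorial(k - 1) * factorial(n - k))
    F = garima_cdf(y, theta)
    return c * F ** (k - 1) * (1.0 - F) ** (n - k) * garima_pdf(y, theta)


def order_stat_pdf_expanded(y, k, n, theta):
    """Same density after expanding (1 - F)^(n-k) with the binomial theorem."""
    c = factorial(n) / (factorial(k - 1) * factorial(n - k))
    F = garima_cdf(y, theta)
    series = sum(comb(n - k, l) * (-1) ** l * F ** (k + l - 1) for l in range(n - k + 1))
    return c * series * garima_pdf(y, theta)


n, k, theta = 5, 2, 1.2
for y in (0.2, 1.0, 3.0):
    print(y, order_stat_pdf(y, k, n, theta), order_stat_pdf_expanded(y, k, n, theta))
```

For large n the alternating sum can lose accuracy through cancellation, so the product form is usually the safer one to evaluate numerically.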

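    Bonferroni and Lorenz curves

    The Bonferroni and Lorenz curves [7] and the Bonferroni and Gini indices have applications not only in economics, to study income and poverty, but also in other fields such as reliability, demography, insurance and medicine. The Bonferroni and Lorenz curves are defined as

    $$B(p)=\frac{1}{p\mu}\int_0^{q}x f(x)\,dx=\frac{1}{p\mu}\left[\int_0^{\infty}x f(x)\,dx-\int_q^{\infty}x f(x)\,dx\right]=\frac{1}{p\mu}\left[\mu-\int_q^{\infty}x f(x)\,dx\right] \tag{9.1}$$

    and

    $$L(p)=\frac{1}{\mu}\int_0^{q}x f(x)\,dx=\frac{1}{\mu}\left[\int_0^{\infty}x f(x)\,dx-\int_q^{\infty}x f(x)\,dx\right]=\frac{1}{\mu}\left[\mu-\int_q^{\infty}x f(x)\,dx\right] \tag{9.2}$$

    respectively, or equivalently

    $$B(p)=\frac{1}{p\mu}\int_0^{p}F^{-1}(x)\,dx \tag{9.3}$$

    and

    $$L(p)=\frac{1}{\mu}\int_0^{p}F^{-1}(x)\,dx \tag{9.4}$$

    respectively, where $\mu=E(X)$ and $q=F^{-1}(p)$.

    The Bonferroni and Gini indices are thus defined as

    $$B=1-\int_0^{1}B(p)\,dp \tag{9.5}$$

    and

    $$G=1-2\int_0^{1}L(p)\,dp \tag{9.6}$$

    respectively.

    Using p.d.f. (2.1), we get

    $$\int_q^{\infty}x f_7(x;\theta)\,dx=\frac{\left\{\theta^2q^2+(\theta^2+3\theta)q+(\theta+3)\right\}e^{-\theta q}}{\theta(\theta+2)} \tag{9.7}$$

    Now using equation (9.7) in (9.1) and (9.2), we get

    $$B(p)=\frac{1}{p}\left[1-\frac{\left\{\theta^2q^2+(\theta^2+3\theta)q+(\theta+3)\right\}e^{-\theta q}}{\theta+3}\right] \tag{9.8}$$

    and

    $$L(p)=1-\frac{\left\{\theta^2q^2+(\theta^2+3\theta)q+(\theta+3)\right\}e^{-\theta q}}{\theta+3} \tag{9.9}$$

    Now using equations (9.8) and (9.9) in (9.5) and (9.6), the Bonferroni and Gini indices of the Garima distribution (2.1) are obtained as

    $$B=1-\frac{\left\{\theta^2q^2+(\theta^2+3\theta)q+(\theta+3)\right\}e^{-\theta q}}{\theta+3} \tag{9.10}$$

    $$G=-1+\frac{2\left\{\theta^2q^2+(\theta^2+3\theta)q+(\theta+3)\right\}e^{-\theta q}}{\theta+3} \tag{9.11}$$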
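To evaluate the curves (9.8) and (9.9) at a given p, the quantile q = F⁻¹(p) is needed. The sketch below obtains q by bisection on the c.d.f. (2.3), which is an assumption of this example since the paper does not give an explicit inverse, and then plugs it into the closed forms. Function names are illustrative.

```python
import math


def garima_cdf(x, theta):
    """Distribution function (2.3)."""
    return 1.0 - (1.0 + theta * x / (theta + 2.0)) * math.exp(-theta * x)


def garima_quantile(p, theta, tol=1e-12):
    """q = F^{-1}(p) for 0 < p < 1, by bisection (assumed numerical approach)."""
    lo, hi = 0.0, 1.0
    while garima_cdf(hi, theta) < p:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if garima_cdf(mid, theta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lorenz(p, theta):
    """Lorenz curve L(p), equation (9.9)."""
    q = garima_quantile(p, theta)
    poly = theta**2 * q**2 + (theta**2 + 3.0 * theta) * q + theta + 3.0
    return 1.0 - poly * math.exp(-theta * q) / (theta + 3.0)


def bonferroni(p, theta):
    """Bonferroni curve B(p) = L(p) / p, equation (9.8)."""
    return lorenz(p, theta) / p


theta = 1.0
for p in (0.1, 0.5, 0.9):
    print(p, lorenz(p, theta), bonferroni(p, theta))
```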

    Renyi entropy

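    The entropy of a random variable $X$ is a measure of the variation of uncertainty. A popular entropy measure is the Renyi entropy [8]. If $X$ is a continuous random variable having probability density function $f(\cdot)$, then the Renyi entropy is defined as

    $$T_R(\gamma)=\frac{1}{1-\gamma}\log\left\{\int f^{\gamma}(x)\,dx\right\}$$

    where $\gamma>0$ and $\gamma\neq 1$.

    Thus, the Renyi entropy for the Garima distribution (2.1) is obtained as

    $$T_R(\gamma)=\frac{1}{1-\gamma}\log\left[\int_0^{\infty}\frac{\theta^{\gamma}}{(\theta+2)^{\gamma}}(1+\theta+\theta x)^{\gamma}e^{-\theta\gamma x}\,dx\right]$$

    $$=\frac{1}{1-\gamma}\log\left[\int_0^{\infty}\frac{\theta^{\gamma}(1+\theta)^{\gamma}}{(\theta+2)^{\gamma}}\left(1+\frac{\theta}{\theta+1}x\right)^{\gamma}e^{-\theta\gamma x}\,dx\right]$$

    $$=\frac{1}{1-\gamma}\log\left[\int_0^{\infty}\frac{\theta^{\gamma}(1+\theta)^{\gamma}}{(\theta+2)^{\gamma}}\sum_{j=0}^{\infty}\binom{\gamma}{j}\left(\frac{\theta}{\theta+1}x\right)^{j}e^{-\theta\gamma x}\,dx\right]$$

    $$=\frac{1}{1-\gamma}\log\left[\sum_{j=0}^{\infty}\binom{\gamma}{j}\frac{\theta^{\gamma+j}(1+\theta)^{\gamma-j}}{(\theta+2)^{\gamma}}\int_0^{\infty}e^{-\theta\gamma x}x^{j+1-1}\,dx\right]$$

    $$=\frac{1}{1-\gamma}\log\left[\sum_{j=0}^{\infty}\binom{\gamma}{j}\frac{\theta^{\gamma+j}(1+\theta)^{\gamma-j}}{(\theta+2)^{\gamma}}\frac{\Gamma(j+1)}{(\theta\gamma)^{j+1}}\right]$$

    $$=\frac{1}{1-\gamma}\log\left[\sum_{j=0}^{\infty}\binom{\gamma}{j}\frac{\theta^{\gamma-1}(1+\theta)^{\gamma-j}}{(\theta+2)^{\gamma}}\frac{\Gamma(j+1)}{\gamma^{j+1}}\right]$$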
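As a rough numerical companion, the defining integral can be approximated directly. The sketch below uses a simple midpoint rule on a truncated range, which is an assumption of this example (the truncation point and step count are arbitrary choices). For integer γ ≥ 2 the binomial series terminates after γ + 1 terms, so the series form with the factor Γ(j+1)/(θγ)^{j+1} can be summed exactly and used as a cross-check. Function names are illustrative.

```python
import math


def garima_pdf(x, theta):
    """Density (2.1)."""
    return theta / (theta + 2.0) * (1.0 + theta + theta * x) * math.exp(-theta * x)


def renyi_entropy(gamma, theta, steps=200000):
    """Midpoint-rule approximation of (1/(1-gamma)) * log(integral of f^gamma)."""
    upper = 60.0 / (theta * min(gamma, 1.0))  # truncation point; tail beyond is negligible
    h = upper / steps
    total = sum(garima_pdf((i + 0.5) * h, theta) ** gamma for i in range(steps)) * h
    return math.log(total) / (1.0 - gamma)


def renyi_entropy_integer(gamma, theta):
    """Finite-sum form for integer gamma >= 2, where the binomial series terminates."""
    total = sum(
        math.comb(gamma, j)
        * theta ** (gamma + j) * (1.0 + theta) ** (gamma - j) / (theta + 2.0) ** gamma
        * math.factorial(j) / (theta * gamma) ** (j + 1)
        for j in range(gamma + 1)
    )
    return math.log(total) / (1.0 - gamma)


theta = 1.0
print(renyi_entropy(2, theta), renyi_entropy_integer(2, theta))
print(renyi_entropy(0.5, theta))
```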

    Stress-strength reliability

    The stress-strength reliability describes the life of a component which has random strength $X$ and is subjected to a random stress $Y$. When the stress applied to it exceeds the strength, the component fails instantly, and the component functions satisfactorily as long as $X>Y$. Therefore, $R=P(Y<X)$ is a measure of component reliability, known in the statistical literature as the stress-strength parameter. It has wide applications in almost all areas of knowledge, especially in engineering, such as structures, deterioration of rocket motors, static fatigue of ceramic components, and aging of concrete pressure vessels.

    Let $X$ and $Y$ be independent strength and stress random variables having Garima distributions (2.1) with parameters $\theta_1$ and $\theta_2$ respectively. Then the stress-strength reliability $R$ of the Garima distribution can be obtained as

    $$R=P(Y<X)=\int_0^{\infty}P(Y<X\mid X=x)\,f_X(x)\,dx$$

    $$=\int_0^{\infty}f_7(x;\theta_1)\,F_7(x;\theta_2)\,dx$$

    $$=1-\frac{\theta_1\left[(\theta_1+\theta_2)^2(\theta_1\theta_2+2\theta_1+\theta_2+2)+(\theta_1+\theta_2)(2\theta_1\theta_2+2\theta_1+\theta_2)+2\theta_1\theta_2\right]}{(\theta_1+2)(\theta_2+2)(\theta_1+\theta_2)^3}.$$

    For $\theta_1=\theta_2$ this reduces to $R=1/2$, as expected for identically distributed strength and stress.
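The integral form of R lends itself to a direct numerical check. This sketch approximates the integral of f_7(x; θ1) F_7(x; θ2) with a midpoint rule on a truncated range. The quadrature scheme, the truncation point and the function names are assumptions of this example. Two properties give simple sanity checks: R should equal 1/2 when θ1 = θ2, and swapping the parameters should give 1 − R.

```python
import math


def garima_pdf(x, theta):
    """Density (2.1)."""
    return theta / (theta + 2.0) * (1.0 + theta + theta * x) * math.exp(-theta * x)


def garima_cdf(x, theta):
    """Distribution function (2.3)."""
    return 1.0 - (1.0 + theta * x / (theta + 2.0)) * math.exp(-theta * x)


def stress_strength(theta1, theta2, steps=200000):
    """Midpoint-rule value of R = integral of f(x; theta1) * F(x; theta2) over x > 0."""
    upper = 60.0 / theta1  # truncation point; the strength density is negligible beyond it
    h = upper / steps
    return sum(
        garima_pdf((i + 0.5) * h, theta1) * garima_cdf((i + 0.5) * h, theta2)
        for i in range(steps)
    ) * h


print(stress_strength(1.0, 1.0))  # identical distributions
print(stress_strength(1.0, 2.0), stress_strength(2.0, 1.0))  # the pair should sum to about 1
```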

    Estimation of parameter

    Maximum likelihood estimates (MLE)

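    Let $(x_1,x_2,x_3,\ldots,x_n)$ be a random sample from the Garima distribution (2.1). The likelihood function $L$ of (2.1) is given by

    $$L=\left(\frac{\theta}{\theta+2}\right)^{n}\prod_{i=1}^{n}(1+\theta+\theta x_i)\,e^{-n\theta\bar{x}}$$

    The natural log likelihood function is thus obtained as

    $$\ln L=n\ln\left(\frac{\theta}{\theta+2}\right)+\sum_{i=1}^{n}\ln(1+\theta+\theta x_i)-n\theta\bar{x}$$

    Now

    $$\frac{d\ln L}{d\theta}=\frac{2n}{\theta^2+2\theta}+\sum_{i=1}^{n}\frac{1+x_i}{1+\theta+\theta x_i}-n\bar{x}=0$$

    where $\bar{x}$ is the sample mean.

    The maximum likelihood estimate $\hat{\theta}$ of $\theta$ is the solution of the equation $\frac{d\ln L}{d\theta}=0$, and so it can be obtained by solving the following non-linear equation

    $$\sum_{i=1}^{n}\frac{1+x_i}{1+\theta+\theta x_i}+\frac{2n}{\theta^2+2\theta}-n\bar{x}=0 \tag{12.1.1}$$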
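Equation (12.1.1) has no closed-form solution, so a numerical root finder is needed. The sketch below uses bisection on the score function, which is one possible choice and not necessarily the method used in the paper. It assumes positive observations, for which the score is large and positive near θ = 0 and tends to −n x̄ as θ grows. The data in `sample` are hypothetical and serve only to exercise the code.

```python
import math


def score(theta, data):
    """Left-hand side of (12.1.1): derivative of ln L with respect to theta."""
    n = len(data)
    xbar = sum(data) / n
    return (2.0 * n / (theta**2 + 2.0 * theta)
            + sum((1.0 + x) / (1.0 + theta + theta * x) for x in data)
            - n * xbar)


def log_likelihood(theta, data):
    """ln L for the Garima distribution."""
    n = len(data)
    return (n * math.log(theta / (theta + 2.0))
            + sum(math.log(1.0 + theta + theta * x) for x in data)
            - theta * sum(data))


def garima_mle(data, tol=1e-12):
    """Solve score(theta) = 0 by bisection (assumed approach; requires positive data)."""
    lo, hi = 1e-8, 1.0
    while score(hi, data) > 0.0:  # expand the bracket until the score changes sign
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, data) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sample = [0.3, 1.2, 0.7, 2.5, 0.1, 1.9, 0.8]  # hypothetical data, not from the paper
theta_hat = garima_mle(sample)
print(theta_hat, log_likelihood(theta_hat, sample))
```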

    Method of moment estimates (MOME)

    Equating the population mean of the Garima distribution to the corresponding sample mean, the method of moment estimate (MOME) $\tilde{\theta}$ of $\theta$ can be obtained as

    $$\tilde{\theta}=\frac{(1-2\bar{x})+\sqrt{4\bar{x}^2+8\bar{x}+1}}{2\bar{x}};\quad \bar{x}>0 \tag{12.2.1}$$
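The estimate (12.2.1) is the positive root of the quadratic x̄θ² + (2x̄ − 1)θ − 3 = 0 that results from setting (θ+3)/(θ(θ+2)) equal to x̄. A minimal sketch, with illustrative function names and an arbitrary input value, is:

```python
import math


def garima_mean(theta):
    """Population mean (theta + 3) / (theta * (theta + 2))."""
    return (theta + 3.0) / (theta * (theta + 2.0))


def garima_mome(xbar):
    """Method of moments estimate (12.2.1) from the sample mean xbar > 0."""
    if xbar <= 0.0:
        raise ValueError("the sample mean must be positive")
    return ((1.0 - 2.0 * xbar) + math.sqrt(4.0 * xbar**2 + 8.0 * xbar + 1.0)) / (2.0 * xbar)


xbar = 1.3  # arbitrary illustrative sample mean
theta_tilde = garima_mome(xbar)
print(theta_tilde, garima_mean(theta_tilde))  # the second value should reproduce xbar
```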

    A numerical example

    In this section, the goodness of fit of the Garima distribution is discussed with an example from behavioral science. The data, reported by Balakrishnan N et al. [9], come from the scale “General Rating of Affective Symptoms for Preschoolers (GRASP)”, which measures behavioral and emotional problems of children, who can be classified as having a depressive condition or not according to this scale. A study conducted by the authors in a city in the south of Chile collected real data corresponding to the GRASP scores of children, listed here with frequencies in parentheses:

    19(6), 20(15), 21(14), 22(9), 23(12), 24(10), 25(6), 26(9), 27(8), 28(5), 29(6), 30(4), 31(3), 32(4), 33, 34, 35(4), 36(2), 37(2), 39, 42, 44

    In order to compare distributions, $-2\ln L$, AIC (Akaike Information Criterion), AICC (Akaike Information Criterion Corrected), BIC (Bayesian Information Criterion) and the K-S (Kolmogorov-Smirnov) statistic for the above data set have been computed and presented in Table 3. The formulae for computing AIC, AICC, and BIC are as follows:

    $$AIC=-2\ln L+2k,\qquad AICC=AIC+\frac{2k(k+1)}{n-k-1},\qquad BIC=-2\ln L+k\ln n$$

    where $k$ is the number of parameters and $n$ is the sample size.

    The best distribution is the one with the lowest values of $-2\ln L$, AIC, AICC, and BIC.
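The three criteria are simple functions of −2 ln L, the number of parameters k and the sample size n. The sketch below codes them directly. The input 1041.64 is the −2 ln L reported for the Lindley model in Table 3 and k = 1 for a one-parameter model, while n = 100 is only a placeholder, since the sample size is an assumption of this example.

```python
import math


def information_criteria(neg2_loglik, k, n):
    """Return (AIC, AICC, BIC) from -2 ln L, number of parameters k and sample size n."""
    aic = neg2_loglik + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1.0) / (n - k - 1.0)
    bic = neg2_loglik + k * math.log(n)
    return aic, aicc, bic


# n = 100 is a placeholder sample size used only for illustration
aic, aicc, bic = information_criteria(1041.64, k=1, n=100)
print(round(aic, 2), round(aicc, 2), round(bic, 2))
```

For a one-parameter model the AIC is always −2 ln L + 2, so the AIC column does not depend on the placeholder n, whereas AICC and BIC do.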

    It can be seen from Table 3 that the Garima distribution gives lower values of these criteria than the Aradhana, Sujatha, Akash, Shanker, Lindley and exponential distributions, and thus the Garima distribution should be preferred over these distributions for modeling behavioral science data.

    | Model | ML Estimate | $-2\ln L$ | AIC | AICC | BIC |
    |---|---|---|---|---|---|
    | Garima | 0.05317 | 188.32 | 190.32 | 190.35 | 193.23 |
    | Aradhana | 0.11557 | 989.49 | 991.49 | 991.52 | 994.40 |
    | Sujatha | 0.11745 | 985.69 | 987.69 | 987.72 | 990.60 |
    | Akash | 0.11961 | 981.28 | 983.28 | 983.31 | 986.18 |
    | Shanker | 0.07974 | 1033.10 | 1035.10 | 1035.13 | 1037.99 |
    | Lindley | 0.07725 | 1041.64 | 1043.64 | 1043.68 | 1046.54 |
    | Exponential | 0.04006 | 1130.26 | 1132.26 | 1132.29 | 1135.16 |

    Table 3 ML estimates, $-2\ln L$, AIC, AICC, and BIC of the Garima, Aradhana [3], Sujatha [4], Akash [2], Shanker [1], Lindley [5] and exponential distributions

    Conclusion

    A one-parameter lifetime distribution named the “Garima distribution” has been proposed and studied. Its mathematical properties, including shape, moments, skewness, kurtosis, hazard rate function, mean residual life function, stochastic ordering, mean deviations, order statistics, Bonferroni and Lorenz curves, Renyi entropy and stress-strength reliability, have been discussed. The conditions under which the Garima distribution is over-dispersed, equi-dispersed, and under-dispersed are presented along with the corresponding conditions for the Sujatha, Aradhana, Akash, Shanker, Lindley and exponential distributions. The method of moments and the method of maximum likelihood have also been discussed for estimating its parameter. Finally, a numerical example from behavioral science has been considered to assess the goodness of fit of the Garima distribution, and the fit has been compared with that of the Sujatha, Aradhana, Akash, Shanker, Lindley and exponential distributions. The goodness of fit of the Garima distribution shows that it is an important model for behavioral science data.

    NOTE: The paper is named in loving memory of my niece Garima Satypriya, daughter of my respected brother Prof. Uma Shanker, Department of Mathematics, K.K College of Engineering & Management, Biharsharif, Nalanda, India.

    Acknowledgments

    None.

    Conflicts of interest

    None.

    References

    ©2016 Shanker. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.