Research Article Volume 11 Issue 3
1Department of Statistics, Assam University, Silchar, Assam, India
2Department of Mathematics, Noida International University, Gautam Buddh Nagar, India
Correspondence: Rama Shanker, Department of Statistics, Assam University, Silchar, Assam, India
Received: July 20, 2022 | Published: August 17, 2022
Citation: Shanker R, Shukla KK. The Poisson-Adya distribution. Biom Biostat Int J. 2022;11(3):100-103. DOI: 10.15406/bbij.2022.11.00361
In this paper, a Poisson mixture of the Adya distribution, called the Poisson-Adya distribution, is proposed. Expressions for its statistical constants, including the coefficients of variation, skewness and kurtosis and the index of dispersion, are obtained, and their behavior for varying values of the parameter is studied. The proposed distribution is unimodal, has an increasing hazard rate and is over-dispersed. Maximum likelihood estimation and the method of moments are discussed for estimating the parameter. Finally, the goodness of fit of the proposed distribution is compared with that of the Poisson and Poisson-Lindley distributions.
Keywords: Adya distribution, compounding, unimodality, over-dispersion, estimation, goodness of fit
The Poisson distribution is suitable for data showing equi-dispersion (mean equal to variance). In real-life situations, however, most count datasets are either over-dispersed (variance greater than the mean) or under-dispersed (variance less than the mean). In recent decades, several researchers have derived over-dispersed one-parameter discrete distributions by compounding the Poisson distribution with one-parameter continuous lifetime distributions. A popular one-parameter discrete distribution for over-dispersed data is the Poisson-Lindley distribution (PLD) proposed by Sankaran¹, which is a Poisson mixture of the Lindley distribution introduced by Lindley². However, such one-parameter discrete distributions are not suitable for some over-dispersed datasets from the biological sciences because of their levels of over-dispersion. Shanker & Hagos³ give a detailed discussion of applications of PLD to data from the biological sciences, which are, in general, over-dispersed. They observe that for some biological data PLD does not give a good fit, and hence another over-dispersed discrete distribution is needed.
Shanker et al.⁴ proposed a one-parameter continuous lifetime distribution named the Adya distribution, defined by its probability density function (pdf) and cumulative distribution function (cdf)
$$f(x;\theta)=\frac{\theta^{3}}{\theta^{4}+2\theta^{2}+2}\,(\theta+x)^{2}e^{-\theta x};\quad x>0,\ \theta>0 \tag{1.1}$$

$$F(x;\theta)=1-\left[1+\frac{\theta x(\theta x+2\theta^{2}+2)}{\theta^{4}+2\theta^{2}+2}\right]e^{-\theta x};\quad x>0,\ \theta>0 \tag{1.2}$$
Shanker et al.⁴ derived the Adya distribution as a convex combination of exponential $(\theta)$, gamma $(2,\theta)$ and gamma $(3,\theta)$ distributions with respective mixing proportions $\frac{\theta^{4}}{\theta^{4}+2\theta^{2}+2}$, $\frac{2\theta^{2}}{\theta^{4}+2\theta^{2}+2}$ and $\frac{2}{\theta^{4}+2\theta^{2}+2}$, which sum to one. Its statistical properties, including moments and moment-based measures, hazard rate function, mean residual life function, stochastic ordering, deviations from the mean and the median, Bonferroni and Lorenz curves, stress-strength reliability, estimation of the parameter and applications, are available in Shanker et al.⁴
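The mixture representation can be checked numerically. The following is a minimal Python sketch, not code from the paper: the function names and the value $\theta=1.5$ are illustrative assumptions. It evaluates the pdf (1.1) and the cdf (1.2), rebuilds the density from the exponential, gamma $(2,\theta)$ and gamma $(3,\theta)$ components with weights proportional to $\theta^{4}$, $2\theta^{2}$ and $2$, and compares a numerical integral of the pdf with the cdf.

```python
import math


def adya_pdf(x, theta):
    """Adya density, equation (1.1)."""
    d = theta**4 + 2 * theta**2 + 2
    return theta**3 / d * (theta + x) ** 2 * math.exp(-theta * x)


def adya_cdf(x, theta):
    """Adya distribution function, equation (1.2)."""
    d = theta**4 + 2 * theta**2 + 2
    bracket = 1 + theta * x * (theta * x + 2 * theta**2 + 2) / d
    return 1 - bracket * math.exp(-theta * x)


def mixture_weights(theta):
    """Weights of the exponential, gamma(2) and gamma(3) components."""
    d = theta**4 + 2 * theta**2 + 2
    return theta**4 / d, 2 * theta**2 / d, 2 / d


def mixture_pdf(x, theta):
    """Density written as the three-component mixture."""
    w1, w2, w3 = mixture_weights(theta)
    decay = math.exp(-theta * x)
    exponential = theta * decay                # exponential(theta) pdf
    gamma2 = theta**2 * x * decay              # gamma(2, theta) pdf
    gamma3 = theta**3 * x**2 * decay / 2       # gamma(3, theta) pdf
    return w1 * exponential + w2 * gamma2 + w3 * gamma3


def midpoint_integral(f, a, b, n=20000):
    """Composite midpoint rule, used only as a numerical cross-check."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


theta = 1.5  # illustrative value, not taken from the paper
print("sum of weights:", sum(mixture_weights(theta)))
print("pdf vs mixture:", adya_pdf(0.7, theta), mixture_pdf(0.7, theta))
print("integral vs cdf:",
      midpoint_integral(lambda t: adya_pdf(t, theta), 0.0, 2.0),
      adya_cdf(2.0, theta))
```

If the two printed pairs agree to several decimals, the density, the distribution function and the mixing proportions are mutually consistent.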
In the present paper, a Poisson mixture of the Adya distribution is derived and its statistical constants, including the coefficients of variation, skewness and kurtosis and the index of dispersion, are studied. The unimodality, increasing hazard rate and over-dispersion of the distribution are explained. Estimation of the parameter using the method of moments and maximum likelihood is discussed. Applications, goodness of fit and a comparison with other one-parameter discrete distributions are presented.
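Let $X$ follow the Poisson distribution with parameter $\lambda>0$, having pmf

$$P(X=x\mid\lambda)=\frac{e^{-\lambda}\lambda^{x}}{x!};\quad x=0,1,2,\ldots$$

Now suppose the parameter $\lambda$ follows the Adya distribution with parameter $\theta$, having pdf

$$f(\lambda\mid\theta)=\frac{\theta^{3}}{\theta^{4}+2\theta^{2}+2}\,(\theta+\lambda)^{2}e^{-\theta\lambda};\quad \lambda>0,\ \theta>0$$

Thus, the marginal pmf of $X$ can be obtained as

$$P(X=x)=\int_{0}^{\infty}P(X=x\mid\lambda)\,f(\lambda\mid\theta)\,d\lambda=\int_{0}^{\infty}\frac{e^{-\lambda}\lambda^{x}}{x!}\,\frac{\theta^{3}}{\theta^{4}+2\theta^{2}+2}\,(\theta+\lambda)^{2}e^{-\theta\lambda}\,d\lambda \tag{2.1}$$

$$=\frac{\theta^{3}}{(\theta^{4}+2\theta^{2}+2)\,x!}\int_{0}^{\infty}e^{-(\theta+1)\lambda}\lambda^{x}(\theta^{2}+2\theta\lambda+\lambda^{2})\,d\lambda$$

$$=\frac{\theta^{3}}{\theta^{4}+2\theta^{2}+2}\;\frac{x^{2}+(2\theta^{2}+2\theta+3)x+(\theta^{4}+2\theta^{3}+3\theta^{2}+2\theta+2)}{(\theta+1)^{x+3}};\quad x=0,1,2,\ldots,\ \theta>0 \tag{2.2}$$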
We name this distribution the Poisson-Adya distribution (PAD). The subsequent sections show that the pmf of PAD is unimodal, has an increasing hazard rate and is over-dispersed. The nature of the pmf of PAD for varying values of the parameter is shown in Figure 1. As the value of the parameter increases, the distribution becomes more positively skewed and its dispersion relative to the mean (coefficient of variation) increases (Figure 1).
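The closed form (2.2) can be compared with a direct numerical evaluation of the mixture integral (2.1). The sketch below is an illustration under stated assumptions rather than the authors' code: the function names, the midpoint-rule integrator, the truncation point `upper` and the value $\theta=2$ are all choices made here.

```python
import math


def pad_pmf(x, theta):
    """Poisson-Adya pmf, closed form (2.2)."""
    d = theta**4 + 2 * theta**2 + 2
    quad = (x**2 + (2 * theta**2 + 2 * theta + 3) * x
            + theta**4 + 2 * theta**3 + 3 * theta**2 + 2 * theta + 2)
    return theta**3 / d * quad / (theta + 1) ** (x + 3)


def pad_pmf_numeric(x, theta, upper=60.0, n=60000):
    """Midpoint-rule approximation of the mixture integral (2.1).

    The integral is truncated at `upper`; the integrand decays like
    exp(-(theta + 1) * lam), so the neglected tail is tiny here.
    """
    d = theta**4 + 2 * theta**2 + 2
    h = upper / n
    total = 0.0
    for i in range(n):
        lam = (i + 0.5) * h
        poisson = math.exp(-lam) * lam**x / math.factorial(x)
        adya = theta**3 / d * (theta + lam) ** 2 * math.exp(-theta * lam)
        total += poisson * adya
    return h * total


theta = 2.0  # illustrative value, not taken from the paper
for x in (0, 1, 3):
    print(x, pad_pmf(x, theta), pad_pmf_numeric(x, theta))

# The probabilities should add up to one (the series is truncated at x = 199).
print("total mass:", sum(pad_pmf(x, theta) for x in range(200)))
```

Agreement between the two columns supports the algebra leading from (2.1) to (2.2); the last line checks that (2.2) is a proper pmf.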
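Using (2.1), the $r$th factorial moment about the origin, $\mu_{(r)}'$, of PAD can be obtained as

$$\mu_{(r)}'=E\left[E\left(X^{(r)}\mid\lambda\right)\right]=\frac{\theta^{3}}{\theta^{4}+2\theta^{2}+2}\int_{0}^{\infty}\left[\sum_{x=0}^{\infty}x^{(r)}\frac{e^{-\lambda}\lambda^{x}}{x!}\right](\theta+\lambda)^{2}e^{-\theta\lambda}\,d\lambda$$

$$=\frac{\theta^{3}}{\theta^{4}+2\theta^{2}+2}\int_{0}^{\infty}\lambda^{r}\left[\sum_{x=r}^{\infty}\frac{e^{-\lambda}\lambda^{x-r}}{(x-r)!}\right](\theta^{2}+2\theta\lambda+\lambda^{2})e^{-\theta\lambda}\,d\lambda$$

$$=\frac{\theta^{3}}{\theta^{4}+2\theta^{2}+2}\int_{0}^{\infty}\lambda^{r}(\theta^{2}+2\theta\lambda+\lambda^{2})e^{-\theta\lambda}\,d\lambda$$

$$=\frac{r!\left\{\theta^{4}+2(r+1)\theta^{2}+(r+1)(r+2)\right\}}{\theta^{r}(\theta^{4}+2\theta^{2}+2)};\quad r=1,2,3,\ldots$$

where $x^{(r)}=x(x-1)\cdots(x-r+1)$. Substituting $r=1,2,3,4$, the first four factorial moments about the origin of PAD are obtained as

$$\mu_{(1)}'=\frac{\theta^{4}+4\theta^{2}+6}{\theta(\theta^{4}+2\theta^{2}+2)},\qquad \mu_{(2)}'=\frac{2(\theta^{4}+6\theta^{2}+12)}{\theta^{2}(\theta^{4}+2\theta^{2}+2)}$$

$$\mu_{(3)}'=\frac{6(\theta^{4}+8\theta^{2}+20)}{\theta^{3}(\theta^{4}+2\theta^{2}+2)},\qquad \mu_{(4)}'=\frac{24(\theta^{4}+10\theta^{2}+30)}{\theta^{4}(\theta^{4}+2\theta^{2}+2)}$$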
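The relationship between moments about the origin and factorial moments about the origin gives the first four moments about the origin as

$$\mu_{1}'=\frac{\theta^{4}+4\theta^{2}+6}{\theta(\theta^{4}+2\theta^{2}+2)}$$

$$\mu_{2}'=\frac{\theta^{5}+2\theta^{4}+4\theta^{3}+12\theta^{2}+6\theta+24}{\theta^{2}(\theta^{4}+2\theta^{2}+2)}$$

$$\mu_{3}'=\frac{\theta^{6}+6\theta^{5}+10\theta^{4}+36\theta^{3}+54\theta^{2}+72\theta+120}{\theta^{3}(\theta^{4}+2\theta^{2}+2)}$$

$$\mu_{4}'=\frac{\theta^{7}+14\theta^{6}+40\theta^{5}+108\theta^{4}+294\theta^{3}+408\theta^{2}+720\theta+720}{\theta^{4}(\theta^{4}+2\theta^{2}+2)}$$

Using the relationship between moments about the mean and moments about the origin, the moments about the mean are obtained as

$$\mu_{2}=\frac{\theta^{9}+\theta^{8}+6\theta^{7}+8\theta^{6}+16\theta^{5}+24\theta^{4}+20\theta^{3}+24\theta^{2}+12\theta+12}{\theta^{2}(\theta^{4}+2\theta^{2}+2)^{2}}$$

$$\mu_{3}=\frac{\theta^{14}+3\theta^{13}+10\theta^{12}+30\theta^{11}+54\theta^{10}+126\theta^{9}+172\theta^{8}+264\theta^{7}+284\theta^{6}+324\theta^{5}+280\theta^{4}+216\theta^{3}+168\theta^{2}+72\theta+48}{\theta^{3}(\theta^{4}+2\theta^{2}+2)^{3}}$$

$$\mu_{4}=\frac{\theta^{19}+10\theta^{18}+28\theta^{17}+129\theta^{16}+300\theta^{15}+796\theta^{14}+1628\theta^{13}+2952\theta^{12}+4952\theta^{11}+6968\theta^{10}+9624\theta^{9}+11048\theta^{8}+12368\theta^{7}+11952\theta^{6}+10544\theta^{5}+8544\theta^{4}+5520\theta^{3}+3648\theta^{2}+1440\theta+720}{\theta^{4}(\theta^{4}+2\theta^{2}+2)^{4}}$$

Now, the descriptive measures of PAD, namely the coefficient of variation (C.V.), the coefficient of skewness $\sqrt{\beta_{1}}$, the coefficient of kurtosis $\beta_{2}$ and the index of dispersion $\gamma$, are obtained as

$$\text{C.V.}=\frac{\sigma}{\mu_{1}'}=\frac{\sqrt{\theta^{9}+\theta^{8}+6\theta^{7}+8\theta^{6}+16\theta^{5}+24\theta^{4}+20\theta^{3}+24\theta^{2}+12\theta+12}}{\theta^{4}+4\theta^{2}+6}$$

$$\sqrt{\beta_{1}}=\frac{\mu_{3}}{(\mu_{2})^{3/2}}=\frac{\theta^{14}+3\theta^{13}+10\theta^{12}+30\theta^{11}+54\theta^{10}+126\theta^{9}+172\theta^{8}+264\theta^{7}+284\theta^{6}+324\theta^{5}+280\theta^{4}+216\theta^{3}+168\theta^{2}+72\theta+48}{(\theta^{9}+\theta^{8}+6\theta^{7}+8\theta^{6}+16\theta^{5}+24\theta^{4}+20\theta^{3}+24\theta^{2}+12\theta+12)^{3/2}}$$

$$\beta_{2}=\frac{\mu_{4}}{\mu_{2}^{2}}=\frac{\theta^{19}+10\theta^{18}+28\theta^{17}+129\theta^{16}+300\theta^{15}+796\theta^{14}+1628\theta^{13}+2952\theta^{12}+4952\theta^{11}+6968\theta^{10}+9624\theta^{9}+11048\theta^{8}+12368\theta^{7}+11952\theta^{6}+10544\theta^{5}+8544\theta^{4}+5520\theta^{3}+3648\theta^{2}+1440\theta+720}{(\theta^{9}+\theta^{8}+6\theta^{7}+8\theta^{6}+16\theta^{5}+24\theta^{4}+20\theta^{3}+24\theta^{2}+12\theta+12)^{2}}$$

$$\gamma=\frac{\sigma^{2}}{\mu_{1}'}=\frac{\theta^{9}+\theta^{8}+6\theta^{7}+8\theta^{6}+16\theta^{5}+24\theta^{4}+20\theta^{3}+24\theta^{2}+12\theta+12}{\theta(\theta^{4}+2\theta^{2}+2)(\theta^{4}+4\theta^{2}+6)}$$

The behavior of the coefficients of variation, skewness and kurtosis and of the index of dispersion of PAD for varying values of the parameter is shown in Figure 2. The coefficients of variation, skewness and kurtosis increase with $\theta$. The index of dispersion, by contrast, stays above 1 but decreases toward 1 as $\theta$ increases, since the numerator and denominator of $\gamma$ are both polynomials of degree 9 with leading coefficient 1, while $\gamma$ behaves like $1/\theta$ as $\theta\to 0$ (Figure 2).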
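The closed-form mean and variance can be cross-checked against direct summation over the pmf. The snippet below is a sketch under stated assumptions: the function names, the truncation at 300 terms and the parameter values are illustrative and do not come from the paper. It also prints the index of dispersion for a few values of $\theta$.

```python
def pad_pmf(x, theta):
    """Poisson-Adya pmf, closed form (2.2)."""
    d = theta**4 + 2 * theta**2 + 2
    quad = (x**2 + (2 * theta**2 + 2 * theta + 3) * x
            + theta**4 + 2 * theta**3 + 3 * theta**2 + 2 * theta + 2)
    return theta**3 / d * quad / (theta + 1) ** (x + 3)


def pad_mean(theta):
    """First moment about the origin."""
    d = theta**4 + 2 * theta**2 + 2
    return (theta**4 + 4 * theta**2 + 6) / (theta * d)


def pad_variance(theta):
    """Second moment about the mean."""
    d = theta**4 + 2 * theta**2 + 2
    n2 = (theta**9 + theta**8 + 6 * theta**7 + 8 * theta**6 + 16 * theta**5
          + 24 * theta**4 + 20 * theta**3 + 24 * theta**2 + 12 * theta + 12)
    return n2 / (theta**2 * d**2)


def pad_index_of_dispersion(theta):
    """Variance-to-mean ratio (gamma in the text)."""
    return pad_variance(theta) / pad_mean(theta)


def summed_mean_variance(theta, terms=300):
    """Mean and variance by brute-force summation over x = 0..terms-1.

    The truncation is an assumption that is adequate for moderate theta;
    very small theta would need more terms.
    """
    mean = sum(x * pad_pmf(x, theta) for x in range(terms))
    second = sum(x * x * pad_pmf(x, theta) for x in range(terms))
    return mean, second - mean**2


for theta in (0.5, 1.0, 2.0):  # illustrative values
    print(theta, pad_mean(theta), pad_variance(theta),
          pad_index_of_dispersion(theta), summed_mean_variance(theta))
```

The same pattern extends to the third and fourth moments by summing $x^{3}$ and $x^{4}$ against the pmf.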
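We have

$$\mu_{2}=\frac{\theta^{9}+\theta^{8}+6\theta^{7}+8\theta^{6}+16\theta^{5}+24\theta^{4}+20\theta^{3}+24\theta^{2}+12\theta+12}{\theta^{2}(\theta^{4}+2\theta^{2}+2)^{2}}$$

$$=\frac{\theta^{4}+4\theta^{2}+6}{\theta(\theta^{4}+2\theta^{2}+2)}\left[\frac{\theta^{9}+\theta^{8}+6\theta^{7}+8\theta^{6}+16\theta^{5}+24\theta^{4}+20\theta^{3}+24\theta^{2}+12\theta+12}{\theta(\theta^{4}+2\theta^{2}+2)(\theta^{4}+4\theta^{2}+6)}\right]$$

$$=\frac{\theta^{4}+4\theta^{2}+6}{\theta(\theta^{4}+2\theta^{2}+2)}\left[1+\frac{\theta^{8}+8\theta^{6}+24\theta^{4}+24\theta^{2}+12}{\theta(\theta^{4}+2\theta^{2}+2)(\theta^{4}+4\theta^{2}+6)}\right]$$

$$=\mu_{1}'\left[1+\frac{\theta^{8}+8\theta^{6}+24\theta^{4}+24\theta^{2}+12}{\theta(\theta^{4}+2\theta^{2}+2)(\theta^{4}+4\theta^{2}+6)}\right]$$

Since the fraction inside the brackets is positive for every $\theta>0$, this shows that $\mu_{2}>\mu_{1}'$, and thus PAD is always over-dispersed. Therefore, PAD can be used for discrete datasets that are over-dispersed in nature.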
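As a quick numerical illustration of the inequality (the value $\theta=1$ is chosen here only for convenience and is not taken from the paper), substituting into the expressions for $\mu_{1}'$ and $\mu_{2}$ gives

$$\mu_{1}'=\frac{1+4+6}{1\cdot(1+2+2)}=\frac{11}{5}=2.2,\qquad \mu_{2}=\frac{1+1+6+8+16+24+20+24+12+12}{1^{2}\cdot 5^{2}}=\frac{124}{25}=4.96,$$

$$\frac{\mu_{2}}{\mu_{1}'}=1+\frac{1+8+24+24+12}{1\cdot 5\cdot 11}=1+\frac{69}{55}=\frac{124}{55}\approx 2.25,$$

so at $\theta=1$ the variance is a little more than twice the mean.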
Increasing Hazard Rate and Unimodality
It can easily be shown that PAD has an increasing hazard rate (IHR) and is unimodal. Since

$$\frac{P(x+1,\theta)}{P(x,\theta)}=\frac{1}{\theta+1}\left[1+\frac{2\{x+(\theta^{2}+\theta+2)\}}{x^{2}+(2\theta^{2}+2\theta+3)x+(\theta^{4}+2\theta^{3}+3\theta^{2}+2\theta+2)}\right]$$

is a decreasing function of $x$ for a given $\theta$, $P(x,\theta)$ is log-concave. This implies that PAD has an increasing hazard rate and is unimodal. Grandell⁵ gives a detailed discussion of the relationship between log-concavity, IHR and unimodality of discrete distributions.
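The monotonicity of the ratio $P(x+1,\theta)/P(x,\theta)$ can be inspected numerically, and the same ratio locates the mode. The sketch below makes its own assumptions: the function names and the test values of $\theta$ are illustrative, and the mode is taken to be the smallest $x$ at which the ratio drops to 1 or below.

```python
def pad_pmf(x, theta):
    """Poisson-Adya pmf, closed form (2.2)."""
    d = theta**4 + 2 * theta**2 + 2
    quad = (x**2 + (2 * theta**2 + 2 * theta + 3) * x
            + theta**4 + 2 * theta**3 + 3 * theta**2 + 2 * theta + 2)
    return theta**3 / d * quad / (theta + 1) ** (x + 3)


def pad_ratio(x, theta):
    """Ratio P(x+1)/P(x) in the form given in the text."""
    quad = (x**2 + (2 * theta**2 + 2 * theta + 3) * x
            + theta**4 + 2 * theta**3 + 3 * theta**2 + 2 * theta + 2)
    return (1 + 2 * (x + theta**2 + theta + 2) / quad) / (theta + 1)


def pad_mode(theta):
    """Smallest x with P(x+1) <= P(x).

    Because the ratio is decreasing in x and tends to 1/(theta + 1) < 1,
    the loop terminates and the pmf does not rise again after this point.
    """
    x = 0
    while pad_ratio(x, theta) > 1:
        x += 1
    return x


for theta in (0.3, 1.0, 2.5):  # illustrative values
    ratios = [pad_ratio(x, theta) for x in range(50)]
    decreasing = all(a > b for a, b in zip(ratios, ratios[1:]))
    print(theta, "ratio decreasing:", decreasing, "mode:", pad_mode(theta))
```

A finite scan like this does not prove log-concavity, but it is a cheap sanity check on the ratio formula.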
Method of moment estimate
Let $x_{1},x_{2},\ldots,x_{n}$ be a random sample of size $n$ from PAD. Equating the first moment about the origin to the corresponding sample moment, the method of moments estimate (MOME) $\tilde{\theta}$ of $\theta$ is the solution of the following fifth-degree polynomial equation

$$\bar{x}\theta^{5}-\theta^{4}+2\bar{x}\theta^{3}-4\theta^{2}+2\bar{x}\theta-6=0,$$

where $\bar{x}$ is the sample mean.
This equation can be solved using the Newton-Raphson method to obtain the estimate of the parameter.
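A minimal Newton-Raphson sketch for the fifth-degree equation is given below. It is an illustration, not the authors' implementation: the function names, tolerance and iteration cap are assumptions. The starting value $3/\bar{x}$ is also a choice made here; it relies on the fact that $\theta\mu_{1}'$ lies between 1 and 3, so the root lies between $1/\bar{x}$ and $3/\bar{x}$, and starting from the upper end keeps the iteration on the side where the polynomial is positive and convex. The sample mean $47/60$ is computed from the observed frequencies of Table 1.

```python
def pad_mean(theta):
    """First moment about the origin of PAD."""
    d = theta**4 + 2 * theta**2 + 2
    return (theta**4 + 4 * theta**2 + 6) / (theta * d)


def mom_estimate(xbar, tol=1e-12, max_iter=200):
    """Solve xbar*t^5 - t^4 + 2*xbar*t^3 - 4*t^2 + 2*xbar*t - 6 = 0 for t > 0."""
    if xbar <= 0:
        raise ValueError("sample mean must be positive")
    theta = 3.0 / xbar  # assumed starting value, to the right of the root
    for _ in range(max_iter):
        g = (xbar * theta**5 - theta**4 + 2 * xbar * theta**3
             - 4 * theta**2 + 2 * xbar * theta - 6)
        dg = (5 * xbar * theta**4 - 4 * theta**3 + 6 * xbar * theta**2
              - 8 * theta + 2 * xbar)
        step = g / dg
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")


xbar = 47 / 60  # sample mean of the Table 1 data
theta_tilde = mom_estimate(xbar)
print("MOME:", theta_tilde)
print("fitted mean:", pad_mean(theta_tilde), "sample mean:", xbar)
```

The last line confirms that the fitted mean reproduces the sample mean, which is the defining property of the estimate.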
Maximum Likelihood Estimate
Let $x_{1},x_{2},\ldots,x_{n}$ be a random sample of size $n$ from PAD and let $f_{x}$ be the observed frequency in the sample corresponding to $X=x$ $(x=0,1,2,\ldots,k)$ such that $\sum_{x=0}^{k}f_{x}=n$, where $k$ is the largest observed value having non-zero frequency. Using the pmf (2.2), the likelihood function $L$ of PAD is given by

$$L=\left(\frac{\theta^{3}}{\theta^{4}+2\theta^{2}+2}\right)^{n}\frac{1}{(\theta+1)^{\sum_{x=0}^{k}f_{x}(x+3)}}\prod_{x=0}^{k}\left[x^{2}+(2\theta^{2}+2\theta+3)x+(\theta^{4}+2\theta^{3}+3\theta^{2}+2\theta+2)\right]^{f_{x}}$$

The log-likelihood function is obtained as

$$\log L=n\log\left(\frac{\theta^{3}}{\theta^{4}+2\theta^{2}+2}\right)-\sum_{x=0}^{k}f_{x}(x+3)\log(\theta+1)+\sum_{x=0}^{k}f_{x}\log\left[x^{2}+(2\theta^{2}+2\theta+3)x+(\theta^{4}+2\theta^{3}+3\theta^{2}+2\theta+2)\right]$$

The first derivative of the log-likelihood function is given by

$$\frac{d\log L}{d\theta}=\frac{3n}{\theta}-\frac{4n(\theta^{3}+\theta)}{\theta^{4}+2\theta^{2}+2}-\frac{n(\bar{x}+3)}{\theta+1}+\sum_{x=0}^{k}\frac{\left[(4\theta+2)x+(4\theta^{3}+6\theta^{2}+6\theta+2)\right]f_{x}}{x^{2}+(2\theta^{2}+2\theta+3)x+(\theta^{4}+2\theta^{3}+3\theta^{2}+2\theta+2)},$$

where $\bar{x}$ is the sample mean.
The maximum likelihood estimate (MLE) $\hat{\theta}$ of $\theta$ is the solution of the equation $\frac{d\log L}{d\theta}=0$, that is, of the non-linear equation

$$\frac{3n}{\theta}-\frac{4n(\theta^{3}+\theta)}{\theta^{4}+2\theta^{2}+2}-\frac{n(\bar{x}+3)}{\theta+1}+\sum_{x=0}^{k}\frac{\left[(4\theta+2)x+(4\theta^{3}+6\theta^{2}+6\theta+2)\right]f_{x}}{x^{2}+(2\theta^{2}+2\theta+3)x+(\theta^{4}+2\theta^{3}+3\theta^{2}+2\theta+2)}=0$$

Since this equation cannot be solved in closed form, the MLE of the parameter $\theta$ is computed iteratively using Newton-Raphson iteration, available in R, until successive values of $\theta$ are sufficiently close. The initial value of $\theta$ can be taken as the method of moments estimate.
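The iteration can be sketched in a few lines of Python. This is not the authors' R code: the function names, tolerance and the finite-difference approximation to the second derivative are assumptions made here for brevity. The log-likelihood is built directly from the pmf (2.2), with frequencies indexed from $x=0$, and the score is the derivative of that log-likelihood, whose leading term is $3n/\theta$. The data are the observed frequencies of Table 1, and the starting value 1.9 is close to the method of moments estimate for that dataset.

```python
import math


def pad_pmf(x, theta):
    """Poisson-Adya pmf, closed form (2.2)."""
    d = theta**4 + 2 * theta**2 + 2
    quad = (x**2 + (2 * theta**2 + 2 * theta + 3) * x
            + theta**4 + 2 * theta**3 + 3 * theta**2 + 2 * theta + 2)
    return theta**3 / d * quad / (theta + 1) ** (x + 3)


def pad_loglik(theta, freq):
    """Log-likelihood for grouped counts; freq maps x -> observed frequency."""
    return sum(f * math.log(pad_pmf(x, theta)) for x, f in freq.items())


def pad_score(theta, freq):
    """First derivative of the log-likelihood with respect to theta."""
    n = sum(freq.values())
    xbar = sum(x * f for x, f in freq.items()) / n
    d = theta**4 + 2 * theta**2 + 2
    score = (3 * n / theta - 4 * n * (theta**3 + theta) / d
             - n * (xbar + 3) / (theta + 1))
    for x, f in freq.items():
        quad = (x**2 + (2 * theta**2 + 2 * theta + 3) * x
                + theta**4 + 2 * theta**3 + 3 * theta**2 + 2 * theta + 2)
        numer = (4 * theta + 2) * x + 4 * theta**3 + 6 * theta**2 + 6 * theta + 2
        score += f * numer / quad
    return score


def pad_mle(freq, theta0, tol=1e-10, max_iter=100, h=1e-6):
    """Newton-Raphson on the score; its derivative is approximated numerically."""
    theta = theta0
    for _ in range(max_iter):
        s = pad_score(theta, freq)
        ds = (pad_score(theta + h, freq) - pad_score(theta - h, freq)) / (2 * h)
        step = s / ds
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")


table1 = {0: 35, 1: 11, 2: 8, 3: 4, 4: 2}  # observed frequencies, Table 1
theta_hat = pad_mle(table1, theta0=1.9)
print("MLE:", theta_hat, "score at MLE:", pad_score(theta_hat, table1))
```

For comparison, Table 1 reports a PAD ML estimate of 1.9141 for this dataset.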
In this section, the applications of PAD are discussed for three over-dispersed count datasets. The goodness of fit of PAD is compared with that of the Poisson distribution (PD) and PLD. The pmf of PLD is given by

$$P(x,\theta)=\frac{\theta^{2}(x+\theta+2)}{(\theta+1)^{x+3}};\quad x=0,1,2,\ldots,\ \theta>0$$
The expected frequencies given by PD, PLD and PAD are presented in Tables 1-3 for ready comparison. In all three datasets PAD gives the smallest $\chi^{2}$ value and the largest p-value, indicating a better fit than PD and PLD (Tables 1-3).
| No. of errors per group | Observed frequency | Expected frequency (PD) | Expected frequency (PLD) | Expected frequency (PAD) |
|---|---|---|---|---|
| 0 | 35 | 27.4 | 33 | 33.1 |
| 1 | 11 | 21.5 | 15.3 | 15.2 |
| 2 | 8 | | | 6.7 |
| 3 | 4 | 2.2 | 2.9 | |
| 4 | 2 | 0.5 | 2.0 | 2.9 |
| Total | 60 | 60 | 60 | 60 |
| ML estimate | | $\hat{\theta}=0.7833$ | $\hat{\theta}=1.7434$ | $\hat{\theta}=1.9141$ |
| $\chi^{2}$ | | 7.98 | 2.20 | 1.72 |
| d.f. | | 1 | 1 | 2 |
| p-value | | 0.0047 | 0.1380 | 0.4232 |
Table 1 Distribution of mistakes in copying groups of random digits, available in Kemp and Kemp⁶
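The expected frequencies in Table 1 can be approximately reproduced from the pmfs and the reported ML estimates. The sketch below is illustrative only: the function names are assumptions, and pooling everything from $x=3$ upward into one tail class is a choice made here, not necessarily the grouping used for the $\chi^{2}$ values in the table.

```python
def pad_pmf(x, theta):
    """Poisson-Adya pmf, closed form (2.2)."""
    d = theta**4 + 2 * theta**2 + 2
    quad = (x**2 + (2 * theta**2 + 2 * theta + 3) * x
            + theta**4 + 2 * theta**3 + 3 * theta**2 + 2 * theta + 2)
    return theta**3 / d * quad / (theta + 1) ** (x + 3)


def pld_pmf(x, theta):
    """Poisson-Lindley pmf as given in the text."""
    return theta**2 * (x + theta + 2) / (theta + 1) ** (x + 3)


def expected_frequencies(pmf, theta, n, tail_from):
    """Expected counts for x = 0..tail_from-1 plus a pooled class x >= tail_from."""
    probs = [pmf(x, theta) for x in range(tail_from)]
    probs.append(1.0 - sum(probs))  # pooled upper tail
    return [n * p for p in probs]


n = 60  # sample size of Table 1
pad_expected = expected_frequencies(pad_pmf, 1.9141, n, tail_from=3)
pld_expected = expected_frequencies(pld_pmf, 1.7434, n, tail_from=3)

for label, values in (("PAD", pad_expected), ("PLD", pld_expected)):
    print(label, [round(v, 1) for v in values])
```

The first three PAD entries can be compared with the values 33.1, 15.2 and 6.7 listed in Table 1, and the first two PLD entries with 33 and 15.3.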
| No. of chromatid aberrations | Observed frequency | Expected frequency (PD) | Expected frequency (PLD) | Expected frequency (PAD) |
|---|---|---|---|---|
| 0 | 268 | 231.3 | 257 | 258.1 |
| 1 | 87 | 126.7 | 93.4 | 92.5 |
| 2 | 26 | 34.7 | 32.8 | 32.4 |
| 3 | 9 | | 11.2 | 11.2 |
| 4 | 4 | 0.8 | | |
| 5 | 2 | 0.1 | 1.2 | 1.3 |
| 6 | 1 | 0.1 | 0.4 | 0.4 |
| 7+ | 3 | 0.1 | 0.2 | 0.4 |
| Total | 400 | 400 | 400 | 400 |
| ML estimate | | $\hat{\theta}=0.5475$ | $\hat{\theta}=2.380442$ | $\hat{\theta}=2.4406$ |
| $\chi^{2}$ | | 38.21 | 6.21 | 5.21 |
| d.f. | | 2 | 3 | 3 |
| p-value | | 0.0000 | 0.1018 | 0.1577 |
Table 2 Distribution of number of chromatid aberrations (0.2 g chinon 1, 24 hours), available in Loeschke & Kohler⁷ and Janardan & Schaeffer⁸
| No. of accidents | Observed frequency | Expected frequency (PD) | Expected frequency (PLD) | Expected frequency (PAD) |
|---|---|---|---|---|
| 0 | 447 | 406 | 439.5 | 440.5 |
| 1 | 132 | 189 | 142.8 | 141.5 |
| 2 | 42 | 45 | 45 | 44.7 |
| 3 | 21 | 7 | 13.9 | 14 |
| 4 | 3 | 1 | | |
| ≥5 | 2 | 0.1 | 1.2 | 2.0 |
| Total | 647 | 647 | 647 | 647 |
| ML estimate | | $\hat{\theta}=0.465$ | $\hat{\theta}=2.729$ | $\hat{\theta}=2.7182$ |
| $\chi^{2}$ | | 61.08 | 4.82 | 4.66 |
| d.f. | | 1 | 3 | 2 |
| p-value | | 0.0273 | 0.1855 | 0.1985 |
Table 3 Accidents to 647 women working on high explosive shells in 5 weeks, available in Sankaran¹
In this paper, a Poisson mixture of the Adya distribution, called the Poisson-Adya distribution (PAD), has been proposed. Expressions for its statistical constants, including the coefficients of variation, skewness and kurtosis and the index of dispersion, have been obtained, and their behavior for varying values of the parameter has been studied. The proposed distribution is unimodal, has an increasing hazard rate and is over-dispersed. Maximum likelihood estimation and the method of moments have been discussed for estimating the parameter. Finally, the goodness of fit of the proposed distribution has been compared with that of other one-parameter discrete distributions, namely Poisson and PLD, on three datasets from the biological sciences.
None.
The authors declare no conflicts of interest.
©2022 Shanker, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.