Research Article Volume 6 Issue 3
Department of Statistics, Eritrea Institute of Technology, Eritrea
Correspondence: Rama Shanker, Department of Statistics, Eritrea Institute of Technology, Asmara, Eritrea
Received: August 23, 2017 | Published: August 28, 2017
Citation: Shanker R, Shukla KK. Size-biased poisson-garima distribution with applications. Biom Biostat Int J. 2017;6(3):335-340. DOI: 10.15406/bbij.2017.06.00167
In this paper, a size-biased Poisson-Garima distribution (SBPGD) is obtained by size-biasing the Poisson-Garima distribution (PGD) introduced by Shanker (2017). Its moments about the origin and about the mean are derived, and from them expressions for the coefficient of variation (C.V.), skewness and kurtosis are obtained. Estimation of its parameter by the method of moments and by the method of maximum likelihood is discussed. The goodness of fit of the SBPGD is examined for two real data sets using maximum likelihood estimates, and the fit is found to be quite satisfactory compared with the size-biased Poisson distribution (SBPD) and the size-biased Poisson-Lindley distribution (SBPLD).
Keywords: garima distribution, poisson-garima distribution, size-biasing, moments, estimation of parameter, goodness of fit
Shanker [1] obtained the Poisson-Garima distribution (PGD) for modeling count data, with probability mass function (p.m.f.)
$$P_0(x;\theta)=\frac{\theta}{\theta+2}\,\frac{\theta x+(\theta^2+3\theta+1)}{(\theta+1)^{x+2}};\quad x=0,1,2,\ldots,\ \theta>0 \tag{1.1}$$
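The first four moments about the origin and the variance of the PGD obtained by Shanker [1] are as follows:
$$\mu_1'=\frac{\theta+3}{\theta(\theta+2)},\quad \mu_2'=\frac{\theta^2+5\theta+8}{\theta^2(\theta+2)},\quad \mu_3'=\frac{\theta^3+9\theta^2+30\theta+30}{\theta^3(\theta+2)}$$
$$\mu_4'=\frac{\theta^4+17\theta^3+92\theta^2+204\theta+144}{\theta^4(\theta+2)},\quad\text{and}\quad \mu_2=\frac{\theta^3+6\theta^2+12\theta+7}{\theta^2(\theta+2)^2}$$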
Shanker [1] has discussed its properties, parameter estimation and applications in detail, and has shown that it performs better than the Poisson and Poisson-Lindley distributions for modeling count data in various fields of knowledge. The PGD arises from the Poisson distribution when its parameter λ follows the Garima distribution introduced by Shanker [2], with probability density function (p.d.f.)
$$f_0(\lambda;\theta)=\frac{\theta}{\theta+2}\,(1+\theta+\theta\lambda)\,e^{-\theta\lambda};\quad \lambda>0,\ \theta>0 \tag{1.2}$$
Size-biased distributions arise in practice when observations are sampled with probability proportional to some measure of unit size. Fisher [3] first introduced these distributions to model ascertainment bias; they were later formalized by Rao [4] in a unifying theory. Size-biased observations occur in many research areas, with applications in medical science, sociology, psychology, ecology, geological sciences, etc. The application of size-biased distribution theory to fitting distributions of diameter at breast height (DBH) data arising from horizontal point sampling (HPS) has been discussed by Van Deusen [5]. Further, Lappi and Bailey [6] applied size-biased distributions to analyze HPS diameter increment data. Statistical applications of size-biased distributions to the analysis of data relating to human populations and ecology can be found in Patil and Rao [7,8]. Some recent results on size-biased distributions pertaining to parameter estimation in forestry, with special emphasis on the Weibull family, have been discussed by Gove [9]. Ducey and Gove [10] discussed size-biased distributions in the generalized beta distribution family, with applications to forestry.
Let a random variable X have the original probability distribution $P_0(x;\theta)$; then a simple size-biased distribution is given by the probability function
$$P_1(x;\theta)=\frac{x\cdot P_0(x;\theta)}{\mu_1'} \tag{1.3}$$
where $\mu_1'=E(X)$ is the mean of the original probability distribution.
In the present paper, a size-biased Poisson-Garima distribution (SBPGD) is proposed. Its raw and central moments, and the properties based on them, including the coefficient of variation, skewness, kurtosis and index of dispersion, are obtained and discussed. Some of its other statistical properties are also discussed. The method of moments and the method of maximum likelihood are discussed for estimating the parameter of the SBPGD. The goodness of fit of the SBPGD is also presented.
Using (1.1) and (1.3), the p.m.f. of the size-biased Poisson-Garima distribution (SBPGD) with parameter θ can be obtained as
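$$P_1(x;\theta)=\frac{x\cdot P_0(x;\theta)}{\mu_1'}=\frac{\theta^2}{\theta+3}\,\frac{x^2\theta+x(\theta^2+3\theta+1)}{(\theta+1)^{x+2}};\quad x=1,2,3,\ldots,\ \theta>0 \tag{2.1}$$
where $\mu_1'=\dfrac{\theta+3}{\theta(\theta+2)}$ is the mean of the PGD (1.1).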
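The size-biasing step in (2.1) can be checked numerically. The following is a minimal sketch, assuming an illustrative value θ = 1.5 and an arbitrary truncation of the infinite sum at x = 199; the function names `pgd_pmf` and `sbpgd_pmf` are hypothetical helpers, not part of the original paper.

```python
from math import isclose

def pgd_pmf(x: int, theta: float) -> float:
    """Poisson-Garima p.m.f. (1.1), defined for x = 0, 1, 2, ..."""
    return theta / (theta + 2) * (theta * x + theta**2 + 3 * theta + 1) / (theta + 1) ** (x + 2)

def sbpgd_pmf(x: int, theta: float) -> float:
    """Size-biased Poisson-Garima p.m.f. (2.1), defined for x = 1, 2, 3, ..."""
    num = theta * x**2 + x * (theta**2 + 3 * theta + 1)
    return theta**2 / (theta + 3) * num / (theta + 1) ** (x + 2)

theta = 1.5  # illustrative parameter value (assumption)
pgd_mean = (theta + 3) / (theta * (theta + 2))  # mean of the PGD

# The SBPGD probabilities should sum to 1 (truncated sum; the tail is negligible here).
total = sum(sbpgd_pmf(x, theta) for x in range(1, 200))

# Each SBPGD probability should equal x * P0(x) / mean, as in (1.3).
matches = all(
    isclose(sbpgd_pmf(x, theta), x * pgd_pmf(x, theta) / pgd_mean)
    for x in range(1, 50)
)
print(total, matches)
```

The truncation point is kept modest because `(theta + 1) ** (x + 2)` overflows a float for very large x.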
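The SBPGD can also be obtained from the size-biased Poisson distribution (SBPD) with p.m.f.
$$g(x\mid\lambda)=\frac{e^{-\lambda}\lambda^{x-1}}{(x-1)!};\quad x=1,2,3,\ldots,\ \lambda>0 \tag{2.2}$$
when its parameter λ follows the size-biased Garima distribution (SBGD) with p.d.f.
$$h(\lambda;\theta)=\frac{\theta^2}{\theta+3}\,\lambda\,(1+\theta+\theta\lambda)\,e^{-\theta\lambda};\quad \lambda>0,\ \theta>0 \tag{2.3}$$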
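Thus the p.m.f. of the SBPGD can be obtained as
$$P(X=x)=\int_0^\infty g(x\mid\lambda)\,h(\lambda;\theta)\,d\lambda=\int_0^\infty \frac{e^{-\lambda}\lambda^{x-1}}{(x-1)!}\,\frac{\theta^2}{\theta+3}\,\lambda(1+\theta+\theta\lambda)\,e^{-\theta\lambda}\,d\lambda \tag{2.4}$$
$$=\frac{\theta^2}{(\theta+3)(x-1)!}\int_0^\infty e^{-(\theta+1)\lambda}\left[(1+\theta)\lambda^{x}+\theta\lambda^{x+1}\right]d\lambda$$
$$=\frac{\theta^2}{\theta+3}\,\frac{x^2\theta+x(\theta^2+3\theta+1)}{(\theta+1)^{x+2}};\quad x=1,2,3,\ldots,\ \theta>0$$
which is the p.m.f. of the SBPGD with parameter θ.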
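The last step of this derivation rests on the standard gamma integral. A short worked version, written here as a sketch of the intermediate algebra rather than as part of the original text, is:

```latex
\int_0^\infty e^{-(\theta+1)\lambda}\lambda^{x}\,d\lambda=\frac{x!}{(\theta+1)^{x+1}},
\qquad
\int_0^\infty e^{-(\theta+1)\lambda}\lambda^{x+1}\,d\lambda=\frac{(x+1)!}{(\theta+1)^{x+2}}

P(X=x)=\frac{\theta^2}{(\theta+3)(x-1)!}
\left[\frac{(1+\theta)\,x!}{(\theta+1)^{x+1}}+\frac{\theta\,(x+1)!}{(\theta+1)^{x+2}}\right]
=\frac{\theta^2}{\theta+3}\,\frac{x(\theta+1)^2+\theta x(x+1)}{(\theta+1)^{x+2}}

x(\theta+1)^2+\theta x(x+1)=x^2\theta+x(\theta^2+3\theta+1)
```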
Graphs of the SBPGD for varying values of the parameter θ are shown in figure 1. The graphs show that as the value of θ increases, the p.m.f. initially shifts upward and then decreases faster for increasing values of x. The graphs also become convex for θ ≥ 2.
Recall that the size-biased Poisson-Lindley distribution (SBPLD), with p.m.f.
$$P_2(X=x)=\frac{\theta^3}{\theta+2}\,\frac{x(x+\theta+2)}{(\theta+1)^{x+2}};\quad x=1,2,3,\ldots,\ \theta>0 \tag{1.7}$$
was introduced by Ghitany and Mutairi [11] as a size-biased version of the Poisson-Lindley distribution (PLD) introduced by Sankaran [12]. Ghitany and Mutairi [11] discussed its mathematical and statistical properties, estimation of the parameter by maximum likelihood and by the method of moments, and goodness of fit. Shanker et al. [13] made a detailed study of the applications of the SBPLD for modeling data on thunderstorms and observed that, in most data sets, the SBPLD gives a better fit than the size-biased Poisson distribution (SBPD).
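Using (2.4), the r-th factorial moment about the origin of the SBPGD (2.1) can be obtained as
$$\mu_{(r)}'=E\left[E\left(X^{(r)}\mid\lambda\right)\right]=\int_0^\infty\left[\sum_{x=1}^{\infty}x^{(r)}\,\frac{e^{-\lambda}\lambda^{x-1}}{(x-1)!}\right]\frac{\theta^2}{\theta+3}\,\lambda(1+\theta+\theta\lambda)\,e^{-\theta\lambda}\,d\lambda$$
$$=\frac{\theta^2}{\theta+3}\int_0^\infty\left[\lambda^{r-1}\sum_{x=r}^{\infty}x\,\frac{e^{-\lambda}\lambda^{x-r}}{(x-r)!}\right]\lambda(1+\theta+\theta\lambda)\,e^{-\theta\lambda}\,d\lambda$$
Taking $y=x-r$, we get
$$\mu_{(r)}'=\frac{\theta^2}{\theta+3}\int_0^\infty\left[\lambda^{r-1}\sum_{y=0}^{\infty}(y+r)\,\frac{e^{-\lambda}\lambda^{y}}{y!}\right]\lambda(1+\theta+\theta\lambda)\,e^{-\theta\lambda}\,d\lambda$$
$$=\frac{\theta^2}{\theta+3}\int_0^\infty\lambda^{r-1}(\lambda+r)\,\lambda(1+\theta+\theta\lambda)\,e^{-\theta\lambda}\,d\lambda$$
$$=\frac{r!\left\{(\theta+1)(r\theta+r+1)+(r+1)(r\theta+r+2)\right\}}{\theta^{r}(\theta+3)};\quad r=1,2,3,\ldots \tag{3.1}$$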
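As a numerical cross-check of (3.1), the closed form can be compared with a direct truncated sum of $x^{(r)}P_1(x;\theta)$. This is a minimal sketch, assuming an illustrative θ = 2 and an arbitrary cut-off of 300 terms; all function names are hypothetical.

```python
from math import factorial, isclose

def sbpgd_pmf(x, theta):
    """SBPGD p.m.f. (2.1) for x = 1, 2, 3, ..."""
    num = theta * x**2 + x * (theta**2 + 3 * theta + 1)
    return theta**2 / (theta + 3) * num / (theta + 1) ** (x + 2)

def falling_factorial(x, r):
    """x^(r) = x (x - 1) ... (x - r + 1)."""
    out = 1
    for j in range(r):
        out *= x - j
    return out

def factorial_moment_closed(r, theta):
    """Closed form (3.1) of the r-th factorial moment."""
    braces = (theta + 1) * (r * theta + r + 1) + (r + 1) * (r * theta + r + 2)
    return factorial(r) * braces / (theta**r * (theta + 3))

def factorial_moment_series(r, theta, terms=300):
    """Truncated sum of x^(r) * P1(x; theta); `terms` is an arbitrary cut-off."""
    return sum(falling_factorial(x, r) * sbpgd_pmf(x, theta) for x in range(1, terms))

theta = 2.0  # illustrative parameter value (assumption)
for r in range(1, 5):
    print(r, factorial_moment_closed(r, theta), factorial_moment_series(r, theta))
```

For r = 1 the closed form reduces to the mean $(\theta^2+5\theta+8)/\{\theta(\theta+3)\}$.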
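Substituting r = 1, 2, 3 and 4 gives the first four factorial moments about the origin. Using the relationship between factorial moments about the origin and moments about the origin, the first four moments about the origin of the SBPGD are obtained as
$$\mu_1'=\frac{\theta^2+5\theta+8}{\theta(\theta+3)}$$
$$\mu_2'=\frac{\theta^3+9\theta^2+30\theta+30}{\theta^2(\theta+3)}$$
$$\mu_3'=\frac{\theta^4+17\theta^3+92\theta^2+204\theta+144}{\theta^3(\theta+3)}$$
$$\mu_4'=\frac{\theta^5+33\theta^4+270\theta^3+990\theta^2+1560\theta+840}{\theta^4(\theta+3)}$$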
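Using the relationship between moments about the mean and moments about the origin, the moments about the mean of the SBPGD are obtained as
$$\mu_2=\frac{2(\theta^3+8\theta^2+20\theta+13)}{\theta^2(\theta+3)^2},\qquad \mu_3=\frac{2(\theta^5+13\theta^4+68\theta^3+171\theta^2+195\theta+80)}{\theta^3(\theta+3)^3}$$
$$\mu_4=\frac{2(\theta^7+26\theta^6+269\theta^5+1435\theta^4+4230\theta^3+6819\theta^2+5520\theta+1740)}{\theta^4(\theta+3)^4}$$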
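The coefficient of variation (C.V.), coefficient of skewness $(\sqrt{\beta_1})$, coefficient of kurtosis $(\beta_2)$ and index of dispersion $(\gamma)$ of the SBPGD are thus obtained as
$$\text{C.V.}=\frac{\sigma}{\mu_1'}=\frac{\sqrt{2(\theta^3+8\theta^2+20\theta+13)}}{\theta^2+5\theta+8}$$
$$\sqrt{\beta_1}=\frac{\mu_3}{\mu_2^{3/2}}=\frac{\theta^5+13\theta^4+68\theta^3+171\theta^2+195\theta+80}{\sqrt{2}\,(\theta^3+8\theta^2+20\theta+13)^{3/2}}$$
$$\beta_2=\frac{\mu_4}{\mu_2^{2}}=\frac{\theta^7+26\theta^6+269\theta^5+1435\theta^4+4230\theta^3+6819\theta^2+5520\theta+1740}{2(\theta^3+8\theta^2+20\theta+13)^2}$$
$$\gamma=\frac{\sigma^2}{\mu_1'}=\frac{2(\theta^3+8\theta^2+20\theta+13)}{\theta(\theta+3)(\theta^2+5\theta+8)}$$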
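These four closed forms can be compared against the same quantities computed directly from the p.m.f. (2.1). The sketch below assumes an illustrative θ = 1 and a truncation at 300 terms; `shape_closed` and `shape_numeric` are hypothetical helper names.

```python
from math import sqrt, isclose

def sbpgd_pmf(x, theta):
    """SBPGD p.m.f. (2.1) for x = 1, 2, 3, ..."""
    num = theta * x**2 + x * (theta**2 + 3 * theta + 1)
    return theta**2 / (theta + 3) * num / (theta + 1) ** (x + 2)

def shape_closed(theta):
    """C.V., skewness, kurtosis and index of dispersion from the closed forms."""
    q = theta**3 + 8 * theta**2 + 20 * theta + 13
    p5 = theta**5 + 13 * theta**4 + 68 * theta**3 + 171 * theta**2 + 195 * theta + 80
    p7 = (theta**7 + 26 * theta**6 + 269 * theta**5 + 1435 * theta**4
          + 4230 * theta**3 + 6819 * theta**2 + 5520 * theta + 1740)
    mean_num = theta**2 + 5 * theta + 8
    cv = sqrt(2 * q) / mean_num
    skew = p5 / (sqrt(2) * q**1.5)
    kurt = p7 / (2 * q**2)
    disp = 2 * q / (theta * (theta + 3) * mean_num)
    return cv, skew, kurt, disp

def shape_numeric(theta, terms=300):
    """Same quantities from truncated sums over the p.m.f. (cut-off is arbitrary)."""
    xs = range(1, terms)
    probs = [sbpgd_pmf(x, theta) for x in xs]
    mean = sum(x * p for x, p in zip(xs, probs))
    m2, m3, m4 = (sum((x - mean) ** k * p for x, p in zip(xs, probs)) for k in (2, 3, 4))
    return sqrt(m2) / mean, m3 / m2**1.5, m4 / m2**2, m2 / mean

theta = 1.0  # illustrative parameter value (assumption)
print(shape_closed(theta))
print(shape_numeric(theta))
```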
Graphs of the coefficient of variation, coefficient of skewness, coefficient of kurtosis and index of dispersion of the SBPGD for varying values of the parameter θ are shown in figure 2. The graphs show that the C.V. and the index of dispersion decrease as θ increases, while the coefficients of skewness and kurtosis increase with θ, in agreement with the expressions above, whose numerators grow faster than their denominators for large θ.
Figure 2 Graphs of coefficient of variation, coefficient of skewness, coefficient of kurtosis and index of dispersion of SBPGD for varying values of parameter θ .
The conditions under which the SBPGD and the SBPLD are over-dispersed, equi-dispersed or under-dispersed are presented in table 1. The SBPGD threshold is the positive root of γ = 1, that is, of θ⁴ + 6θ³ + 7θ² − 16θ − 26 = 0.

| Distributions | Over-dispersion | Equi-dispersion | Under-dispersion |
|---|---|---|---|
| SBPGD | θ<1.636061 | θ=1.636061 | θ>1.636061 |
| SBPLD | θ<1.671162 | θ=1.671162 | θ>1.671162 |

Table 1 Over-dispersion, equi-dispersion and under-dispersion of SBPGD and SBPLD
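The equi-dispersion point of the SBPGD is the value of θ at which the index of dispersion γ equals 1. A minimal sketch of locating it by bisection follows; the bracketing interval [1, 3] and the tolerance are assumptions chosen for illustration, and the helper names are hypothetical. The result can be compared with the SBPGD entry in table 1.

```python
def dispersion_index(theta):
    """Index of dispersion (variance / mean) of the SBPGD."""
    q = theta**3 + 8 * theta**2 + 20 * theta + 13
    return 2 * q / (theta * (theta + 3) * (theta**2 + 5 * theta + 8))

def bisect_root(f, lo, hi, tol=1e-12):
    """Plain bisection; requires f(lo) and f(hi) to have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # sign change lies in [mid, hi]
    return 0.5 * (lo + hi)

# gamma > 1 (over-dispersion) below the root, gamma < 1 (under-dispersion) above it.
theta_equi = bisect_root(lambda t: dispersion_index(t) - 1.0, 1.0, 3.0)
print(theta_equi)
```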
Unimodality and increasing failure rate
Since
$$\frac{P_1(x+1;\theta)}{P_1(x;\theta)}=\left(\frac{1}{\theta+1}\right)\left[1+\frac{2x\theta+(\theta^2+4\theta+1)}{x^2\theta+x(\theta^2+3\theta+1)}\right]$$
is a decreasing function of x, $P_1(x;\theta)$ is log-concave. Therefore, the SBPGD is unimodal, has an increasing failure rate (IFR), and hence an increasing failure rate average (IFRA). It is new better than used in expectation (NBUE) and has decreasing mean residual life (DMRL). A detailed discussion of the definitions of these aging concepts and their interrelationships is available in Barlow and Proschan [14].
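The monotonicity of the successive-probability ratio, and the log-concavity it implies, can be illustrated numerically. This is a sketch only, assuming an illustrative θ = 0.8 and checking the first 60 support points; it does not replace the analytic argument, and the function names are hypothetical.

```python
from math import isclose

def sbpgd_pmf(x, theta):
    """SBPGD p.m.f. (2.1) for x = 1, 2, 3, ..."""
    num = theta * x**2 + x * (theta**2 + 3 * theta + 1)
    return theta**2 / (theta + 3) * num / (theta + 1) ** (x + 2)

def successive_ratio(x, theta):
    """P1(x + 1) / P1(x), written in the bracketed form used in the text."""
    b = theta**2 + 3 * theta + 1
    extra = (2 * x * theta + theta**2 + 4 * theta + 1) / (x**2 * theta + x * b)
    return (1 + extra) / (theta + 1)

theta = 0.8  # illustrative parameter value (assumption)
ratios = [successive_ratio(x, theta) for x in range(1, 60)]

# The ratio should be strictly decreasing in x.
ratio_decreasing = all(a > b for a, b in zip(ratios, ratios[1:]))

# Log-concavity: P1(x)^2 >= P1(x - 1) * P1(x + 1).
log_concave = all(
    sbpgd_pmf(x, theta) ** 2 >= sbpgd_pmf(x - 1, theta) * sbpgd_pmf(x + 1, theta)
    for x in range(2, 60)
)

# The single mode is the first x at which the ratio drops below 1.
mode = max(range(1, 60), key=lambda x: sbpgd_pmf(x, theta))
print(ratio_decreasing, log_concave, mode)
```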
Generating functions
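Probability generating function: The probability generating function of the SBPGD (2.1) can be obtained as
$$P_X(t)=E\left(t^X\right)=\frac{\theta^2}{(\theta+3)(\theta+1)^2}\left[\theta\sum_{x=1}^{\infty}x^2\left(\frac{t}{\theta+1}\right)^{x}+(\theta^2+3\theta+1)\sum_{x=1}^{\infty}x\left(\frac{t}{\theta+1}\right)^{x}\right]$$
$$=\frac{\theta^2}{(\theta+3)(\theta+1)^2}\left[\frac{\theta\,t(\theta+1+t)(\theta+1)}{(\theta+1-t)^3}+\frac{t(\theta^2+3\theta+1)(\theta+1)}{(\theta+1-t)^2}\right]$$
$$=\frac{\theta^2\,t}{(\theta+3)(\theta+1)}\left[\frac{\theta(\theta+1+t)}{(\theta+1-t)^3}+\frac{\theta^2+3\theta+1}{(\theta+1-t)^2}\right]$$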
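Moment generating function: The moment generating function of the SBPGD (2.1) is given by
$$M_X(t)=E\left(e^{tX}\right)=\frac{\theta^2 e^{t}}{(\theta+3)(\theta+1)}\left[\frac{\theta(\theta+1+e^{t})}{(\theta+1-e^{t})^3}+\frac{\theta^2+3\theta+1}{(\theta+1-e^{t})^2}\right]$$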
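The closed-form generating functions can be sanity-checked against the defining series. The sketch below assumes an illustrative θ = 1.5 and evaluates the probability generating function only for |t| < θ + 1, where the series converges; the function names and the finite-difference step are hypothetical choices.

```python
from math import exp, isclose

def sbpgd_pmf(x, theta):
    """SBPGD p.m.f. (2.1) for x = 1, 2, 3, ..."""
    num = theta * x**2 + x * (theta**2 + 3 * theta + 1)
    return theta**2 / (theta + 3) * num / (theta + 1) ** (x + 2)

def sbpgd_pgf(t, theta):
    """Closed-form probability generating function, valid for |t| < theta + 1."""
    a = theta + 1 - t
    bracket = theta * (theta + 1 + t) / a**3 + (theta**2 + 3 * theta + 1) / a**2
    return theta**2 * t / ((theta + 3) * (theta + 1)) * bracket

def sbpgd_mgf(t, theta):
    """Moment generating function, obtained by replacing t with exp(t) in the p.g.f."""
    return sbpgd_pgf(exp(t), theta)

theta = 1.5  # illustrative parameter value (assumption)
t = 0.7
series = sum(t**x * sbpgd_pmf(x, theta) for x in range(1, 200))  # truncated E(t^X)

# The derivative of the p.g.f. at t = 1 is the mean; approximate it by a central difference.
h = 1e-5
pgf_slope = (sbpgd_pgf(1 + h, theta) - sbpgd_pgf(1 - h, theta)) / (2 * h)
mean = (theta**2 + 5 * theta + 8) / (theta * (theta + 3))
print(sbpgd_pgf(t, theta), series, pgf_slope, mean)
```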
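Method of moments estimate (MOME): Let $x_1,x_2,\ldots,x_n$ be a random sample of size n from the SBPGD (2.1). Equating the population mean to the corresponding sample mean gives the quadratic $(\bar{x}-1)\theta^2+(3\bar{x}-5)\theta-8=0$, whose positive root is the MOME $\tilde{\theta}$ of θ:
$$\tilde{\theta}=\frac{-(3\bar{x}-5)+\sqrt{9\bar{x}^2+2\bar{x}-7}}{2(\bar{x}-1)}$$
where $\bar{x}$ is the sample mean.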
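The method of moments estimate is a closed-form function of the sample mean, so it is straightforward to code. The following sketch checks that the estimator inverts the SBPGD mean; the function names and the test values of θ are assumptions made for illustration.

```python
from math import sqrt, isclose

def sbpgd_mean(theta):
    """Mean of the SBPGD."""
    return (theta**2 + 5 * theta + 8) / (theta * (theta + 3))

def mome(xbar):
    """Method of moments estimate of theta from the sample mean xbar (> 1)."""
    if xbar <= 1:
        # The SBPGD has support 1, 2, 3, ..., so its mean always exceeds 1.
        raise ValueError("sample mean must exceed 1")
    return (-(3 * xbar - 5) + sqrt(9 * xbar**2 + 2 * xbar - 7)) / (2 * (xbar - 1))

# Round trip: feeding the population mean back in should recover theta.
for theta in (0.5, 2.0, 4.0):
    print(theta, mome(sbpgd_mean(theta)))
```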
Maximum likelihood estimate (MLE): Let $x_1,x_2,\ldots,x_n$ be a random sample of size n from the SBPGD (2.1), and let $f_x$ be the observed frequency in the sample corresponding to $X=x$ $(x=1,2,3,\ldots,k)$ such that $\sum_{x=1}^{k}f_x=n$, where k is the largest observed value having non-zero frequency. The likelihood function L of the SBPGD (2.1) is given by
$$L=\left(\frac{\theta^2}{\theta+3}\right)^{n}\frac{1}{(\theta+1)^{\sum_{x=1}^{k}f_x(x+2)}}\prod_{x=1}^{k}\left[x^2\theta+x(\theta^2+3\theta+1)\right]^{f_x}$$
The log-likelihood function is obtained as
$$\log L=n\log\left(\frac{\theta^2}{\theta+3}\right)-\sum_{x=1}^{k}f_x(x+2)\log(\theta+1)+\sum_{x=1}^{k}f_x\log\left[x^2\theta+x(\theta^2+3\theta+1)\right]$$
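The first derivative of the log-likelihood function is given by
$$\frac{d\log L}{d\theta}=\frac{n(\theta+6)}{\theta(\theta+3)}-\frac{n(\bar{x}+2)}{\theta+1}+\sum_{x=1}^{k}\frac{(x+2\theta+3)f_x}{x\theta+(\theta^2+3\theta+1)}$$
where $\bar{x}$ is the sample mean.
The maximum likelihood estimate (MLE) $\hat{\theta}$ of θ is the solution of the equation $\dfrac{d\log L}{d\theta}=0$, that is, of the non-linear equation
$$\sum_{x=1}^{k}\frac{(x+2\theta+3)f_x}{x\theta+(\theta^2+3\theta+1)}-\frac{n(\bar{x}+2)}{\theta+1}+\frac{n(\theta+6)}{\theta(\theta+3)}=0$$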
This non-linear equation can be solved by any numerical iteration method, such as the Newton-Raphson method, the bisection method or the regula falsi method. In this paper, the equation has been solved using the Newton-Raphson method, with the method of moments estimate as the initial value of θ.
In this section, the SBPGD is fitted using the maximum likelihood estimate to test its goodness of fit against the SBPD and the SBPLD. The first data set is the immunogold assay data of Cullen et al. [15] on the distribution of the number of counts of sites with particles; the second data set is the number of European red mites on apple leaves, reported by Garman [16] (Tables 2 & 3).
It is evident from Tables 2 and 3 that the SBPGD gives a better fit than both the SBPD and the SBPLD.
| No. of sites with particles | Observed frequency | Expected frequency (SBPD) | Expected frequency (SBPLD) | Expected frequency (SBPGD) |
|---|---|---|---|---|
| 1 | 122 | 111.3 | 119.0 | 119.1 |
| Total | 198 | 198.0 | 198.0 | 198.0 |
| ML estimate | | θ̂=0.576 | θ̂=4.051 | θ̂=2.0992 |
| χ² | | 4.642 | 0.51 | 0.40 |
| d.f. | | 1 | 2 | 2 |
| p-value | | 0.031 | 0.7749 | 0.8187 |

Table 2 Distribution of number of counts of sites with particles from immunogold data
| Number of European red mites | Observed frequency | Expected frequency (SBPD) | Expected frequency (SBPLD) | Expected frequency (SBPGD) |
|---|---|---|---|---|
| 1 | 38 | 28.7 | 31.7 | 31.9 |
| 2 | 17 | 25.7 | 23.9 | 23.8 |
| 3 | 10 | 15.3 | 13.2 | 13.1 |
| 4 | 9 | 6.9, 2.5, 0.7, 0.2, 0.1 } | 6.3, 2.8, 1.2, 0.5, 0.4 } | 6.3, 2.8, 1.2, 0.5, 0.4 } |
| Total | 80 | 80.0 | 80.0 | 80.0 |
| ML estimate | | θ̂=1.791615 | θ̂=2.163462 | θ̂=2.08381 |
| χ² | | 9.827 | 5.30 | 5.11 |
| d.f. | | 2 | 2 | 2 |
| p-value | | 0.0073 | 0.0706 | 0.0777 |

Table 3 Number of European red mites on apple leaves, reported by Garman (1923)
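The expected frequencies under the SBPGD are n times the fitted probabilities. As a rough check, the sketch below recomputes the first few SBPGD expected frequencies of Table 3 from the reported values θ̂ = 2.08381 and n = 80; the function name is a hypothetical helper, and small differences from the table are to be expected from rounding.

```python
def sbpgd_pmf(x, theta):
    """SBPGD p.m.f. (2.1) for x = 1, 2, 3, ..."""
    num = theta * x**2 + x * (theta**2 + 3 * theta + 1)
    return theta**2 / (theta + 3) * num / (theta + 1) ** (x + 2)

theta_hat = 2.08381  # ML estimate reported for the SBPGD in Table 3
n = 80               # total observed frequency in Table 3

# Expected frequency for each count x is n * P1(x; theta_hat).
expected = {x: n * sbpgd_pmf(x, theta_hat) for x in range(1, 9)}
for x, e in expected.items():
    print(x, round(e, 1))
```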
©2017 Shanker, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.