Size-biased Poisson-Garima distribution with applications

doi:10.15406/bbij.2017.06.00167

eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Research Article Volume 6 Issue 3

Rama Shanker,

Kamlesh Kumar Shukla

Department of Statistics, Eritrea Institute of Technology, Eritrea

Correspondence: Rama Shanker, Department of Statistics, Eritrea Institute of Technology, Asmara, Eritrea

Received: August 23, 2017 | Published: August 28, 2017

Citation: Shanker R, Shukla KK. Size-biased poisson-garima distribution with applications. Biom Biostat Int J. 2017;6(3):335-340. DOI: 10.15406/bbij.2017.06.00167

Abstract

In this paper, a size-biased Poisson-Garima distribution (SBPGD) is obtained by size-biasing the Poisson-Garima distribution (PGD) introduced by Shanker (2017). Its moments about the origin and about the mean are derived, and from them expressions for the coefficient of variation (C.V.), skewness and kurtosis are obtained. Estimation of its parameter by the method of moments and by maximum likelihood is discussed. The goodness of fit of the SBPGD is examined for two real data sets using the maximum likelihood estimate, and the fit is quite satisfactory compared with the size-biased Poisson distribution (SBPD) and the size-biased Poisson-Lindley distribution (SBPLD).

Keywords: garima distribution, poisson-garima distribution, size-biasing, moments, estimation of parameter, goodness of fit

Introduction

Shanker¹ has obtained Poisson-Garima distribution (PGD) for modeling count data having probability mass function (p.m.f.)

$P_{0} (x; θ) = \frac{θ}{θ + 2} \frac{θ x + (θ^{2} + 3 θ + 1)}{{(θ + 1)}^{x + 2}}; x = 0, 1, 2, ..., θ > 0$ (1.1)

The first four moments about origin and the variance of PGD obtained by Shanker¹ are as follows:

${μ^{'}}_{1} = \frac{θ + 3}{θ (θ + 2)}$ , ${μ^{'}}_{2} = \frac{θ^{2} + 5 θ + 8}{θ^{2} (θ + 2)}$ , ${μ^{'}}_{3} = \frac{θ^{3} + 9 θ^{2} + 30 θ + 30}{θ^{3} (θ + 2)}$

${μ^{'}}_{4} = \frac{θ^{4} + 17 θ^{3} + 92 θ^{2} + 204 θ + 144}{θ^{4} (θ + 2)}$ , and $μ_{2} = \frac{θ^{3} + 6 θ^{2} + 12 θ + 7}{θ^{2} {(θ + 2)}^{2}}$

Its properties, estimation of parameter, and applications have been discussed in detail by Shanker,¹ and it has been shown to be better than the Poisson and Poisson-Lindley distributions for modeling count data in various fields of knowledge. The PGD arises from the Poisson distribution when its parameter $λ$ follows the Garima distribution introduced by Shanker², which has probability density function (p.d.f.)

$f_{0} (λ; θ) = \frac{θ}{θ + 2} (1 + θ + θ λ) e^{- θ λ}; λ > 0, θ > 0$ (1.2)

Size-biased distributions arise in practice when observations from a sample are recorded with probability proportional to some measure of unit size. Fisher³ first introduced these distributions to model ascertainment biases, and they were later formalized by Rao⁴ in a unifying theory. Size-biased observations occur in many research areas, and their fields of application include medical science, sociology, psychology, ecology, geological sciences, etc. The application of size-biased distribution theory to fitting distributions of diameter at breast height (DBH) data arising from horizontal point sampling (HPS) has been discussed by Van Deusen.⁵ Further, Lappi and Bailey⁶ have applied size-biased distributions to analyze HPS diameter increment data. Statistical applications of size-biased distributions to the analysis of data relating to human population and ecology can be found in Patil and Rao.⁷,⁸ Some recent results on size-biased distributions pertaining to parameter estimation in forestry, with special emphasis on the Weibull family, have been discussed by Gove.⁹ Ducey and Gove¹⁰ discussed size-biased distributions in the generalized beta distribution family, with applications to forestry.

Let a random variable $X$ have the original probability distribution $P_{0} (x; θ)$ ; then a simple size-biased distribution is given by its probability function

$P_{1} (x; θ) = \frac{x \cdot P_{0} (x; θ)}{{μ^{'}}_{1}}$ (1.3)

where ${μ^{'}}_{1} = E (X)$ is the mean of the original probability distribution.

In the present paper, a size-biased Poisson-Garima distribution (SBPGD) is proposed. Its raw and central moments, and properties based on the central moments, including the coefficient of variation, skewness, kurtosis and index of dispersion, are obtained and discussed. Some of its statistical properties are also discussed. The method of moments and the method of maximum likelihood are discussed for estimating the parameter of the SBPGD. The goodness of fit of the SBPGD is also presented.

Size-biased Poisson-Garima distribution

Using (1.1) and (1.3), the p.m.f. of the size-biased Poisson-Garima distribution (SBPGD) with parameter $θ$ can be obtained as

$P_{1} (x; θ) = \frac{x \cdot P_{0} (x; θ)}{{μ^{'}}_{1}} = \frac{θ^{2}}{θ + 3} \frac{x^{2} θ + x (θ^{2} + 3 θ + 1)}{{(θ + 1)}^{x + 2}}; x = 1, 2, 3, .., θ > 0$ (2.1)

where ${μ^{'}}_{1} = \frac{θ + 3}{θ (θ + 2)}$ is the mean of the PGD (1.1).
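As a quick numerical check of (2.1), the sketch below size-biases the PGD p.m.f. (1.1) term by term through (1.3) and compares the result with the closed form. The function names and the value $θ = 1.5$ are illustrative choices, not part of the paper.

```python
def pgd_pmf(x, theta):
    """Poisson-Garima p.m.f. (1.1) for x = 0, 1, 2, ..."""
    c = theta**2 + 3 * theta + 1
    return theta / (theta + 2) * (theta * x + c) / (theta + 1) ** (x + 2)


def sbpgd_pmf(x, theta):
    """Size-biased Poisson-Garima p.m.f. (2.1) for x = 1, 2, 3, ..."""
    c = theta**2 + 3 * theta + 1
    return theta**2 / (theta + 3) * (x**2 * theta + x * c) / (theta + 1) ** (x + 2)


theta = 1.5  # illustrative parameter value (assumption)
pgd_mean = (theta + 3) / (theta * (theta + 2))  # mean of the PGD

# Size-bias (1.1) via (1.3) and record the largest gap to the closed form (2.1).
max_gap = max(
    abs(x * pgd_pmf(x, theta) / pgd_mean - sbpgd_pmf(x, theta))
    for x in range(1, 200)
)
# The probabilities should sum to one (the tail beyond x = 199 is negligible here).
total = sum(sbpgd_pmf(x, theta) for x in range(1, 200))

print("largest gap between (1.3) and (2.1):", max_gap)
print("sum of SBPGD probabilities:", total)
```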

The SBPGD can also be obtained from the size-biased Poisson distribution (SBPD) with p.m.f.

$g (x | λ) = \frac{e^{- λ} λ^{x - 1}}{(x - 1)!}; x = 1, 2, 3, ..., λ > 0$ (2.2)

when its parameter $λ$ follows size-biased Garima distribution (SBGD) with p.d.f.

$h (λ; θ) = \frac{θ^{2}}{θ + 3} λ (1 + θ + θ λ) e^{- θ λ}; λ > 0, θ > 0$ (2.3)

Thus the p.m.f of SBPGD can be obtained as

$P (X = x) = \int_{0}^{\infty} g (x | λ) \cdot h (λ; θ) d λ$

$= \int_{0}^{\infty} \frac{e^{- λ} λ^{x - 1}}{(x - 1)!} \frac{θ^{2}}{θ + 3} λ (1 + θ + θ λ) e^{- θ λ} d λ$ (2.4)

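$= \frac{θ^{2}}{(θ + 3) (x - 1)!} \int_{0}^{\infty} e^{- (θ + 1) λ} [(1 + θ) λ^{x} + θ λ^{x + 1}] d λ$

$= \frac{θ^{2}}{(θ + 3) (x - 1)!} [\frac{(1 + θ) x!}{{(θ + 1)}^{x + 1}} + \frac{θ (x + 1)!}{{(θ + 1)}^{x + 2}}]$

$= \frac{θ^{2}}{θ + 3} \frac{x^{2} θ + x (θ^{2} + 3 θ + 1)}{{(θ + 1)}^{x + 2}}; x = 1, 2, 3, ..., θ > 0$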

which is the p.m.f of SBPGD with parameter $θ$ .
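The mixture representation (2.4) can be checked numerically. The following is a minimal sketch that approximates the integral with a plain midpoint rule and compares it with the closed form (2.1); the integration limit, the step count, the function names and the value $θ = 2$ are assumptions made for the illustration.

```python
import math


def sbpgd_pmf(x, theta):
    """Closed-form SBPGD p.m.f. (2.1)."""
    c = theta**2 + 3 * theta + 1
    return theta**2 / (theta + 3) * (x**2 * theta + x * c) / (theta + 1) ** (x + 2)


def mixture_pmf(x, theta, upper=40.0, steps=40000):
    """Midpoint-rule approximation of the integral in (2.4).

    `upper` truncates the infinite range; the integrand decays like
    exp(-(theta + 1) * lam), so the neglected tail is tiny for moderate theta.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        # size-biased Poisson p.m.f. (2.2)
        sbpd = math.exp(-lam) * lam ** (x - 1) / math.factorial(x - 1)
        # size-biased Garima p.d.f. (2.3)
        sbgd = theta**2 / (theta + 3) * lam * (1 + theta + theta * lam) * math.exp(-theta * lam)
        total += sbpd * sbgd
    return total * h


theta = 2.0  # illustrative value (assumption)
for x in range(1, 6):
    print(x, round(mixture_pmf(x, theta), 6), round(sbpgd_pmf(x, theta), 6))
```

The two columns should agree to within the quadrature error of the midpoint rule.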

Graphs of the SBPGD for varying values of the parameter $θ$ are shown in figure 1. The graphs show that, as the value of $θ$ increases, the graphs initially shift upward and decrease quickly for increasing values of $x$ . The graphs also become convex for values of $θ \geq 2$ .

Figure 1 Graphs of SBPGD for varying values of parameter $θ$ .

Recall that the p.m.f. of the size-biased Poisson-Lindley distribution (SBPLD), given by

$P_{2} (X = x) = \frac{θ^{3}}{θ + 2} \frac{x (x + θ + 2)}{{(θ + 1)}^{x + 2}}; x = 1, 2, 3, ...; θ > 0$ (1.7)

was introduced by Ghitany and Mutairi,¹¹ and is a size-biased version of the Poisson-Lindley distribution (PLD) introduced by Sankaran.¹² Ghitany and Mutairi¹¹ discussed its various mathematical and statistical properties, estimation of the parameter using maximum likelihood estimation and the method of moments, and goodness of fit. Shanker et al.¹³ made a detailed study of the applications of the SBPLD for modeling data on thunderstorms and observed that, in most data sets, the SBPLD gives a better fit than the size-biased Poisson distribution (SBPD).

Moments and moments based measures

Using (2.4), the $r$th factorial moment about origin of the SBPGD (2.1) can be obtained as

$μ_{(r)}^{'} = E [E (X^{(r)} | λ)] = \int_{0}^{\infty} [\sum_{x = 1}^{\infty} x^{(r)} \frac{e^{- λ} λ^{x - 1}}{(x - 1)!}] \frac{θ^{2}}{θ + 3} λ (1 + θ + θ λ) e^{- θ λ} d λ$

$= \frac{θ^{2}}{θ + 3} \int_{0}^{\infty} [λ^{r - 1} \sum_{x = r}^{\infty} x \frac{e^{- λ} λ^{x - r}}{(x - r)!}] λ (1 + θ + θ λ) e^{- θ λ} d λ$

Taking $y = x - r$ , we get

$μ_{(r)}^{'} = \frac{θ^{2}}{θ + 3} \int_{0}^{\infty} [λ^{r - 1} \sum_{y = 0}^{\infty} (y + r) \frac{e^{- λ} λ^{y}}{y!}] λ (1 + θ + θ λ) e^{- θ λ} d λ$

$= \frac{θ^{2}}{θ + 3} \int_{0}^{\infty} λ^{r - 1} (λ + r) λ (1 + θ + θ λ) e^{- θ λ} d λ$

$= \frac{r! \left\{ (θ + 1) (r θ + r + 1) + (r + 1) (r θ + r + 2) \right\}}{θ^{r} (θ + 3)}; r = 1, 2, 3, ...$ (3.1)

Substituting $r = 1, 2, 3$ and $4$ , the first four factorial moments about origin can be obtained, and using the relationship between factorial moments about origin and moments about origin, the first four moments about origin of SBPGD can be obtained as

$μ_{1}^{'} = \frac{θ^{2} + 5 θ + 8}{θ (θ + 3)}$

$μ_{2}^{'} = \frac{θ^{3} + 9 θ^{2} + 30 θ + 30}{θ^{2} (θ + 3)}$

$μ_{3}^{'} = \frac{θ^{4} + 17 θ^{3} + 92 θ^{2} + 204 θ + 144}{θ^{3} (θ + 3)}$

$μ_{4}^{'} = \frac{θ^{5} + 33 θ^{4} + 270 θ^{3} + 990 θ^{2} + 1560 θ + 840}{θ^{4} (θ + 3)}$
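The passage from (3.1) to the raw moments can be reproduced numerically. The sketch below, under the assumption of an illustrative value $θ = 2$ and with hypothetical function names, converts the factorial moments into raw moments using the standard relations $μ_{2}^{'} = μ_{(2)}^{'} + μ_{(1)}^{'}$, $μ_{3}^{'} = μ_{(3)}^{'} + 3 μ_{(2)}^{'} + μ_{(1)}^{'}$, $μ_{4}^{'} = μ_{(4)}^{'} + 6 μ_{(3)}^{'} + 7 μ_{(2)}^{'} + μ_{(1)}^{'}$ and compares them with direct summation over the p.m.f. (2.1).

```python
import math


def sbpgd_pmf(x, theta):
    """SBPGD p.m.f. (2.1)."""
    c = theta**2 + 3 * theta + 1
    return theta**2 / (theta + 3) * (x**2 * theta + x * c) / (theta + 1) ** (x + 2)


def factorial_moment(r, theta):
    """r-th factorial moment about origin, equation (3.1)."""
    bracket = (theta + 1) * (r * theta + r + 1) + (r + 1) * (r * theta + r + 2)
    return math.factorial(r) * bracket / (theta**r * (theta + 3))


def raw_moments(theta):
    """First four raw moments built from the factorial moments."""
    f1, f2, f3, f4 = (factorial_moment(r, theta) for r in range(1, 5))
    return [f1, f2 + f1, f3 + 3 * f2 + f1, f4 + 6 * f3 + 7 * f2 + f1]


theta = 2.0  # illustrative value (assumption)
from_formula = raw_moments(theta)
# Direct summation; the truncation point 300 is an arbitrary, generous cut-off.
from_pmf = [
    sum(x**k * sbpgd_pmf(x, theta) for x in range(1, 300)) for k in range(1, 5)
]

for k, (a, b) in enumerate(zip(from_formula, from_pmf), start=1):
    print(f"raw moment {k}: formula = {a:.6f}, summation = {b:.6f}")
```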

Using the relationship between moments about mean and the moments about origin, the moments about mean of the SBPGD are thus obtained as

$μ_{2} = \frac{2 (θ^{3} + 8 θ^{2} + 20 θ + 13)}{θ^{2} {(θ + 3)}^{2}}$ , $μ_{3} = \frac{2 (θ^{5} + 13 θ^{4} + 68 θ^{3} + 171 θ^{2} + 195 θ + 80)}{θ^{3} {(θ + 3)}^{3}}$

$μ_{4} = \frac{2 (θ^{7} + 26 θ^{6} + 269 θ^{5} + 1435 θ^{4} + 4230 θ^{3} + 6819 θ^{2} + 5520 θ + 1740)}{θ^{4} {(θ + 3)}^{4}}$

The coefficient of variation $(C . V)$ , coefficient of Skewness $(\sqrt{β_{1}})$ , coefficient of Kurtosis $(β_{2})$ and the index of dispersion $(γ)$ of the SBPGD are thus obtained as

$C . V = \frac{σ}{{μ^{'}}_{1}} = \frac{\sqrt{2 (θ^{3} + 8 θ^{2} + 20 θ + 13)}}{θ^{2} + 5 θ + 8}$

$\sqrt{β_{1}} = \frac{μ_{3}}{μ_{2}^{3 / 2}} = \frac{θ^{5} + 13 θ^{4} + 68 θ^{3} + 171 θ^{2} + 195 θ + 80}{\sqrt{2} {(θ^{3} + 8 θ^{2} + 20 θ + 13)}^{3 / 2}}$

$β_{2} = \frac{μ_{4}}{μ_{2}^{2}} = \frac{(θ^{7} + 26 θ^{6} + 269 θ^{5} + 1435 θ^{4} + 4230 θ^{3} + 6819 θ^{2} + 5520 θ + 1740)}{2 {(θ^{3} + 8 θ^{2} + 20 θ + 13)}^{2}}$

$γ = \frac{σ^{2}}{μ_{1}^{'}} = \frac{2 (θ^{3} + 8 θ^{2} + 20 θ + 13)}{θ (θ + 3) (θ^{2} + 5 θ + 8)}$
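The four measures above are straightforward to tabulate. The following sketch evaluates the closed-form expressions for a few values of $θ$; the function name, the dictionary keys and the grid of $θ$ values are illustrative assumptions.

```python
import math


def sbpgd_measures(theta):
    """C.V., skewness, kurtosis and index of dispersion of the SBPGD."""
    n2 = theta**3 + 8 * theta**2 + 20 * theta + 13
    n3 = (theta**5 + 13 * theta**4 + 68 * theta**3
          + 171 * theta**2 + 195 * theta + 80)
    n4 = (theta**7 + 26 * theta**6 + 269 * theta**5 + 1435 * theta**4
          + 4230 * theta**3 + 6819 * theta**2 + 5520 * theta + 1740)
    mean_num = theta**2 + 5 * theta + 8  # numerator of the mean
    return {
        "cv": math.sqrt(2 * n2) / mean_num,
        "skewness": n3 / (math.sqrt(2) * n2**1.5),
        "kurtosis": n4 / (2 * n2**2),
        "dispersion": 2 * n2 / (theta * (theta + 3) * mean_num),
    }


for theta in (0.5, 1.0, 2.0, 5.0):  # illustrative grid (assumption)
    rounded = {k: round(v, 4) for k, v in sbpgd_measures(theta).items()}
    print(theta, rounded)
```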

Graphs of the coefficient of variation, coefficient of skewness, coefficient of kurtosis and index of dispersion of the SBPGD for varying values of the parameter $θ$ are shown in figure 2. The graphs show that the C.V. and the index of dispersion are monotonically decreasing, while the coefficient of skewness and the coefficient of kurtosis are increasing, for increasing values of the parameter $θ$ .

Figure 2 Graphs of coefficient of variation, coefficient of skewness, coefficient of kurtosis and index of dispersion of SBPGD for varying values of parameter $θ$ .

The conditions under which the SBPGD and SBPLD are over-dispersed, equi-dispersed or under-dispersed are presented in table 1.

Distributions	Over-dispersion $(μ < σ^{2})$	Equi-dispersion $(μ = σ^{2})$	Under-dispersion $(μ > σ^{2})$
SBPGD	$θ < 1.636061$	$θ = 1.636061$	$θ > 1.636061$
SBPLD	$θ < 1.671162$	$θ = 1.671162$	$θ > 1.671162$

Table 1 Over-dispersion, equi-dispersion and under-dispersion of SBPGD and SBPLD
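The equi-dispersion boundary of the SBPGD is the value of $θ$ at which the index of dispersion $γ$ equals one. A minimal sketch of how it can be located by bisection follows; the bracketing interval, the tolerance and the function names are assumptions, and the search relies on $γ$ being monotonically decreasing in $θ$.

```python
def index_of_dispersion(theta):
    """Index of dispersion (variance / mean) of the SBPGD."""
    n2 = theta**3 + 8 * theta**2 + 20 * theta + 13
    return 2 * n2 / (theta * (theta + 3) * (theta**2 + 5 * theta + 8))


def equidispersion_point(lo=0.5, hi=5.0, tol=1e-10):
    """Bisection for gamma(theta) = 1 on [lo, hi] (bracket is an assumption)."""
    if not (index_of_dispersion(lo) > 1.0 > index_of_dispersion(hi)):
        raise ValueError("interval does not bracket the equi-dispersion point")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if index_of_dispersion(mid) > 1.0:
            lo = mid  # still over-dispersed: the root lies to the right
        else:
            hi = mid  # under-dispersed: the root lies to the left
    return 0.5 * (lo + hi)


theta_star = equidispersion_point()
print("equi-dispersion at theta =", round(theta_star, 6))
print("over-dispersed below it: ", index_of_dispersion(theta_star - 0.1) > 1)
print("under-dispersed above it:", index_of_dispersion(theta_star + 0.1) < 1)
```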

Statistical properties of SBPGD

Unimodality and increasing failure rate

Since

$\frac{P_{1} (x + 1; θ)}{P_{1} (x; θ)} = (\frac{1}{θ + 1}) [1 + \frac{2 x θ + (θ^{2} + 4 θ + 1)}{x^{2} θ + x (θ^{2} + 3 θ + 1)}]$

is a decreasing function of $x$ , $P_{1} (x; θ)$ is log-concave. Therefore, the SBPGD is unimodal, has an increasing failure rate (IFR), and hence an increasing failure rate average (IFRA). It is new better than used in expectation (NBUE) and has decreasing mean residual life (DMRL). A detailed discussion of the definitions of these aging concepts and the interrelationships between them is available in Barlow and Proschan.¹⁴

Generating functions

Probability Generating Function: The probability generating function of the SBPGD (2.1) can be obtained as

$P_{X} (t) = E (t^{X}) = \frac{θ^{2}}{(θ + 3) {(θ + 1)}^{2}} [θ \sum_{x = 1}^{\infty} x^{2} {(\frac{t}{θ + 1})}^{x} + (θ^{2} + 3 θ + 1) \sum_{x = 1}^{\infty} x {(\frac{t}{θ + 1})}^{x}]$

$= \frac{θ^{2}}{(θ + 3) {(θ + 1)}^{2}} [\frac{θ t (θ + 1 + t) (θ + 1)}{{(θ + 1 - t)}^{3}} + \frac{t (θ^{2} + 3 θ + 1) (θ + 1)}{{(θ + 1 - t)}^{2}}]$

$= \frac{θ^{2} t}{(θ + 3) (θ + 1)} [\frac{θ (θ + 1 + t)}{{(θ + 1 - t)}^{3}} + \frac{θ^{2} + 3 θ + 1}{{(θ + 1 - t)}^{2}}]$

Moment generating function: The moment generating function of the SBPGD (2.1) can be given by

$M_{X} (t) = E (e^{t X}) = \frac{θ^{2} e^{t}}{(θ + 3) (θ + 1)} [\frac{θ (θ + 1 + e^{t})}{{(θ + 1 - e^{t})}^{3}} + \frac{θ^{2} + 3 θ + 1}{{(θ + 1 - e^{t})}^{2}}]$
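The closed-form generating function can be sanity-checked against the defining series. The sketch below, with hypothetical function names and an illustrative $θ = 2$, compares the closed form of $P_{X} (t)$ with a truncated sum of $t^{x} P_{1} (x; θ)$ and recovers the mean from a central-difference approximation of $P_{X}^{'} (1)$. The closed form is only valid for $|t| < θ + 1$.

```python
def sbpgd_pmf(x, theta):
    """SBPGD p.m.f. (2.1)."""
    c = theta**2 + 3 * theta + 1
    return theta**2 / (theta + 3) * (x**2 * theta + x * c) / (theta + 1) ** (x + 2)


def sbpgd_pgf(t, theta):
    """Closed-form probability generating function, valid for |t| < theta + 1."""
    d = theta + 1 - t
    c = theta**2 + 3 * theta + 1
    front = theta**2 * t / ((theta + 3) * (theta + 1))
    return front * (theta * (theta + 1 + t) / d**3 + c / d**2)


theta = 2.0  # illustrative value (assumption)
t = 0.5
series = sum(t**x * sbpgd_pmf(x, theta) for x in range(1, 200))
print("closed form:", sbpgd_pgf(t, theta), " series:", series)

# P_X(1) must equal one, and P_X'(1) must equal the mean of the SBPGD.
h = 1e-5  # finite-difference step (assumption)
mean_from_pgf = (sbpgd_pgf(1 + h, theta) - sbpgd_pgf(1 - h, theta)) / (2 * h)
mean_formula = (theta**2 + 5 * theta + 8) / (theta * (theta + 3))
print("P_X(1) =", sbpgd_pgf(1.0, theta))
print("mean from P_X'(1):", mean_from_pgf, " formula:", mean_formula)
```

Replacing $t$ by $e^{t}$ in `sbpgd_pgf` gives the moment generating function, subject to $e^{t} < θ + 1$.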

Estimation of parameter

Method of moment estimate (MOME): Let $x_{1}, x_{2}, ..., x_{n}$ be a random sample of size $n$ from the SBPGD (2.1). Equating the population mean to the corresponding sample mean, the MOME $\tilde{θ}$ of $θ$ of SBPGD (2.1) can be obtained as

$\tilde{θ} = \frac{- (3 \bar{x} - 5) + \sqrt{9 {\bar{x}}^{2} + 2 \bar{x} - 7}}{2 (\bar{x} - 1)}$

where $\bar{x}$ is the sample mean.
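The estimator is the positive root of the quadratic $(\bar{x} - 1) θ^{2} + (3 \bar{x} - 5) θ - 8 = 0$ obtained by setting the mean of the SBPGD equal to $\bar{x}$, so it requires $\bar{x} > 1$. A minimal sketch follows; the function names are hypothetical, and the frequencies are the observed mite counts listed in Table 3.

```python
import math


def sbpgd_mean(theta):
    """Mean of the SBPGD."""
    return (theta**2 + 5 * theta + 8) / (theta * (theta + 3))


def sbpgd_mome(xbar):
    """Method of moments estimate of theta from the sample mean."""
    if xbar <= 1:
        raise ValueError("the sample mean must exceed 1 for the SBPGD")
    root = math.sqrt(9 * xbar**2 + 2 * xbar - 7)
    return (-(3 * xbar - 5) + root) / (2 * (xbar - 1))


# Observed frequencies of the mite data (x -> frequency), as listed in Table 3.
freq = {1: 38, 2: 17, 3: 10, 4: 9, 5: 3, 6: 2, 7: 1, 8: 0}
n = sum(freq.values())
xbar = sum(x * f for x, f in freq.items()) / n

theta_tilde = sbpgd_mome(xbar)
print("sample mean:", xbar)
print("MOME of theta:", round(theta_tilde, 5))
print("SBPGD mean at the MOME:", round(sbpgd_mean(theta_tilde), 5))
```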

Maximum likelihood estimate (MLE): Let $x_{1}, x_{2}, ..., x_{n}$ be a random sample of size $n$ from the SBPGD (2.1) and let $f_{x}$ be the observed frequency in the sample corresponding to $X = x (x = 1, 2, 3, ..., k)$ such that $\sum_{x = 1}^{k} f_{x} = n$ , where $k$ is the largest observed value having non-zero frequency. The likelihood function $L$ of the SBPGD (2.1) is given by

$L = {(\frac{θ^{2}}{θ + 3})}^{n} \frac{1}{{(θ + 1)}^{\sum_{x = 1}^{k} f_{x} (x + 2)}} \prod_{x = 1}^{k} {[x^{2} θ + x (θ^{2} + 3 θ + 1)]}^{f_{x}}$

The log likelihood function is obtained as

$\log L = n \log (\frac{θ^{2}}{θ + 3}) - \sum_{x = 1}^{k} f_{x} (x + 2) \log (θ + 1) + \sum_{x = 1}^{k} f_{x} \log [x^{2} θ + x (θ^{2} + 3 θ + 1)]$

The first derivative of the log likelihood function is given by

$\frac{d \log L}{d θ} = \frac{n (θ + 6)}{θ (θ + 3)} - \frac{n (\bar{x} + 2)}{θ + 1} + \sum_{x = 1}^{k} \frac{(x + 2 θ + 3) f_{x}}{x θ + (θ^{2} + 3 θ + 1)}$

where $\bar{x}$ is the sample mean.

The maximum likelihood estimate (MLE), $\hat{θ}$ of $θ$ is the solution of the equation $\frac{d \log L}{d θ} = 0$ and is given by the solution of the non-linear equation

$\sum_{x = 1}^{k} \frac{(x + 2 θ + 3) f_{x}}{x θ + (θ^{2} + 3 θ + 1)} - \frac{n (\bar{x} + 2)}{θ + 1} + \frac{n (θ + 6)}{θ (θ + 3)} = 0$

This non-linear equation can be solved by any numerical iteration method, such as the Newton-Raphson method, the bisection method or the regula falsi method. In this paper, the above equation has been solved using the Newton-Raphson method, where the initial value of $θ$ is the value given by the method of moment estimate.
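The Newton-Raphson iteration described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, tolerance and iteration cap are hypothetical, the derivative of the score is worked out here by differentiating the likelihood equation term by term, and the frequencies are the observed mite counts listed in Table 3.

```python
import math


def score(theta, freq):
    """First derivative of the log likelihood for grouped data {x: f_x}."""
    n = sum(freq.values())
    xbar = sum(x * f for x, f in freq.items()) / n
    c = theta**2 + 3 * theta + 1
    tail = sum(f * (x + 2 * theta + 3) / (x * theta + c) for x, f in freq.items())
    return n * (theta + 6) / (theta * (theta + 3)) - n * (xbar + 2) / (theta + 1) + tail


def score_derivative(theta, freq):
    """Derivative of the score (derived for this sketch, not quoted from the paper)."""
    n = sum(freq.values())
    xbar = sum(x * f for x, f in freq.items()) / n
    # (theta + 6) / (theta (theta + 3)) equals 2/theta - 1/(theta + 3)
    total = n * (1 / (theta + 3) ** 2 - 2 / theta**2) + n * (xbar + 2) / (theta + 1) ** 2
    for x, f in freq.items():
        d = x * theta + theta**2 + 3 * theta + 1
        total += f * (2 * d - (x + 2 * theta + 3) ** 2) / d**2
    return total


def sbpgd_mle(freq, start, tol=1e-10, max_iter=50):
    """Newton-Raphson solution of score(theta) = 0 starting from `start`."""
    theta = start
    for _ in range(max_iter):
        step = score(theta, freq) / score_derivative(theta, freq)
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")


freq = {1: 38, 2: 17, 3: 10, 4: 9, 5: 3, 6: 2, 7: 1, 8: 0}  # mite data, Table 3
xbar = sum(x * f for x, f in freq.items()) / sum(freq.values())
# Method of moments estimate used as the starting value.
start = (-(3 * xbar - 5) + math.sqrt(9 * xbar**2 + 2 * xbar - 7)) / (2 * (xbar - 1))

theta_hat = sbpgd_mle(freq, start)
print("starting value (MOME):", round(start, 5))
print("MLE of theta:", round(theta_hat, 5))
```

Starting from the method of moments value keeps the iteration close to the root; for other data sets a poor starting value may require a safeguarded method such as bisection.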

Data analysis

In this section, we fit the SBPGD using the maximum likelihood estimate to test its goodness of fit against the SBPD and SBPLD. The first data set is the immunogold assay data of Cullen et al.,¹⁵ giving the distribution of the number of counts of sites with particles; the second data set is the number of European red mites on apple leaves, reported by Garman¹⁶ (Tables 2 and 3).

It is evident from Tables 2 and 3 that the SBPGD gives a better fit than both the SBPD and the SBPLD.

No. of sites with particles	Observed frequency	Expected frequency (SBPD)	Expected frequency (SBPLD)	Expected frequency (SBPGD)
1	122	111.3	119.0	119.1
2	50	64.1	53.8	53.7
3	18	18.5	18.0	18.0
4	4	3.5	5.3	5.3
5	4	0.6	1.9	1.9
Total	198	198.0	198.0	198.0
ML estimate		$\hat{θ} = 0.576$	$\hat{θ} = 4.051$	$\hat{θ} = 2.0992$
$χ^{2}$		4.642	0.51	0.40
d.f.		1	2	2
p-value		0.031	0.7749	0.8187

Classes grouped together by braces: 3 to 5 for SBPD; 4 and 5 for SBPLD and SBPGD.

Table 2 Distribution of number of counts of sites with particles from immunogold data

Number of European red mites	Observed frequency	Expected frequency (SBPD)	Expected frequency (SBPLD)	Expected frequency (SBPGD)
1	38	28.7	31.7	31.9
2	17	25.7	23.9	23.8
3	10	15.3	13.2	13.1
4	9	6.9	6.3	6.3
5	3	2.5	2.8	2.8
6	2	0.7	1.2	1.2
7	1	0.2	0.5	0.5
8	0	0.1	0.4	0.4
Total	80	80.0	80.0	80.0
ML estimate		$\hat{θ} = 1.791615$	$\hat{θ} = 2.163462$	$\hat{θ} = 2.08381$
$χ^{2}$		9.827	5.30	5.11
d.f.		2	2	2
p-value		0.0073	0.0706	0.0777

Classes grouped together by braces: 4 to 8 for all three distributions.
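The SBPGD column of the mite data can be recomputed, at least approximately, from the reported estimate $\hat{θ} = 2.08381$. The sketch below is an illustration under assumptions: classes 4 to 8 are pooled into a single class that also absorbs the tail beyond 8, and the p-value uses the fact that the $χ^{2}$ survival function with 2 degrees of freedom is $e^{- χ^{2} / 2}$. Because the tabulated expected frequencies are rounded, the statistic obtained this way may differ slightly from the tabulated one.

```python
import math


def sbpgd_pmf(x, theta):
    """SBPGD p.m.f. (2.1)."""
    c = theta**2 + 3 * theta + 1
    return theta**2 / (theta + 3) * (x**2 * theta + x * c) / (theta + 1) ** (x + 2)


theta_hat = 2.08381  # ML estimate reported for the SBPGD in Table 3
observed = [38, 17, 10, 9 + 3 + 2 + 1 + 0]  # classes 1, 2, 3 and pooled 4 to 8
n = sum(observed)

probs = [sbpgd_pmf(x, theta_hat) for x in (1, 2, 3)]
probs.append(1.0 - sum(probs))  # pooled class: every x >= 4 (assumption)
expected = [n * p for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1 - 1  # classes minus one, minus one estimated parameter
p_value = math.exp(-chi2 / 2)  # exact survival function for 2 d.f. only

print("expected frequencies:", [round(e, 1) for e in expected])
print("chi-square:", round(chi2, 3), " d.f.:", dof, " p-value:", round(p_value, 4))
```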