Size-biased discrete-Lindley distribution and its applications to model distribution of freely-forming small group size

doi:10.15406/bbij.2018.07.00200

eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Research Article Volume 7 Issue 2

Simon Sium, Rama Shanker

Department of Statistics, College of Science, Eritrea Institute of Technology, Asmara, Eritrea

Correspondence: Rama Shanker, Department of Statistics, College of Science, Eritrea Institute of Technology, Asmara, Eritrea

Received: March 03, 2018 | Published: April 3, 2018

Citation: Sium S, Shanker R. Size-biased discrete-lindley distribution and its applications to model distribution of freely-forming small group size. Biom Biostat Int J. 2018;7(2):131–136. DOI: 10.15406/bbij.2018.07.00200

Abstract

A size-biased discrete-Lindley distribution (SBDLD) is proposed by size-biasing the discrete-Lindley distribution (DLD). Its moments about the origin and about the mean are obtained, and expressions for the coefficient of variation (C.V.), skewness, kurtosis and index of dispersion follow from them. The method of moments and the method of maximum likelihood give the same estimate of the parameter of SBDLD. Applications of SBDLD are discussed with four observed real datasets relating to freely-forming small group size at public places. In terms of goodness of fit, SBDLD performs quite satisfactorily compared with the size-biased Poisson and size-biased Poisson-Lindley distributions.

Keywords: Size-biased distribution, Discrete-Lindley distribution, Moments and moments based measures, Estimation of parameter, Goodness of fit

Introduction

Let a random variable $X$ have probability distribution $P_{0} (x; θ); x = 0, 1, 2, ..., θ > 0$ . If sample units are weighted or selected with probability proportional to $x^{α}$ , then the corresponding size-biased distribution of order $α$ is given by the probability mass function (pmf)

$P_{1} (x; θ) = \frac{x^{α} \cdot P_{0} (x; θ)}{{μ^{'}}_{α}}$ (1.1)

where ${μ^{'}}_{α} = E (X^{α}) = \sum_{x = 0}^{\infty} x^{α} P_{0} (x; θ)$ . When $α = 1$ , the distribution is known as the simple size-biased distribution and is applicable to size-biased sampling; when $α = 2$ , it is known as the area-biased distribution and is applicable to area-biased sampling. In many statistical sampling situations care must be taken not to sample inadvertently from a size-biased distribution in place of the one intended (Berhane & Shanker¹). Size-biased distributions are a particular case of weighted distributions, which arise naturally in practice when observations from a sample are recorded with probability proportional to some measure of unit size. In field applications, size-biased distributions can arise either because individuals are sampled with unequal probability by design or because of unequal detection probability. They come into play when organisms occur in groups and group size influences the probability of detection. Fisher² first introduced these distributions to model ascertainment biases, which were later formalized by Rao³ in a unifying theory for problems where the observations fall in non-experimental, non-replicated and non-random categories. Size-biased distributions have applications in environmental science, econometrics, social science, biomedical science, human demography, ecology, geology, forestry, etc. Further, size-biasing occurs in many unexpected contexts such as statistical estimation, renewal theory, infinite divisibility of distributions and number theory. Many researchers have worked on size-biased distributions, including Patil & Ord,⁴ Patil & Rao⁵,⁶ and Patil,⁷ among others.
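As a minimal sketch of definition (1.1), the following Python function size-biases an arbitrary pmf on a truncated support. The function names, the truncation point `support_max` and the Poisson base with rate 2 are illustrative assumptions and are not taken from the paper.

```python
import math

def size_biased_pmf(base_pmf, alpha=1, support_max=100):
    """Return [P1(0), ..., P1(support_max)] with P1(x) = x**alpha * P0(x) / E[X**alpha].

    The infinite sum in (1.1) is truncated at support_max (an assumption of this
    sketch), so the base pmf should have negligible mass beyond that point.
    """
    weights = [x ** alpha * base_pmf(x) for x in range(support_max + 1)]
    mu_alpha = sum(weights)  # truncated estimate of E[X**alpha]
    return [w / mu_alpha for w in weights]

def poisson_pmf(x, lam=2.0):
    """Poisson pmf, used here only as an illustrative base distribution."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Simple size-biasing (alpha = 1) of a Poisson(2) base distribution.
sb_poisson = size_biased_pmf(poisson_pmf, alpha=1)
```

For a Poisson base with rate $λ$ , simple size-biasing gives $P_{1} (x) = e^{- λ} λ^{x - 1} / (x - 1) !$ for $x = 1, 2, ...$ , so `sb_poisson[0]` is zero and `sb_poisson[1]` equals $e^{- 2}$ .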

Lindley⁸ introduced the one-parameter Lindley distribution, with probability density function (pdf) and cumulative distribution function (cdf)

$f (x; θ) = \frac{θ^{2}}{θ + 1} (1 + x) e^{- θ x}; x > 0, θ > 0$ (1.2)

$F (x; θ) = 1 - [1 + \frac{θ x}{θ + 1}] e^{- θ x}, x > 0, θ > 0$ (1.3)

Ghitany et al.⁹ made a detailed study of the statistical and mathematical properties, parameter estimation and applications of the Lindley distribution, and showed that it gives a better fit than the exponential distribution for modeling waiting-time data in a bank. Shanker et al.¹⁰ made a detailed comparative study of modeling lifetime data from engineering and medical science using both Lindley and exponential distributions, and showed that the two are competing and that in the majority of datasets the exponential distribution gives a better fit than the Lindley distribution.

Recently, Berhane & Shanker¹ introduced the discrete-Lindley distribution (DLD), a discrete version of the Lindley distribution obtained using the infinite series approach, having pmf

$P_{0} (x; θ) = \frac{{(e^{θ} - 1)}^{2}}{e^{2 θ}} (1 + x) e^{- θ x}; x = 0, 1, 2, 3, ...., θ > 0$ (1.4)

Various statistical properties of DLD, estimation of its parameter and applications to model count data have been studied by Berhane & Shanker,¹ who observed that it gives a better fit than both the Poisson distribution and the Poisson-Lindley distribution, a Poisson mixture of the Lindley⁸ distribution introduced by Sankaran.¹¹ The first four moments about the origin and the central moments of DLD obtained by Berhane & Shanker¹ are given by

$μ_{1}^{'} = \frac{2}{(e^{θ} - 1)}$ ,

$μ_{2}^{'} = \frac{2 (e^{θ} + 2)}{{(e^{θ} - 1)}^{2}}$ ,

$μ_{3}^{'} = \frac{2 (e^{2 θ} + 7 e^{θ} + 4)}{{(e^{θ} - 1)}^{3}}$ ,

$μ_{4}^{'} = \frac{2 (e^{3 θ} + 18 e^{2 θ} + 33 e^{θ} + 8)}{{(e^{θ} - 1)}^{4}}$ ,

$μ_{2} = σ^{2} = \frac{2 e^{θ}}{{(e^{θ} - 1)}^{2}}$ ,

$μ_{3} = \frac{2 e^{θ} (e^{θ} + 1)}{{(e^{θ} - 1)}^{3}}$ ,

$μ_{4} = \frac{2 e^{θ} (e^{2 θ} + 10 e^{θ} + 1)}{{(e^{θ} - 1)}^{4}}$

In this paper, a size-biased discrete-Lindley distribution is proposed and its moments about the origin and about the mean are obtained. The behavior of the coefficient of variation, skewness, kurtosis and index of dispersion is discussed graphically for varying values of the parameter. The method of moments and the method of maximum likelihood give the same estimate of the parameter. Finally, applications of SBDLD are discussed with four observed real datasets relating to the distribution of freely-forming small group size at various public places, and the fit by SBDLD is observed to be quite satisfactory.

Size-biased discrete-Lindley distribution

Using (1.1) with $α = 1$ , (1.4) and the expression for the mean of DLD, a size-biased discrete-Lindley distribution (SBDLD) with parameter $θ > 0$ can be defined by its pmf

$P_{2} (x; θ) = \frac{x \cdot P_{0} (x; θ)}{{μ^{'}}_{1}} = \frac{{(e^{θ} - 1)}^{3}}{2 e^{2 θ}} (x + x^{2}) e^{- θ x}; x = 1, 2, 3, ..,$ (2.1)

It can be easily verified that SBDLD is unimodal and has an increasing failure rate. Since

$\frac{P_{2} (x + 1; θ)}{P_{2} (x; θ)} = (\frac{1}{e^{θ}}) (1 + \frac{2}{x})$

is a decreasing function of $x$ , $P_{2} (x; θ)$ is log-concave. Therefore, SBDLD is unimodal, has an increasing failure rate (IFR), and hence an increasing failure rate average (IFRA). It is new better than used in expectation (NBUE) and has decreasing mean residual life (DMRL). The definitions, concepts and interrelationships between these aging concepts are discussed in Barlow & Proschan.¹²
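The pmf (2.1) and the ratio of successive probabilities can be checked numerically. The sketch below is illustrative only: the function name `sbdld_pmf`, the value $θ = 1.5$ and the truncation of the support at 199 are assumptions of this example.

```python
import math

def sbdld_pmf(x, theta):
    """pmf (2.1): (e^theta - 1)^3 / (2 e^(2 theta)) * (x + x^2) * e^(-theta x), x = 1, 2, ..."""
    c = (math.exp(theta) - 1.0) ** 3 / (2.0 * math.exp(2.0 * theta))
    return c * (x + x * x) * math.exp(-theta * x)

theta = 1.5  # illustrative parameter value
probs = [sbdld_pmf(x, theta) for x in range(1, 200)]

# Total probability over the truncated support (should be very close to 1).
total = sum(probs)

# Ratios P2(x+1)/P2(x) for x = 1..10; each should equal e^(-theta) * (1 + 2/x).
ratios = [probs[i + 1] / probs[i] for i in range(10)]
```

The ratios decrease in $x$ , which is the log-concavity property used in the argument above.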

The behavior of the pmf of SBDLD (2.1) for varying values of the parameter is shown in Figure 1. Recall that the pmf of the size-biased Poisson-Lindley distribution (SBPLD) with parameter $θ > 0$ , given by

$P_{3} (x; θ) = \frac{θ^{3}}{θ + 2} \frac{x (x + θ + 2)}{{(θ + 1)}^{x + 2}}; x = 1, 2, 3, ...,; θ > 0$ (2.5)

was introduced by Ghitany & Mutairi¹³ as a size-biased version of the Poisson-Lindley distribution (PLD) of Sankaran.¹¹ Ghitany & Mutairi¹³ discussed its mathematical and statistical properties, estimation of the parameter using maximum likelihood and the method of moments, and goodness of fit. Shanker et al.¹⁴ made a critical study of the applications of SBPLD for modeling data on thunderstorms and found that SBPLD is a better model for thunderstorms than the size-biased Poisson distribution (SBPD).

Figure 1 Behavior of the pmf of SBDLD for varying values of the parameter $θ$

Moments

The probability generating function (G(t)) and the moment generating function (M(t)) of SBDLD can be obtained as

$G (t) = \frac{t {(e^{θ} - 1)}^{3}}{{(e^{θ} - t)}^{3}} \text{ for } | t | < e^{θ}$ , (3.1)

and

$M (t) = G (e^{t}) = \frac{e^{t} {(e^{θ} - 1)}^{3}}{{(e^{θ} - e^{t})}^{3}} = \frac{{(e^{θ} - 1)}^{3} e^{- 2 t}}{{(e^{θ - t} - 1)}^{3}} \text{ for } t < θ$ . (3.2)

It can be easily verified that the function in (3.2) is infinitely differentiable with respect to $t$ in a neighbourhood of $t = 0$ , since it involves only exponential terms of its argument. This means that all moments about the origin of SBDLD exist. The $r$ th moment about the origin $μ_{r}^{'}, r \geq 1$ of SBDLD (2.1) can be obtained as
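A closed-form generating function can be sanity-checked against the defining sum $G (t) = \sum_{x} t^{x} P_{2} (x; θ)$ . The following sketch does this for the pgf (3.1); the helper names, the truncation at `x_max` and the value $θ = 1$ are assumptions of this example. Because $M (t) = G (e^{t})$ , the condition $M (0) = 1$ corresponds to $G (1) = 1$ .

```python
import math

def sbdld_pmf(x, theta):
    """pmf (2.1) of the SBDLD on x = 1, 2, ..."""
    c = (math.exp(theta) - 1.0) ** 3 / (2.0 * math.exp(2.0 * theta))
    return c * (x + x * x) * math.exp(-theta * x)

def pgf_closed(t, theta):
    """Closed form (3.1): t (e^theta - 1)^3 / (e^theta - t)^3, valid for |t| < e^theta."""
    u = math.exp(theta)
    return t * (u - 1.0) ** 3 / (u - t) ** 3

def pgf_numeric(t, theta, x_max=400):
    """Truncated version of the defining sum E[t^X]."""
    return sum(t ** x * sbdld_pmf(x, theta) for x in range(1, x_max + 1))

theta = 1.0  # illustrative parameter value
checks = [(t, pgf_closed(t, theta), pgf_numeric(t, theta)) for t in (0.5, 1.0, 1.5)]
```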

$\begin{array}{l} μ_{r}^{'} = E (X^{r}) = \frac{{(e^{θ} - 1)}^{3}}{2 e^{2 θ}} \sum_{x = 1}^{\infty} x^{r} (x + x^{2}) e^{- θ x} \\ = \frac{{(e^{θ} - 1)}^{3}}{2 e^{2 θ}} [\sum_{x = 1}^{\infty} x^{r + 1} e^{- θ x} + \sum_{x = 1}^{\infty} x^{r + 2} e^{- θ x}] \end{array}$

Taking $r = 1, 2, 3$ and $4$ and simplifying the resulting algebraic expressions, the first four raw moments (moments about the origin) of the SBDLD (2.1) can be obtained as

$μ_{1}^{'} = \frac{e^{θ} + 2}{(e^{θ} - 1)}$

$μ_{2}^{'} = \frac{e^{2 θ} + 7 e^{θ} + 4}{{(e^{θ} - 1)}^{2}}$

$μ_{3}^{'} = \frac{e^{3 θ} + 18 e^{2 θ} + 33 e^{θ} + 8}{{(e^{θ} - 1)}^{3}}$

$μ_{4}^{'} = \frac{e^{4 θ} + 41 e^{3 θ} + 171 e^{2 θ} + 131 e^{θ} + 16}{{(e^{θ} - 1)}^{4}}$
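The four closed-form raw moments can be compared with direct summation of $x^{r} P_{2} (x; θ)$ . This is a hedged numerical check rather than a derivation; the function names, the truncation at `x_max = 500` and the three test values of $θ$ are assumptions of this sketch.

```python
import math

def sbdld_pmf(x, theta):
    """pmf (2.1) of the SBDLD on x = 1, 2, ..."""
    c = (math.exp(theta) - 1.0) ** 3 / (2.0 * math.exp(2.0 * theta))
    return c * (x + x * x) * math.exp(-theta * x)

def raw_moment_numeric(r, theta, x_max=500):
    """Truncated sum of x^r * P2(x; theta)."""
    return sum(x ** r * sbdld_pmf(x, theta) for x in range(1, x_max + 1))

def raw_moments_closed(theta):
    """First four raw moments of the SBDLD from the closed-form expressions."""
    u = math.exp(theta)
    d = u - 1.0
    return [
        (u + 2.0) / d,
        (u ** 2 + 7.0 * u + 4.0) / d ** 2,
        (u ** 3 + 18.0 * u ** 2 + 33.0 * u + 8.0) / d ** 3,
        (u ** 4 + 41.0 * u ** 3 + 171.0 * u ** 2 + 131.0 * u + 16.0) / d ** 4,
    ]

# Largest relative discrepancy over r = 1..4 and a few illustrative theta values.
max_rel_error = max(
    abs(raw_moment_numeric(r, th) - m) / m
    for th in (0.5, 1.0, 2.0)
    for r, m in enumerate(raw_moments_closed(th), start=1)
)
```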

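Now, using the relationship between central moments (moments about the mean) and raw moments, the central moments of the SBDLD (2.1) can be obtained as

$μ_{2} = σ^{2} = \frac{3 e^{θ}}{{(e^{θ} - 1)}^{2}}$

$μ_{3} = \frac{3 e^{θ} (e^{θ} + 1)}{{(e^{θ} - 1)}^{3}}$

$μ_{4} = \frac{3 e^{θ} (e^{2 θ} + 13 e^{θ} + 1)}{{(e^{θ} - 1)}^{4}}$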

The coefficient of variation $(C . V)$ , coefficient of Skewness $(\sqrt{β_{1}})$ , coefficient of Kurtosis $(β_{2})$ and index of dispersion $(γ)$ of the SBDLD (2.1) are thus given as

$C . V = \frac{σ}{{μ^{'}}_{1}} = \frac{\sqrt{3 e^{θ}}}{(e^{θ} + 2)}$

$\sqrt{β_{1}} = \frac{μ_{3}}{μ_{2}^{3 / 2}} = \frac{3 e^{θ} (e^{θ} + 1)}{{(3 e^{θ})}^{3 / 2}}$

$β_{2} = \frac{μ_{4}}{μ_{2}^{2}} = \frac{(e^{2 θ} + 13 e^{θ} + 1)}{3 e^{θ}}$

$γ = \frac{σ^{2}}{μ_{1}^{'}} = \frac{3 e^{θ}}{(e^{θ} - 1) (e^{θ} + 2)}$

It can be easily verified that SBDLD is over-dispersed $(μ < σ^{2})$ , equi-dispersed $(μ = σ^{2})$ and under-dispersed $(μ > σ^{2})$ for $θ < (=) > θ^{*} = 1.00505$ . Similarly, SBPLD is over-dispersed $(μ < σ^{2})$ , equi-dispersed $(μ = σ^{2})$ and under-dispersed $(μ > σ^{2})$ for $θ < (=) > θ^{*} = 1.671162$ . The behavior of the mean, variance, C.V, skewness, kurtosis and index of dispersion for varying values of the parameter is shown numerically in Table 1.
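The threshold for SBDLD follows from setting $γ = 1$ : the condition $3 e^{θ} = (e^{θ} - 1) (e^{θ} + 2)$ reduces to $e^{2 θ} - 2 e^{θ} - 2 = 0$ , whose positive root is $e^{θ} = 1 + \sqrt{3}$ . The sketch below evaluates this root and the index of dispersion on either side of it; the function name and the two comparison values $θ = 0.5$ and $θ = 1.5$ are illustrative assumptions.

```python
import math

def index_of_dispersion(theta):
    """gamma = variance / mean = 3 e^theta / ((e^theta - 1)(e^theta + 2)) for the SBDLD."""
    u = math.exp(theta)
    return 3.0 * u / ((u - 1.0) * (u + 2.0))

# Positive root of e^(2 theta) - 2 e^theta - 2 = 0.
theta_star = math.log(1.0 + math.sqrt(3.0))

gamma_below = index_of_dispersion(0.5)   # theta below the threshold
gamma_at = index_of_dispersion(theta_star)
gamma_above = index_of_dispersion(1.5)   # theta above the threshold
```

The index of dispersion exceeds 1 below the threshold and falls below 1 above it, in line with the last column of Table 1.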

Theta	Variance	Mean	CV	Skewness	Kurtosis	Index of dispersion
0.25	47.7508	11.5624	0.5976	1.1637	5.0209	4.1298
0.50	11.7531	5.6245	0.6095	1.1910	5.0851	2.0896
0.75	5.0902	3.6858	0.6121	1.2368	5.1965	1.3810
1.00	2.7620	2.7459	0.6052	1.3021	5.3621	1.0059
1.25	1.6884	2.2047	0.5894	1.3877	5.5923	0.7658
1.50	1.1091	1.8617	0.5657	1.4950	5.9016	0.5958
1.75	0.7637	1.6310	0.5358	1.6257	6.3095	0.4682
2.00	0.5430	1.4696	0.5015	1.7818	6.8415	0.3695
2.25	0.3951	1.3535	0.4644	1.9658	7.5310	0.2919
2.50	0.2923	1.2683	0.4263	2.1806	8.4215	0.2304
2.75	0.2189	1.2049	0.3883	2.4294	9.5689	0.1817
3.00	0.1654	1.1572	0.3515	2.7163	11.0451	0.1430

Table 1 Values of coefficient of variation, skewness, kurtosis, index of dispersion, mean and variance of SBDLD for different values of parameter $θ$
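The entries of Table 1 can be recomputed from the closed-form expressions for the mean $(e^{θ} + 2) / (e^{θ} - 1)$ , the variance $3 e^{θ} / {(e^{θ} - 1)}^{2}$ and the derived measures. The following is a minimal sketch; the function name `sbdld_summary` and the dictionary keys are assumptions of this example.

```python
import math

def sbdld_summary(theta):
    """Mean, variance, C.V, skewness, kurtosis and index of dispersion of the SBDLD."""
    u = math.exp(theta)
    mean = (u + 2.0) / (u - 1.0)
    variance = 3.0 * u / (u - 1.0) ** 2
    return {
        "mean": mean,
        "variance": variance,
        "cv": math.sqrt(variance) / mean,
        "skewness": (u + 1.0) / math.sqrt(3.0 * u),      # mu_3 / mu_2^(3/2), simplified
        "kurtosis": (u * u + 13.0 * u + 1.0) / (3.0 * u),
        "dispersion": variance / mean,
    }

row_low = sbdld_summary(0.25)
row_high = sbdld_summary(3.0)
```

For $θ = 0.25$ this gives a mean of about 11.5624 and a variance of about 47.7508, and for $θ = 3$ a mean of about 1.1572 and a variance of about 0.1654. Since the support of SBDLD starts at 1, the mean is always greater than 1.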

The behavior of the coefficient of variation $(C . V)$ , coefficient of skewness $(\sqrt{β_{1}})$ , coefficient of kurtosis $(β_{2})$ and index of dispersion $(γ)$ of the SBDLD is shown in Figure 2. The index of dispersion is monotonically decreasing, and the coefficients of skewness and kurtosis are monotonically increasing, for increasing values of the parameter $θ$ . The C.V is not monotone: as Table 1 also shows, it rises slightly to a maximum near $θ = \ln 2 \approx 0.69$ and decreases thereafter.

Figure 2 Behavior of C.V, coefficient of skewness, coefficient of kurtosis and index of dispersion of the SBDLD for varying values of the parameter $θ$

The behavior of the mean and variance for varying values of the parameter $θ$ is shown in Figure 3.

Figure 3 Behavior of mean and variance of the SBDLD for varying values of the parameter $θ$

Estimation of parameter

Method of Moments Estimate (MOME)

Equating the population mean to the corresponding sample mean, the method of moments estimate (MOME) $\tilde{θ}$ of $θ$ of SBDLD (2.1) is given by

$\tilde{θ} = \ln (\frac{\bar{x} + 2}{\bar{x} - 1})$ ,

where $\bar{x} > 1$ is the sample mean.

Maximum Likelihood Estimate (MLE)

Let $x_{1}, x_{2}, ..., x_{n}$ be a random sample of size $n$ from the SBDLD (2.1) and let $f_{x}$ be the observed frequency in the sample corresponding to $X = x (x = 1, 2, 3, ..., k)$ such that $\sum_{x = 1}^{k} f_{x} = n$ , where $k$ is the largest observed value having non-zero frequency. The likelihood function of the SBDLD (2.1) is given by

$L = {(\frac{{(e^{θ} - 1)}^{3}}{2 e^{2 θ}})}^{n} \cdot e^{- θ \sum_{x = 1}^{k} x \cdot f_{x}} \cdot \prod_{x = 1}^{k} {(x + x^{2})}^{f_{x}}$

The log likelihood function can be obtained as

$\ln L = n (3 \ln (e^{θ} - 1) - \ln (2 e^{2 θ})) - θ \sum_{x = 1}^{k} x f_{x} + \sum_{x = 1}^{k} f_{x} \ln (x + x^{2})$

The first derivative of the log likelihood function is thus given by

$\frac{d \ln L}{d θ} = \frac{3 n e^{θ}}{(e^{θ} - 1)} - 2 n - n \bar{x}$ ,

where $\bar{x}$ is the sample mean. The maximum likelihood estimate (MLE) $\hat{θ}$ of $θ$ of SBDLD (2.1) is the solution of the equation $\frac{d \ln L}{d θ} = 0$ and is given by

$\hat{θ} = \ln (\frac{\bar{x} + 2}{\bar{x} - 1})$

Thus, like DLD, both MOME and MLE give the same estimate of the parameter $θ$ in case of SBDLD.
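The closed-form estimate can be illustrated on grouped data. In the sketch below the frequency table `hypothetical_freq` is invented purely for illustration (it is not one of the datasets in this paper), and the function names are assumptions. The code computes $\hat{θ}$ from the sample mean and evaluates the log likelihood given above at and around the estimate.

```python
import math

def sbdld_theta_hat(freq):
    """Closed-form MOME/MLE: ln((xbar + 2) / (xbar - 1)), from a {group size: frequency} dict."""
    n = sum(freq.values())
    xbar = sum(x * f for x, f in freq.items()) / n
    if xbar <= 1.0:
        raise ValueError("sample mean must exceed 1 for a finite estimate")
    return math.log((xbar + 2.0) / (xbar - 1.0))

def sbdld_log_likelihood(theta, freq):
    """ln L = n(3 ln(e^theta - 1) - ln(2 e^(2 theta))) - theta * sum(x f_x) + sum(f_x ln(x + x^2))."""
    n = sum(freq.values())
    sum_xf = sum(x * f for x, f in freq.items())
    const = sum(f * math.log(x + x * x) for x, f in freq.items())
    bracket = 3.0 * math.log(math.exp(theta) - 1.0) - math.log(2.0) - 2.0 * theta
    return n * bracket - theta * sum_xf + const

# Hypothetical grouped data (illustration only): sample mean is 1.75.
hypothetical_freq = {1: 50, 2: 30, 3: 15, 4: 5}
theta_hat = sbdld_theta_hat(hypothetical_freq)
ll_at_hat = sbdld_log_likelihood(theta_hat, hypothetical_freq)
```

With a sample mean of 1.75 the estimate is $\ln (3.75 / 0.75) = \ln 5$ . The second derivative of the log likelihood, $- 3 n e^{θ} / {(e^{θ} - 1)}^{2}$ , is negative, so the stationary point is a maximum.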

Goodness of fit

Size-biased distributions are useful for modeling data from situations in which organisms occur in groups and the group size influences the probability of detection. In this section, the goodness of fit of SBDLD is discussed with data on the size distribution of freely-forming small groups at various public places, reported by James¹⁵ and Coleman & James.¹⁶ The expected frequencies under the size-biased Poisson distribution (SBPD) and the size-biased Poisson-Lindley distribution (SBPLD) are also presented for ready comparison with SBDLD. Note that the goodness of fit of SBDLD, SBPD and SBPLD is based on the maximum likelihood estimates of the parameter.

Based on the values of chi-square ( $χ^{2}$ ) and p-value, SBDLD gives a much closer fit than SBPD and SBPLD in Tables 2-4, while in Table 5 SBPLD gives a closer fit than both SBPD and SBDLD. Thus, SBDLD can be considered an important distribution for modeling the distribution of freely-forming small group size at various public places.

Group size	Observed frequency	Expected frequency (SBPD)	Expected frequency (SBPLD)	Expected frequency (SBDLD)
1	1486	1452.4	1532.5	1486.4
2	694	743.3	630.6	693.0
3	195	190.2	191.9	193.9
4	37	32.4	51.3	41.0
5	10	4.1	12.8	7.3
6	1	0.6	3.9	1.4
Total	2423	2423.0	2423.0	2423.0
ML estimate		$\hat{θ} = 0.5118$	$\hat{θ} = 4.5082$	$\hat{θ} = 2.3725$
$χ^{2}$		7.370	1.760	1.007
d.f.		2	3	3
p-value		0.0251	0.0030	0.9088

Table 2 Pedestrians–Eugene, spring, morning
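The chi-square statistic compares observed and expected frequencies after sparse classes have been combined. The paper does not state its pooling rule, so the sketch below assumes a common convention: starting from the largest group size, adjacent classes are merged until each pooled class has an expected frequency of at least 5. The function name and this rule are assumptions; the observed and expected values are the SBPD and SBDLD columns of Table 2.

```python
def pooled_chi_square(observed, expected, min_expected=5.0):
    """Pearson chi-square after pooling classes from the tail (assumed rule).

    Returns (chi-square statistic, number of classes after pooling).
    """
    pooled = []                 # [observed, expected] pairs, built from the last class backwards
    acc_o = acc_e = 0.0
    for o, e in zip(reversed(observed), reversed(expected)):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            pooled.append([acc_o, acc_e])
            acc_o = acc_e = 0.0
    if acc_e > 0.0:             # leftover low-expectation classes at the front
        if pooled:
            pooled[-1][0] += acc_o
            pooled[-1][1] += acc_e
        else:
            pooled.append([acc_o, acc_e])
    chi2 = sum((o - e) ** 2 / e for o, e in pooled)
    return chi2, len(pooled)

observed = [1486, 694, 195, 37, 10, 1]
expected_sbpd = [1452.4, 743.3, 190.2, 32.4, 4.1, 0.6]
expected_sbdld = [1486.4, 693.0, 193.9, 41.0, 7.3, 1.4]

chi2_sbpd, classes_sbpd = pooled_chi_square(observed, expected_sbpd)
chi2_sbdld, classes_sbdld = pooled_chi_square(observed, expected_sbdld)
```

Under this assumed rule the statistics come out close to the reported 7.370 (SBPD) and 1.007 (SBDLD), with small differences attributable to the rounding of the tabulated expected frequencies. With one estimated parameter, the degrees of freedom are the number of pooled classes minus 2, which gives 2 and 3 respectively.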

Group size	Observed frequency	Expected frequency (SBPD)	Expected frequency (SBPLD)	Expected frequency (SBDLD)
1	316	306.3	323.0	313.4
2	141	156.2	132.5	145.6
3	44	39.8	40.2	40.6
4	5	6.8	10.7	8.6
5	4	0.9	3.6	1.8
Total	510	510.0	510.0	510.0
ML estimate		$\hat{θ} = 0.5098$	$\hat{θ} = 4.5224$	$\hat{θ} = 2.3760$
$χ^{2}$		2.463	3.020	0.640
d.f.		2	2	2
p-value		0.4818	0.3884	0.8872

Table 3 Shopping groups–Eugene, spring, department store and public market

Group size	Observed frequency	Expected frequency (SBPD)	Expected frequency (SBPLD)	Expected frequency (SBDLD)
1	305	296.5	314.4	304.1
2	144	159.0	134.4	148.0
3	50	42.6	42.5	43.2
4	5	7.6	11.8	9.5
5	2	1.0	3.1	1.8
6	1	0.3	0.8	0.4
Total	507	507.0	507.0	507.0
ML estimate		$\hat{θ} = 0.5365$	$\hat{θ} = 4.3179$	$\hat{θ} = 2.3294$
$χ^{2}$		3.035	6.415	2.351
d.f.		2	2	2
p-value		0.2190	0.0400	0.5028

Table 4 Play groups–Eugene, spring, public playground D

Group size	Observed frequency	Expected frequency (SBPD)	Expected frequency (SBPLD)	Expected frequency (SBDLD)
1	306	292.2	309.4	299.5
2	132	155.2	131.2	144.5
3	47	41.2	41.1	41.8
4	10	7.3	11.3	9.1
5	2	1.1	4.0	2.1
Total	497	497.0	497.0	497.0
ML estimate		$\hat{θ} = 0.5312$	$\hat{θ} = 4.3548$	$\hat{θ} = 2.3385$
$χ^{2}$		6.479	0.932	1.926
d.f.		2	2	2
p-value		0.0390	0.6281	0.5878

Table 5 Play groups–Eugene, spring, public playground A

Concluding remarks

In the present paper, a size-biased discrete-Lindley distribution (SBDLD), a simple size-biased version of the discrete-Lindley distribution (DLD) of Berhane & Shanker,¹ has been proposed and studied. Its raw moments and central moments have been obtained, expressions for the coefficient of variation, skewness, kurtosis and index of dispersion have been presented, and their behavior has been discussed graphically. The estimation of its parameter has been discussed using the method of moments and the method of maximum likelihood. The goodness of fit of SBDLD has been compared with that of SBPD and SBPLD on four observed real datasets relating to freely-forming small group size at public places, and the fit given by SBDLD is quite satisfactory. Therefore, SBDLD can be considered an important distribution for modeling count data relating to freely-forming small group size at public places.