Research Article Volume 3 Issue 4
On Poisson-Sujatha distribution and its applications to model count data from biological sciences
Rama Shanker,
Hagos Fesshaye
Department of Statistics, Eritrea Institute of Technology, Eritrea
Correspondence: Rama Shanker, Department of Statistics, Eritrea Institute of Technology, Asmara, Eritrea
Received: January 29, 2016 | Published: March 10, 2016
Citation: Shanker R, Fesshaye H. On poisson-sujatha distribution and its applications to model count data from biological sciences. Biom Biostat Int J. 2016;3(4):100-106. DOI: 10.15406/bbij.2016.03.00069
Abstract
This paper suggests a simple method for finding the moments of the Poisson-Sujatha distribution (PSD) introduced by Shanker [1], and uses it to obtain the first four moments about the origin and the variance. The PSD is fitted to the same ecology and genetics data-sets to which Shanker & Hagos [2] earlier fitted the Poisson-Lindley distribution (PLD), introduced by Sankaran [3], and the Poisson distribution (PD). The goodness of fit of the PSD is satisfactory for the majority of these data-sets.
Keywords: Sujatha distribution, Poisson-Sujatha distribution, Lindley distribution, Poisson-Lindley distribution, moments, compounding, estimation of parameter, goodness of fit
Introduction
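The Poisson-Sujatha distribution (PSD), having probability mass function
$$P(X=x)=\frac{\theta^{3}}{\theta^{2}+\theta+2}\cdot\frac{x^{2}+(\theta+4)x+(\theta^{2}+3\theta+4)}{(\theta+1)^{x+3}};\quad x=0,1,2,\ldots,\ \theta>0 \tag{1.1}$$
has been introduced by Shanker [1] for modeling count data-sets. The PSD arises from the Poisson distribution when its parameter $\lambda$ follows the Sujatha distribution introduced by Shanker [4], which has probability density function
$$f(\lambda;\theta)=\frac{\theta^{3}}{\theta^{2}+\theta+2}\left(1+\lambda+\lambda^{2}\right)e^{-\theta\lambda};\quad \lambda>0,\ \theta>0 \tag{1.2}$$
We have
$$P(X=x)=\int_{0}^{\infty}\frac{e^{-\lambda}\lambda^{x}}{x!}\cdot\frac{\theta^{3}}{\theta^{2}+\theta+2}\left(1+\lambda+\lambda^{2}\right)e^{-\theta\lambda}\,d\lambda \tag{1.3}$$
$$=\frac{\theta^{3}}{\theta^{2}+\theta+2}\cdot\frac{x^{2}+(\theta+4)x+(\theta^{2}+3\theta+4)}{(\theta+1)^{x+3}};\quad x=0,1,2,\ldots,\ \theta>0 \tag{1.4}$$
which is the Poisson-Sujatha distribution (PSD).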
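The compounding step can be checked numerically. The sketch below assumes the PSD pmf has the closed form θ³/(θ²+θ+2) · [x² + (θ+4)x + (θ²+3θ+4)] / (θ+1)^(x+3) and that the Sujatha density is θ³/(θ²+θ+2) · (1+λ+λ²)·e^(−θλ). The function names and the trapezoid-rule settings are illustrative choices, not part of the original paper.

```python
import math


def psd_pmf(x: int, theta: float) -> float:
    """PSD pmf in the closed form assumed in the lead-in."""
    norm = theta**3 / (theta**2 + theta + 2)
    poly = x**2 + (theta + 4) * x + (theta**2 + 3 * theta + 4)
    # (theta + 1) ** -(x + 3), written with exp/log so large x underflows to 0.
    return norm * poly * math.exp(-(x + 3) * math.log1p(theta))


def sujatha_pdf(lam: float, theta: float) -> float:
    """Sujatha density (assumed form) evaluated at lam > 0."""
    norm = theta**3 / (theta**2 + theta + 2)
    return norm * (1 + lam + lam**2) * math.exp(-theta * lam)


def psd_pmf_by_mixing(x: int, theta: float, upper: float = 40.0, steps: int = 40000) -> float:
    """Trapezoid-rule approximation of the Poisson-Sujatha mixing integral."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        lam = i * h
        poisson = math.exp(-lam) * lam**x / math.factorial(x)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * poisson * sujatha_pdf(lam, theta)
    return total * h


if __name__ == "__main__":
    theta = 1.5
    for x in range(5):
        print(x, psd_pmf(x, theta), psd_pmf_by_mixing(x, theta))
    print("sum of pmf:", sum(psd_pmf(x, theta) for x in range(300)))
```

The two columns should agree to several decimal places, and the pmf should sum to one up to rounding.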
Shanker [4] has shown that the Sujatha distribution (1.2) is a three-component mixture of an exponential (θ) distribution, a gamma (2,θ) distribution, and a gamma (3,θ) distribution with mixing proportions $\frac{\theta^{2}}{\theta^{2}+\theta+2}$, $\frac{\theta}{\theta^{2}+\theta+2}$ and $\frac{2}{\theta^{2}+\theta+2}$ respectively. Shanker [4] has discussed its various mathematical and statistical properties, including its shape, moment generating function, moments, skewness, kurtosis, hazard rate function, mean residual life function, stochastic orderings, mean deviations, distribution of order statistics, Bonferroni and Lorenz curves, Renyi entropy measure, and stress-strength reliability, amongst others, along with the estimation of the parameter and applications for modeling lifetime data.
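The mixture representation suggests a simple way to simulate PSD counts: draw λ from the three-component gamma mixture, then draw a Poisson count with that mean. The sketch below assumes the mixing proportions θ²/(θ²+θ+2), θ/(θ²+θ+2) and 2/(θ²+θ+2); the helper names and the use of Knuth's multiplication method for the Poisson draw are illustrative choices, not taken from the paper.

```python
import math
import random


def sample_sujatha(theta: float, rng: random.Random) -> float:
    """Draw lambda from the assumed exponential / gamma(2) / gamma(3) mixture."""
    d = theta**2 + theta + 2
    u = rng.random()
    if u < theta**2 / d:
        shape = 1  # exponential(theta) component
    elif u < (theta**2 + theta) / d:
        shape = 2  # gamma(2, theta) component
    else:
        shape = 3  # gamma(3, theta) component
    # random.gammavariate takes (shape, scale); scale = 1 / rate.
    return rng.gammavariate(shape, 1.0 / theta)


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_psd(theta: float, rng: random.Random) -> int:
    """One PSD draw: Poisson count whose mean follows the Sujatha mixture."""
    return sample_poisson(sample_sujatha(theta, rng), rng)


if __name__ == "__main__":
    rng = random.Random(2016)
    draws = [sample_psd(2.0, rng) for _ in range(20000)]
    print("sample mean:", sum(draws) / len(draws))
```

For θ = 2 the sample mean should be close to (θ²+2θ+6)/(θ(θ²+θ+2)) = 0.875, up to simulation noise.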
Shanker [1] has studied in detail various mathematical and statistical properties of the PSD, including its moment generating function, coefficient of variation, skewness, kurtosis, over-dispersion, hazard rate and unimodality, along with the estimation of the parameter and applications. Shanker & Hagos [5,6] have obtained the size-biased Poisson-Sujatha distribution (SBPSD) and the zero-truncated Poisson-Sujatha distribution (ZTPSD) and discussed their statistical properties, estimation of the parameter and applications. Further, Shanker & Hagos [7] have studied in detail the zero-truncation of the Poisson, Poisson-Lindley and Poisson-Sujatha distributions and their applications.
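The probability mass function of the Poisson-Lindley distribution (PLD), given by
$$P(X=x)=\frac{\theta^{2}(x+\theta+2)}{(\theta+1)^{x+3}};\quad x=0,1,2,\ldots,\ \theta>0 \tag{1.5}$$
has been introduced by Sankaran [3] to model count data. The distribution arises from the Poisson distribution when its parameter $\lambda$ follows the Lindley [8] distribution with probability density function
$$f(\lambda;\theta)=\frac{\theta^{2}}{\theta+1}\left(1+\lambda\right)e^{-\theta\lambda};\quad \lambda>0,\ \theta>0 \tag{1.6}$$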
In this paper a simple method for finding the moments of the Poisson-Sujatha distribution (PSD) introduced by Shanker [1] is suggested, and the first four moments about the origin and the variance are presented. It seems that not much work has been done on the applications of the PSD so far. The PSD is fitted to the same ecology and genetics data-sets to which Shanker & Hagos [2] fitted the Poisson-Lindley distribution (PLD), introduced by Sankaran [3], and the Poisson distribution (PD); the goodness of fit of the PSD is satisfactory for the majority of these data-sets.
Moments of Poisson-Sujatha distribution
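Using (1.3), the $r$th moment about origin of the PSD (1.1) can be obtained as
$$\mu_{r}'=E\left[E\left(X^{r}\mid\lambda\right)\right]=\int_{0}^{\infty}\left[\sum_{x=0}^{\infty}x^{r}\frac{e^{-\lambda}\lambda^{x}}{x!}\right]\frac{\theta^{3}}{\theta^{2}+\theta+2}\left(1+\lambda+\lambda^{2}\right)e^{-\theta\lambda}\,d\lambda \tag{2.1}$$
Clearly the expression under the bracket in (2.1) is the $r$th moment about origin of the Poisson distribution. Taking $r=1$ in (2.1) and using the first moment about origin of the Poisson distribution, the first moment about origin of the PSD (1.1) can be obtained as
$$\mu_{1}'=\frac{\theta^{3}}{\theta^{2}+\theta+2}\int_{0}^{\infty}\lambda\left(1+\lambda+\lambda^{2}\right)e^{-\theta\lambda}\,d\lambda=\frac{\theta^{2}+2\theta+6}{\theta\left(\theta^{2}+\theta+2\right)} \tag{2.2}$$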
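Again taking $r=2$ in (2.1) and using the second moment about origin of the Poisson distribution, the second moment about origin of the PSD (1.1) is obtained as
$$\mu_{2}'=\frac{\theta^{3}}{\theta^{2}+\theta+2}\int_{0}^{\infty}\left(\lambda^{2}+\lambda\right)\left(1+\lambda+\lambda^{2}\right)e^{-\theta\lambda}\,d\lambda=\frac{\theta^{3}+4\theta^{2}+12\theta+24}{\theta^{2}\left(\theta^{2}+\theta+2\right)} \tag{2.3}$$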
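Similarly, taking $r=3$ and $r=4$ in (2.1) and using the third and the fourth moments about origin of the Poisson distribution, $\lambda^{3}+3\lambda^{2}+\lambda$ and $\lambda^{4}+6\lambda^{3}+7\lambda^{2}+\lambda$, the third and the fourth moments about origin of the PSD (1.1) are obtained as
$$\mu_{3}'=\frac{\theta^{4}+8\theta^{3}+30\theta^{2}+96\theta+120}{\theta^{3}\left(\theta^{2}+\theta+2\right)} \tag{2.4}$$
$$\mu_{4}'=\frac{\theta^{5}+16\theta^{4}+84\theta^{3}+336\theta^{2}+840\theta+720}{\theta^{4}\left(\theta^{2}+\theta+2\right)} \tag{2.5}$$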
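Thus the variance of the PSD (1.1) can be obtained as
$$\mu_{2}=\mu_{2}'-\left(\mu_{1}'\right)^{2}=\frac{\theta^{5}+4\theta^{4}+14\theta^{3}+28\theta^{2}+24\theta+12}{\theta^{2}\left(\theta^{2}+\theta+2\right)^{2}} \tag{2.6}$$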
Shanker [1] has shown that the PSD is always over-dispersed, has an increasing hazard rate, and is unimodal. Further, Shanker [1] has also shown that the coefficient of variation, skewness, and kurtosis of the PSD increase with increasing values of the parameter.
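The closed-form moments can be cross-checked by summing x^r · P(X = x) directly. The sketch below assumes the PSD pmf θ³/(θ²+θ+2) · [x² + (θ+4)x + (θ²+3θ+4)] / (θ+1)^(x+3) and the moment expressions derived from it; the function names and the truncation point of the sum are illustrative assumptions.

```python
import math


def psd_pmf(x: int, theta: float) -> float:
    """PSD pmf (assumed closed form)."""
    norm = theta**3 / (theta**2 + theta + 2)
    poly = x**2 + (theta + 4) * x + (theta**2 + 3 * theta + 4)
    return norm * poly * math.exp(-(x + 3) * math.log1p(theta))


def raw_moment(r: int, theta: float, terms: int = 400) -> float:
    """r-th moment about origin by direct (truncated) summation."""
    return sum(x**r * psd_pmf(x, theta) for x in range(terms))


def closed_form_moments(theta: float):
    """First four raw moments and the variance in closed form."""
    d = theta**2 + theta + 2
    m1 = (theta**2 + 2 * theta + 6) / (theta * d)
    m2 = (theta**3 + 4 * theta**2 + 12 * theta + 24) / (theta**2 * d)
    m3 = (theta**4 + 8 * theta**3 + 30 * theta**2 + 96 * theta + 120) / (theta**3 * d)
    m4 = (theta**5 + 16 * theta**4 + 84 * theta**3 + 336 * theta**2
          + 840 * theta + 720) / (theta**4 * d)
    var = (theta**5 + 4 * theta**4 + 14 * theta**3 + 28 * theta**2
           + 24 * theta + 12) / (theta**2 * d**2)
    return m1, m2, m3, m4, var


if __name__ == "__main__":
    for theta in (0.5, 1.0, 2.5):
        m1, m2, m3, m4, var = closed_form_moments(theta)
        print(theta, m1, raw_moment(1, theta), var, "over-dispersed:", var > m1)
```

For every θ tried, the variance exceeds the mean, which is consistent with the over-dispersion property cited above.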
Estimation of the parameter
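Maximum likelihood estimate (MLE) of the parameter: Let $x_{1},x_{2},\ldots,x_{n}$ be a random sample of size $n$ from the PSD (1.1) and let $f_{x}$ be the observed frequency in the sample corresponding to $X=x$ $(x=0,1,2,\ldots,k)$ such that $\sum_{x=0}^{k}f_{x}=n$, where $k$ is the largest observed value having non-zero frequency. The likelihood function $L$ of the PSD (1.1) is given by
$$L=\left(\frac{\theta^{3}}{\theta^{2}+\theta+2}\right)^{n}\frac{1}{(\theta+1)^{\sum_{x=0}^{k}f_{x}(x+3)}}\prod_{x=0}^{k}\left[x^{2}+(\theta+4)x+(\theta^{2}+3\theta+4)\right]^{f_{x}}$$
The log likelihood function is thus obtained as
$$\log L=n\log\left(\frac{\theta^{3}}{\theta^{2}+\theta+2}\right)-\sum_{x=0}^{k}f_{x}(x+3)\log(\theta+1)+\sum_{x=0}^{k}f_{x}\log\left[x^{2}+(\theta+4)x+(\theta^{2}+3\theta+4)\right]$$
The first derivative of the log likelihood function is given by
$$\frac{d\log L}{d\theta}=\frac{3n}{\theta}-\frac{n(2\theta+1)}{\theta^{2}+\theta+2}-\frac{n(\bar{x}+3)}{\theta+1}+\sum_{x=0}^{k}\frac{(x+2\theta+3)f_{x}}{x^{2}+(\theta+4)x+(\theta^{2}+3\theta+4)}$$
where $\bar{x}$ is the sample mean.
The maximum likelihood estimate (MLE) $\hat{\theta}$ of $\theta$ of the PSD (1.1) is the solution of the equation $\frac{d\log L}{d\theta}=0$ and is given by the solution of the following non-linear equation
$$\frac{3n}{\theta}-\frac{n(2\theta+1)}{\theta^{2}+\theta+2}-\frac{n(\bar{x}+3)}{\theta+1}+\sum_{x=0}^{k}\frac{(x+2\theta+3)f_{x}}{x^{2}+(\theta+4)x+(\theta^{2}+3\theta+4)}=0$$
This non-linear equation can be solved by any numerical iteration method, such as the Newton-Raphson, bisection, or Regula-Falsi method.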
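As an illustration of the numerical step, the sketch below solves the likelihood equation by bisection. It assumes the score function 3n/θ − n(2θ+1)/(θ²+θ+2) − n(x̄+3)/(θ+1) + Σ f_x (x+2θ+3)/[x² + (θ+4)x + (θ²+3θ+4)], which follows from the PSD pmf; the function names, the bracketing interval and the tolerance are hypothetical choices. The frequencies are the yeast cell counts of Table 1.

```python
def psd_score(theta: float, freqs: list) -> float:
    """Derivative of the grouped-data PSD log-likelihood; freqs[x] = frequency of count x."""
    n = sum(freqs)
    mean = sum(x * f for x, f in enumerate(freqs)) / n
    d = theta**2 + theta + 2
    head = 3 * n / theta - n * (2 * theta + 1) / d - n * (mean + 3) / (theta + 1)
    tail = sum(
        f * (x + 2 * theta + 3)
        / (x**2 + (theta + 4) * x + theta**2 + 3 * theta + 4)
        for x, f in enumerate(freqs)
    )
    return head + tail


def psd_mle(freqs: list, lo: float = 0.01, hi: float = 50.0, tol: float = 1e-10) -> float:
    """Bisection on the score; assumes it is positive at lo and negative at hi."""
    if psd_score(lo, freqs) <= 0 or psd_score(hi, freqs) >= 0:
        raise ValueError("score does not change sign on the bracketing interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psd_score(mid, freqs) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    yeast = [128, 37, 18, 3, 1]  # observed frequencies for counts 0..4 (Table 1)
    print("MLE of theta:", psd_mle(yeast))
```

Bisection is used here only because it needs no derivative of the score; Newton-Raphson would typically converge in fewer iterations.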
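Method of moment estimate (MOME) of the parameter: Let $x_{1},x_{2},\ldots,x_{n}$ be a random sample of size $n$ from the PSD (1.1). Equating the population mean to the corresponding sample mean, the MOME $\tilde{\theta}$ of $\theta$ of the PSD (1.1) is the solution of the following cubic equation
$$\bar{x}\,\theta^{3}+(\bar{x}-1)\,\theta^{2}+2(\bar{x}-1)\,\theta-6=0$$
where $\bar{x}$ is the sample mean.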
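A minimal sketch of the moment estimate, assuming the PSD mean is (θ²+2θ+6)/(θ(θ²+θ+2)), so that equating it to the sample mean x̄ gives the cubic x̄θ³ + (x̄−1)θ² + 2(x̄−1)θ − 6 = 0. By Descartes' rule of signs this cubic has exactly one positive root for x̄ > 0, so bisection on a wide positive interval is enough. The function names and the interval are illustrative.

```python
def psd_mean(theta: float) -> float:
    """Mean of the PSD (assumed closed form)."""
    return (theta**2 + 2 * theta + 6) / (theta * (theta**2 + theta + 2))


def psd_mome(xbar: float, lo: float = 1e-6, hi: float = 1e6) -> float:
    """Positive root of the moment cubic, found by bisection."""

    def cubic(t: float) -> float:
        return xbar * t**3 + (xbar - 1) * t**2 + 2 * (xbar - 1) * t - 6

    if cubic(lo) >= 0 or cubic(hi) <= 0:
        raise ValueError("root is not bracketed; check that xbar is positive")
    for _ in range(200):  # fixed iteration count, far more than double precision needs
        mid = 0.5 * (lo + hi)
        if cubic(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    xbar = 86 / 187  # sample mean of the yeast cell counts in Table 1
    theta_tilde = psd_mome(xbar)
    print("MOME of theta:", theta_tilde, "implied mean:", psd_mean(theta_tilde))
```

The moment estimate is usually close to, but not identical with, the maximum likelihood estimate, and can serve as a starting value for an iterative MLE solver.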
Applications of Poisson-Sujatha distribution
The Poisson distribution is a suitable statistical model for situations where events seem to occur at random, such as the number of customers arriving at a service point, the number of telephone calls arriving at an exchange, the number of fatal traffic accidents per week in a given state, the number of radioactive particle emissions per unit of time, the number of meteorites that collide with a test satellite during a single orbit, the number of organisms per unit volume of some fluid, the number of defects per unit of some materials, and the number of flaws per unit length of some wire, amongst others. The Poisson distribution requires independence of events and equality of mean and variance, conditions that are rarely satisfied completely in biological and medical sciences because the occurrences of successive events are dependent. The negative binomial distribution (NBD) is a possible alternative to the Poisson distribution when successive events are possibly dependent (Johnson et al. [9]), but fitting the NBD to count data requires the mean to be less than the variance. In biological and medical sciences these conditions are also not fully satisfied. Generally, count data in biological and medical sciences are either over-dispersed or under-dispersed. The main reason for selecting the PLD and PSD to fit biological science data is that these two distributions are always over-dispersed and the PSD has some flexibility over the PLD.
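Whether a data-set departs from the Poisson assumption of equal mean and variance can be checked with the index of dispersion (variance divided by mean). The sketch below computes it from a frequency table, using the yeast cell counts reported in Table 1; the function name is an illustrative choice.

```python
def mean_and_variance(freqs: list):
    """Sample mean and unbiased sample variance from freqs[x] = frequency of count x."""
    n = sum(freqs)
    mean = sum(x * f for x, f in enumerate(freqs)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in enumerate(freqs)) / (n - 1)
    return mean, variance


if __name__ == "__main__":
    yeast = [128, 37, 18, 3, 1]  # observed frequencies for counts 0..4 (Table 1)
    mean, variance = mean_and_variance(yeast)
    print("mean:", mean, "variance:", variance, "index of dispersion:", variance / mean)
```

An index above one indicates over-dispersion, which is the situation the PLD and PSD are designed for.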
Applications in ecology
Ecology is the branch of biology dealing with the relations and interactions between organisms and their environment, including other organisms. Organisms and their environment are complex, dynamic, interdependent, mutually reactive and interrelated, and ecology deals with the principles that govern these relationships. Fisher et al. [10] were the first to discuss applications of the logarithmic series distribution (LSD) to model count data in ecology. Later, Kempton [11] fitted a generalized form of Fisher's LSD to insect data and concluded that it gives a superior fit compared with the ordinary LSD, and that it better explains data having an exceptionally long tail. Tripathi & Gupta [12] proposed another generalization of the LSD that is flexible enough to describe short-tailed as well as long-tailed data, fitted it to insect data, and found that it gives a better fit than the ordinary LSD. Mishra & Shanker [13] have discussed applications of generalized logarithmic series distributions (GLSD) to model data in ecology. Shanker & Hagos [2] fitted the PLD to data relating to ecology and observed that it gives a satisfactory fit.
In this section the Poisson distribution (PD), Poisson-Lindley distribution (PLD) and Poisson-Sujatha distribution (PSD) are fitted to several count data-sets from biological sciences using maximum likelihood estimates. The data are haemocytometer yeast cell counts per square, European red mites on apple leaves, and European corn borers per plant (Tables 1-3).
Tables 1-3 show that both the PSD and the PLD give a much closer fit than the Poisson distribution. Further, in some data-sets the PSD gives a closer fit than the PLD, while in others the PLD gives a closer fit than the PSD; thus both can be considered important tools for modeling data in ecology.
| Number of cells per square | Observed frequency | Expected frequency: PD | Expected frequency: PLD | Expected frequency: PSD |
|---|---|---|---|---|
| 0 | 128 | 118.1 | 127.4 | 127.5 |
| 1 | 37 | 54.3 | 41.1 | 40.9 |
| 2 | 18 | | | |
| 3 | 3 | | | |
| 4 | 1 | | | |
| 5+ | 0 | | | |
| Total | 187 | 187 | 187 | 187 |
| Estimate of parameter | | $\hat{\theta}$ = 0.459893 | $\hat{\theta}$ = 2.751579 | $\hat{\theta}$ = 3.186657 |
| $\chi^{2}$ | | 9.9 | 1.43 | 0.99 |
| d.f. | | 1 | 1 | 1 |
| p-value | | 0.0016 | 0.2317 | 0.3197 |

Table 1 Observed and expected number of haemocytometer yeast cell counts per square observed by ‘Student’ [17]
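The PSD column of Table 1 can be reproduced approximately as follows. This is a sketch under stated assumptions: the PSD pmf θ³/(θ²+θ+2) · [x² + (θ+4)x + (θ²+3θ+4)] / (θ+1)^(x+3), the reported estimate θ̂ = 3.186657, and pooling of all classes from 2 upwards (inferred from the reported d.f. = 1, not stated explicitly in the text). The function names are illustrative.

```python
import math


def psd_pmf(x: int, theta: float) -> float:
    """PSD pmf (assumed closed form)."""
    norm = theta**3 / (theta**2 + theta + 2)
    poly = x**2 + (theta + 4) * x + (theta**2 + 3 * theta + 4)
    return norm * poly * math.exp(-(x + 3) * math.log1p(theta))


def expected_frequencies(theta: float, n: int, last_class: int) -> list:
    """Expected counts for 0..last_class-1 plus a pooled 'last_class or more' class."""
    probs = [psd_pmf(x, theta) for x in range(last_class)]
    probs.append(1.0 - sum(probs))  # pooled upper tail
    return [n * p for p in probs]


def chi_square(observed: list, expected: list) -> float:
    """Pearson chi-square statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed classes 0, 1 and 2+ (18 + 3 + 1 + 0 pooled); pooling is an assumption.
observed = [128, 37, 22]
expected = expected_frequencies(3.186657, 187, last_class=2)
statistic = chi_square(observed, expected)
# Upper-tail probability of a chi-square variate with 1 degree of freedom.
p_value = math.erfc(math.sqrt(statistic / 2))

if __name__ == "__main__":
    print("expected:", [round(e, 1) for e in expected])
    print("chi-square:", round(statistic, 2), "p-value:", round(p_value, 4))
```

Small differences from the tabulated chi-square and p-value are to be expected, for example if the published figures were computed from rounded expected frequencies.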
| Number of mites per leaf | Observed frequency | Expected frequency: PD | Expected frequency: PLD | Expected frequency: PSD |
|---|---|---|---|---|
| 0 | 38 | 25.3 | 35.8 | 35.3 |
| 1 | 17 | 29.1 | 20.7 | 20.9 |
| 2 | 10 | 16.7 | 11.4 | 11.6 |
| 3 | 9 | | 6 | 6.1 |
| 4 | 3 | | | |
| 5 | 2 | | | |
| 6 | 1 | | | |
| 7+ | 0 | | | |
| Total | 80 | 80 | 80 | 80 |
| Estimate of parameter | | $\hat{\theta}$ = 1.15 | $\hat{\theta}$ = 1.255891 | $\hat{\theta}$ = 1.64683 |
| $\chi^{2}$ | | 18.27 | 2.47 | 2.52 |
| d.f. | | 2 | 3 | 3 |
| p-value | | 0.0001 | 0.4807 | 0.4719 |

Table 2 Observed and expected number of red mites on apple leaves
| Number of borers per plant | Observed frequency | Expected frequency: PD | Expected frequency: PLD | Expected frequency: PSD |
|---|---|---|---|---|
| 0 | 188 | 169.4 | 194.0 | 193.6 |
| 1 | 83 | 109.8 | 79.5 | 79.6 |
| 2 | 36 | 35.6 | 31.3 | 31.6 |
| 3 | 14 | | | |
| 4 | 2 | | | |
| 5 | 1 | | | |
| Total | 324 | 324.0 | 324.0 | 324.0 |
| Estimate of parameter | | | | |
| $\chi^{2}$ | | 15.19 | 1.29 | 1.16 |
| d.f. | | 2 | 2 | 2 |
| p-value | | 0.0005 | 0.5247 | 0.5599 |

Table 3 Observed and expected number of European corn borers of Mc Guire et al. [18]
Judged by the chi-square values in Tables 1-3, the PD gives a poor fit to all three data-sets. The PSD gives the best fit in Tables 1 and 3, while the PLD gives a marginally better fit than the PSD in Table 2.
Application in genetics
Genetics is the branch of biological science that deals with heredity and variation. Heredity covers those traits or characteristics that are transmitted from generation to generation and are therefore fixed for a particular individual. Variation, on the other hand, is mainly of two types, hereditary and environmental. Hereditary variation refers to differences in inherited traits, whereas environmental variations are those mainly due to the environment. The segregation of chromosomes has been studied using statistical tools, mainly chi-square ($\chi^{2}$). In the analysis of data on chemically induced chromosome aberrations in cultures of human leukocytes, Loeschke & Kohler [14] suggested the negative binomial distribution, while Janardan & Schaeffer [15] suggested a modified Poisson distribution. Mishra & Shanker [13] have discussed applications of generalized logarithmic series distributions (GLSD) to model data in mortality, ecology and genetics. Shanker & Hagos [2] have studied in detail the applications of the PLD to model data from genetics. Much quantitative work seems to have been done in genetics, but so far no work has been done on fitting the PSD to data relating to genetics. In this section the PSD, PLD and PD are fitted to data relating to genetics using maximum likelihood estimates, including the data of Catcheside et al. [16], in Tables 4-7.
The fits show that both the PSD and the PLD give a much more satisfactory fit than the PD. In some data-sets the PSD gives a closer fit than the PLD, whereas in others the PLD gives a closer fit than the PSD. Thus both the PSD and the PLD can be considered important tools for modeling data in genetics.
| Number of aberrations | Observed frequency | Expected frequency: PD | Expected frequency: PLD | Expected frequency: PSD |
|---|---|---|---|---|
| 0 | 268 | 231.3 | 257 | 257.6 |
| 1 | 87 | 126.7 | 93.4 | 93 |
| 2 | 26 | 34.7 | 32.8 | 32.7 |
| 3 | 9 | | 11.2 | 11.2 |
| 4 | 4 | | | |
| 5 | 2 | | | |
| 6 | 1 | | | |
| 7+ | 3 | | | |
| Total | 400 | 400 | 400 | 400 |
| Estimate of parameter | | $\hat{\theta}$ = 0.5475 | $\hat{\theta}$ = 2.380442 | $\hat{\theta}$ = 2.829241 |
| $\chi^{2}$ | | 38.21 | 6.21 | 6.28 |
| d.f. | | 2 | 3 | 3 |
| p-value | | 0 | 0.1018 | 0.0987 |

Table 4 Distribution of number of chromatid aberrations (0.2 g chinon 1, 24 hours)
| Class/Exposure | Observed frequency | Expected frequency: PD | Expected frequency: PLD | Expected frequency: PSD |
|---|---|---|---|---|
| 0 | 413 | 374 | 405.7 | 406.1 |
| 1 | 124 | 177.4 | 133.6 | 132.9 |
| 2 | 42 | 42.1 | 42.6 | 42.7 |
| 3 | 15 | | 13.3 | 13.4 |
| 4 | 5 | | | |
| 5 | 0 | | | |
| 6 | 2 | | | |
| Total | 601 | 601 | 601 | 601 |
| Estimate of parameter | | $\hat{\theta}$ = 0.47421 | $\hat{\theta}$ = 2.685373 | $\hat{\theta}$ = 3.125788 |
| $\chi^{2}$ | | 48.17 | 1.34 | 1.1 |
| d.f. | | 2 | 3 | 3 |
| p-value | | 0 | 0.7196 | 0.7771 |

Table 5 Mammalian cytogenetic dosimetry lesions in rabbit lymphoblast induced by streptonigrin (NSC-45383), Exposure -60
| Class/Exposure | Observed frequency | Expected frequency: PD | Expected frequency: PLD | Expected frequency: PSD |
|---|---|---|---|---|
| 0 | 200 | 172.5 | 191.8 | 192 |
| 1 | 57 | 95.4 | 70.3 | 70.1 |
| 2 | 30 | 26.4 | 24.9 | 24.9 |
| 3 | 7 | | | |
| 4 | 4 | | | |
| 5 | 0 | | | |
| 6 | 2 | | | |
| Total | 300 | 300 | 300 | 300 |
| Estimate of parameter | | $\hat{\theta}$ = 0.55333 | $\hat{\theta}$ = 2.353339 | $\hat{\theta}$ = 2.795745 |
| $\chi^{2}$ | | 29.68 | 3.91 | 3.81 |
| d.f. | | 2 | 2 | 2 |
| p-value | | 0 | 0.1415 | 0.1488 |

Table 6 Mammalian cytogenetic dosimetry lesions in rabbit lymphoblast induced by streptonigrin (NSC-45383), Exposure -70
| Class/Exposure | Observed frequency | Expected frequency: PD | Expected frequency: PLD | Expected frequency: PSD |
|---|---|---|---|---|
| 0 | 155 | 127.8 | 158.3 | 157.5 |
| 1 | 83 | 109 | 77.2 | 77.5 |
| 2 | 33 | 46.5 | 35.9 | 36.4 |
| 3 | 14 | | 16.1 | 16.4 |
| 4 | 11 | | | |
| 5 | 3 | | | |
| 6 | 1 | | | |
| Total | 300 | 300 | 300 | 300 |
| Estimate of parameter | | $\hat{\theta}$ = 0.853333 | $\hat{\theta}$ = 1.617611 | $\hat{\theta}$ = 2.034077 |
| $\chi^{2}$ | | 24.97 | 1.51 | 1.74 |
| d.f. | | 2 | 3 | 3 |
| p-value | | 0 | 0.6799 | 0.6281 |

Table 7 Mammalian cytogenetic dosimetry lesions in rabbit lymphoblast induced by streptonigrin (NSC-45383), Exposure -90
Acknowledgments
Conflicts of interest
The authors declare that there are no conflicts of interest.
References
1. Shanker R. The discrete Poisson–Sujatha distribution. International Journal of Probability and Statistics. 2016;5(1).
2. Shanker R, Hagos F. On Poisson–Lindley distribution and its applications to biological sciences. Biometrics and Biostatistics International Journal. 2015;2(4):1–5.
3. Sankaran M. The discrete Poisson–Lindley distribution. Biometrics. 1970;26(1):145–149.
4. Shanker R. Sujatha distribution and its applications. Statistics in Transition new Series. 2015.
5. Shanker R, Hagos F. Size–biased Poisson–Sujatha distribution with applications. Communicated. 2016.
6. Shanker R, Hagos F. Zero–truncated Poisson–Sujatha distribution with applications. Communicated. 2016.
7. Shanker R, Hagos F. On zero–truncation of Poisson, Poisson–Lindley, and Poisson–Sujatha distribution and their applications. Communicated. 2016.
8. Lindley DV. Fiducial distributions and Bayes' theorem. Journal of the Royal Statistical Society, Series B. 1958;20(1):102–107.
9. Johnson NL, Kotz S, Kemp AW. Univariate Discrete Distributions. 2nd ed. John Wiley & Sons Inc, USA. 1992.
10. Fisher RA, Corbet AS, Williams CB. The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology. 1943;12(1):42–58.
11. Kempton RA. A generalized form of Fisher’s logarithmic series. Biometrika. 1975;62(1):29–38.
12. Tripathi RC, Gupta RC. A generalization of the log–series distribution. Communications in Statistics: Theory and Methods. 1985;14(8):1779–1799.
13. Mishra A, Shanker R. Generalized logarithmic series distribution–Its nature and applications. Proceedings of the Vth International Symposium on Optimization and Statistics. 2002;28–30:155–168.
14. Loeschke V, Kohler W. Deterministic and stochastic models of the negative binomial distribution and the analysis of chromosomal aberrations in human leukocytes. Biometrische Zeitschrift. 1976;18(6):427–451.
15. Janardan KG, Schaeffer DJ. Models for the analysis of chromosomal aberrations in human leukocytes. Biometrical Journal. 1977;19(8):599–612.
16. Catcheside DG, Lea DE, Thoday JM. Types of chromosome structural change induced by the irradiation of Tradescantia microspores. J Genet. 1946;47:113–136.
17. Sankaran M. The discrete Poisson–Lindley distribution. Biometrics. 1970;26(1):145–149.
18. Mc Guire JU, Brindley TA, Bancroft TA. The distribution of European corn–borer larvae Pyrausta in field corn. Biometrics. 1957;13(1):65–78.
©2016 Shanker, et al. This is an open access article distributed under terms which permit unrestricted use, distribution, and building upon the work non-commercially.