Research Article Volume 4 Issue 7
Inference for zero inflated truncated power series family of distributions
MK Patil
Padmabhushan Vasantraodada Patil Mahavidyalaya, India
Correspondence: MK Patil, Padmabhushan Vasantraodada Patil Mahavidyalaya, Kavathe Mahankal, Dist. Sangli, India
Received: August 14, 2016 | Published: December 6, 2016
Citation: Patil MK. Inference for zero inflated truncated power series family of distributions. Biom Biostat Int J. 2016;4(7):119-122. DOI: 10.15406/bbij.2016.04.00115
Abstract
Zero-inflated data are data sets that contain an excessive number of zeros. The term zero-inflation emphasizes that the probability mass at the point zero exceeds the one allowed under a standard parametric family of discrete distributions. Gupta et al. [1], Murat & Szynal [2] and Patil & Shirke [3] have contributed to estimation and testing of the parameters involved in Zero Inflated Power Series Distributions. If the data set under study does not contain observations beyond some known point in the support, we have to modify the Zero Inflated Power Series Distribution (ZIPSD) accordingly in order to get better inferential properties. The Zero Inflated Truncated Power Series Distribution (ZITPSD) is one of the better options. In the present work we address the problem of estimation for the ZITPSD, with more emphasis on statistical tests. We provide three asymptotic tests for testing the parameter of the ZITPSD, using an unconditional (standard) likelihood approach, a conditional likelihood approach and the sample mean, respectively. The performance of the first two tests has been studied for the Zero Inflated Truncated Poisson Distribution (ZITPD). Asymptotic confidence intervals for the parameter are also provided. The model has been applied to a real-life data set.
Keywords: zero inflation, zero inflated power series distribution, zero inflated truncated power series distribution, zero inflated truncated poisson distribution
Introduction
In certain applications involving discrete data, we come across data in which the frequency of the observation 'zero' is significantly higher than the one predicted by the assumed model. The problem of a high proportion of zeros has long been of interest in data analysis and modeling. Such situations arise in medicine, engineering applications, manufacturing, economics, public health, road safety, epidemiology and other areas. In a highly automated production process, the number of defects is often assumed to be Poisson. However, many samples contain no defectives at all, which leads to an excess number of zeros. Models that accommodate significantly more zeros than the assumed distribution allows are known as zero-inflated models.
In the literature, a number of researchers have worked on the family of zero-inflated power series distributions. Gupta et al. [1] studied the structural properties and point estimation of the parameters of Zero-Inflated Modified Power Series distributions, and in particular of the zero-inflated Poisson distribution. Murat & Szynal [2] studied the class of inflated modified power series distributions where inflation occurs at any of the support points. They obtained moments, factorial moments, central moments, the maximum likelihood estimators and the variance-covariance matrix of the estimators, thereby extending the results of Gupta et al. [1] to discrete distributions inflated at any point.
The Zero Inflated Truncated Power Series Distribution contains two parameters. The first parameter ($\pi$) indicates the inflation of zero and the other parameter ($\theta$) is that of the power series distribution. A literature survey reveals that many researchers have concentrated on the inflation parameter of the model. In the present study, we focus on the inferential aspect of the basic parameter of the model. In this article, we provide maximum likelihood estimators, Fisher information and asymptotic tests for testing the parameter of the Zero Inflated Truncated Power Series Distribution. Additionally, an asymptotic confidence interval for the parameter is provided.
In section 2.1 we report estimation of both the parameters of the ZITPSD and the corresponding asymptotic variances using the full likelihood approach, the conditional likelihood approach and the method of moments. In section 2.2 we provide three asymptotic tests for testing the parameter of the ZITPSD. Section 2.3 is devoted to asymptotic confidence intervals for the parameters of the ZITPSD. In section 3.1 we report estimation of the parameters involved in the Zero Inflated Truncated Poisson Distribution (ZITPD) and inference related to the model. Section 3.2 is devoted to three asymptotic tests for testing the parameter of the ZITPD, and in section 3.3 we provide asymptotic confidence intervals for the parameters of the ZITPD. A simulation study is carried out in section 4 to study the performance of the tests. An illustrative example is provided in section 5.
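Zero-inflated truncated power series distribution (ZITPSD)
Before we define the truncated ZIPSD, we first consider the Truncated Power Series Distribution (TPSD) truncated at the support point $t+1$ onwards, where $t$ is known. Then the probability mass function of the TPSD is given by
$$P(X=x)=\frac{a_x\theta^{x}}{f_t(\theta)},\qquad x=0,1,2,\dots,t,$$
where $f_t(\theta)=\sum_{x=0}^{t}a_x\theta^{x}$, $a_x\ge 0$ and $\theta>0$.
It is clear that the truncated distribution is also a power series distribution. Based on the same, we define the ZITPSD as follows:
Let the probability mass function of a random variable X be given by
$$P(X=x)=\begin{cases}1-\pi+\pi\,\dfrac{a_0}{f_t(\theta)}, & x=0,\\[2mm] \pi\,\dfrac{a_x\theta^{x}}{f_t(\theta)}, & x=1,2,\dots,t,\end{cases}$$ …(2.1)
where $0<\pi\le 1$ and $f_t(\theta)=\sum_{x=0}^{t}a_x\theta^{x}$.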
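As a rough numerical illustration of the definition in this section, the following Python sketch evaluates a zero-inflated truncated power series pmf. It assumes the parametrization $P(X=0)=1-\pi+\pi a_0/f_t(\theta)$, with $\pi$ the weight on the truncated power series component; the function name `zitpsd_pmf` and the argument `coeffs` are hypothetical and not taken from the article.

```python
import math

def zitpsd_pmf(x, pi, theta, coeffs):
    """P(X = x) for a power series distribution truncated to 0..t and inflated at zero.

    coeffs[k] holds the series coefficient a_k for k = 0..t (hypothetical argument).
    Assumption: pi is the weight on the truncated power series component.
    """
    t = len(coeffs) - 1
    if x < 0 or x > t:
        return 0.0
    f_t = sum(a * theta**k for k, a in enumerate(coeffs))  # truncated series f_t(theta)
    base = coeffs[x] * theta**x / f_t                      # TPSD probability of x
    return (1.0 - pi) + pi * base if x == 0 else pi * base

# Poisson coefficients a_k = 1/k!, truncated to the support 0..7
coeffs = [1.0 / math.factorial(k) for k in range(8)]
probs = [zitpsd_pmf(x, 0.6, 2.0, coeffs) for x in range(8)]
print(probs)
```

Under this assumed parametrization the probabilities sum to one, and the distribution of X given X > 0 does not involve $\pi$, which is the property that the conditional likelihood approach relies on.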
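Estimation of $\pi$ and $\theta$
Estimation of the parameters using full likelihood function: Suppose a random sample $X_1, X_2, \dots, X_n$ of size n from the ZITPSD is available. Writing $f_t(\theta)=\sum_{x=0}^{t}a_x\theta^{x}$, the likelihood function is given by
$$L(\pi,\theta)=\prod_{i=1}^{n}\left[1-\pi+\pi\,\frac{a_0}{f_t(\theta)}\right]^{\delta_i}\left[\pi\,\frac{a_{x_i}\theta^{x_i}}{f_t(\theta)}\right]^{1-\delta_i},$$
where $\delta_i=1$ if $x_i=0$ and $\delta_i=0$ if $x_i=1,2,3,\dots,t$. …(2.2)
With $n_0=\sum_{i=1}^{n}\delta_i$ denoting the number of zeros in the sample, then,
$$\log L=n_0\log\left[1-\pi+\pi\,\frac{a_0}{f_t(\theta)}\right]+(n-n_0)\left[\log\pi-\log f_t(\theta)\right]+\sum_{i:\,x_i>0}\left[\log a_{x_i}+x_i\log\theta\right].$$ …(2.3)
Maximum likelihood estimators of $\pi$ and $\theta$ are obtained by solving the following two equations
$$\frac{\partial\log L}{\partial\pi}=-\frac{n_0\left[f_t(\theta)-a_0\right]}{(1-\pi)f_t(\theta)+\pi a_0}+\frac{n-n_0}{\pi}=0,$$ …(2.4)
$$\frac{\partial\log L}{\partial\theta}=-\frac{n_0\,\pi\,a_0\,f_t'(\theta)}{f_t(\theta)\left[(1-\pi)f_t(\theta)+\pi a_0\right]}-(n-n_0)\frac{f_t'(\theta)}{f_t(\theta)}+\frac{1}{\theta}\sum_{i=1}^{n}x_i=0.$$ …(2.5)
Substituting $\hat\pi=\dfrac{(n-n_0)\,f_t(\theta)}{n\left[f_t(\theta)-a_0\right]}$, obtained from eq. (2.4), in eq. (2.5) we get
$$\frac{\theta f_t'(\theta)}{f_t(\theta)-a_0}=\frac{1}{n-n_0}\sum_{i=1}^{n}x_i,$$ …(2.6)
which is a non-linear equation in $\theta$. Using the Newton-Raphson method we first find $\hat\theta$; substituting this value of $\hat\theta$ in eq. (2.4) we find $\hat\pi$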
. The Fisher information matrix of
is given by
Where
, …(2.7)
…(2.8)
and
…(2.9)
Assuming that the conditions required for asymptotic normality of maximum likelihood estimators are satisfied, we have the following theorem:
Theorem 2.1: Let
be a random sample from ZITPSD with parameters
and
. Then the maximum likelihood estimator obtained by solving eq. (2.4) and eq. (2.6), have asymptotic bivariate normal distribution with mean vector
and dispersion matrix
for
sufficiently large.
That is as
,
.
In the following we present conditional likelihood approach and obtain MLEs for
.
Conditional likelihood function approach: We observe that the conditional density of
given
is independent of inflation parameter
, since
…(2.10)
Now the conditional log likelihood function is given by
…(2.11)
The mle
of
is the solution to an equation
, …(2.12)
where
is the mean of the positive observations only. We note that mle of
based on full likelihood (eq. 2.6) and based on conditional likelihood (Eq. 2.12) are the same and
…(2.13)
Assuming that the Cramer-Huzurbazar conditions required for asymptotic normality of MLEs are satisfied, we have the following theorem:
Theorem 2.2: Let
be a random sample from ZITPSD with parameters
and
. Then the mle of
is solution to the eq. (2.12) and has asymptotic normal distribution with mean
and variance
for
sufficiently large. That is as
,
.
In the following we present moment estimator of ZITPSD.
Moment estimator of ZITPSD: We have,
and
,
say.
Let,
…(2.14)
, …(2.15)
Solving eq. (2.14) and eq. (2.15) we get moment estimators of
and
.
Theorem 2.3: Let
be a random sample from ZITPSD with parameters
and
. Then the moment estimator of
and
are obtained by solving in the eq. (2.14) and eq. (2.15). The moment estimator of
has asymptotic normal distribution with mean
and variance
, for
sufficiently large. That is as
,
.
Tests for the parameter
of ZITPS distribution
Test based on
: Suppose we wish to test
vs
. Let us assume that
is known. Therefore, under
, from Theorem (2.1) we have
~
. …(2.16)
Define a test statistic to be
. Based on
we define the test
which rejects
at α level of significance, if
, where
is the upper
th percentile of SNV.
Let
be the cumulative distribution function of SNV. Then the power of the test
is given by
,
where
and
.
However, in practice
is unknown. Hence we modify the test statistic by replacing
by its maximum likelihood estimator (
), when
is true. By doing so, we define test
, where
Based on
, we propose a test
rejects
at
level of significance, if
.
The power of this test is given by
, ...(2.17)
where
,
,
,with
. …(2.18)
Below we develop test based on
, estimator based on conditional likelihood approach.
Test based on
: Theorem (2.2) gives
~
. …(2.19)
Hence, we define test statistic
. A test based on
which rejects $H_0$ at α level of significance, if
.
The power of the test
is given by
, …(2.20)
where,
,
.
Test based on the moment estimator
of
: It is clear that the problem of testing
vs
is equivalent to testing
vs
, where
. We have from Theorem (2.3), sample mean is consistent and asymptotically normal for the population mean.
That is
~
.
Therefore, under
, we have
~
.
Define test statistic
~
, when
is known.
The test
rejects
at α level of significance if
.
That is, reject
if
.
The power of the test
is given by
, …(2.21)
where
and
If
is unknown, we modify the test statistic by replacing
by its estimate
under
. By doing so, we define test statistic
, …(2.22)
where
is given by
.
Based on
we propose a test
which rejects
at α level of significance if
.
The power of the test is given by
, …(2.23)
where
and
, with
.
Using the tests developed above, we can define two sided asymptotic confidence intervals for
, by inverting acceptance regions of the tests appropriately. Below we report the same.
Asymptotic confidence interval for the parameter
Asymptotic confidence interval for
based on the test
is given by
…(2.24)
where,
is an estimate of asymptotic variance of
and asymptotic confidence interval for θ based on the test
is given by
…(2.25)
where
is an estimate of the asymptotic variance of
as given in the eq. (2.13) .
Asymptotic confidence interval for
based on the test
is given by
, …(2.26)
where
=
.
In the following we study inference for the zero-inflated truncated Poisson distribution using the results reported in the earlier sections.
Zero-inflated truncated Poisson distribution
Truncated samples from discrete distributions arise in numerous situations where counts of zero are not observed. As an example, consider the distribution of the number of children per family in developing nations, where records are maintained only if there is at least one child in the family. The number of childless families remains unknown. The resulting sample is thus truncated, with the zero class missing. For a continuous distribution, a sample of this type would be described as singly left truncated. In other situations, samples from discrete distributions might be censored on the right.
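In this section, we consider the zero-inflated truncated Poisson distribution truncated on the right at the support point $t+1$ onwards, where $t$ is known. Moments, maximum likelihood estimators and the Fisher information matrix for the full and conditional likelihood are provided. We provide three tests for testing the parameter of the ZITPD.
Consider the probability mass function of the truncated Poisson distribution (TPD) truncated at the support point $t+1$ onwards. The probability mass function of the TPD is given by
$$P(X=x)=\frac{\theta^{x}/x!}{f_t(\theta)},\qquad x=0,1,2,\dots,t,$$
where $f_t(\theta)=\sum_{j=0}^{t}\theta^{j}/j!$ (the factor $e^{-\theta}$ cancels between numerator and denominator).
Using this truncated distribution, we define the zero-inflated truncated Poisson distribution truncated at $t+1$ onwards.
The probability mass function of the ZITP distribution is given by
$$P(X=0)=1-\pi+\frac{\pi}{f_t(\theta)}$$
and
$$P(X=x)=\pi\,\frac{\theta^{x}}{x!\,f_t(\theta)},\qquad x=1,2,\dots,t.$$ …(3.1)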
Estimation of the parameters
and
Estimation of the parameters using full likelihood function
Let
be a random sample observed from zero-inflated truncated Poisson distribution truncated at
onwards, where
is the point in the support defined in the above probability mass function. Then the likelihood function is given by
The corresponding log likelihood function is given by
…(3.2)
To find the MLEs of $\pi$ and $\theta$, we differentiate eq. (3.2) with respect to $\pi$ and $\theta$, and equating to zero we get
…(3.3)
and
…(3.4)
Substituting
in the above equation we have
,
, …(3.5)
which is non-linear equation in
. Therefore, we use a numerical technique to solve it. Let
and
.
Using Newton-Raphson iterative formula
with suitable initial value of
we get
. Substituting this value of
in eq. (3.3), we get the value of
.
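The Newton-Raphson step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's programme: it assumes the ZITPD has support 0..t with $\pi$ the weight on the truncated Poisson part, and it solves the equation "mean of the positive observations = $E(X \mid X>0)$" for $\theta$, using the power series identity $d\,E(X \mid X>0)/d\theta = \mathrm{Var}(X \mid X>0)/\theta$. The function names and the example counts are hypothetical.

```python
import math

def positive_moments(theta, t):
    """Mean and variance of X given X > 0 for a Poisson truncated to 0..t (t >= 2)."""
    ks = range(1, t + 1)
    w = [theta**k / math.factorial(k) for k in ks]
    total = sum(w)
    mean = sum(k * wk for k, wk in zip(ks, w)) / total
    second = sum(k * k * wk for k, wk in zip(ks, w)) / total
    return mean, second - mean**2

def fit_zitpd(counts, t, theta0=1.0, tol=1e-10, max_iter=100):
    """counts[k] = number of observations equal to k, for k = 0..t."""
    n = sum(counts)
    n_pos = n - counts[0]
    xbar_pos = sum(k * c for k, c in enumerate(counts)) / n_pos
    theta = theta0
    for _ in range(max_iter):
        mean, var = positive_moments(theta, t)
        # Newton step: derivative of the conditional mean is var / theta
        new = theta - (mean - xbar_pos) * theta / var
        if new <= 0:            # keep the iterate inside the parameter space
            new = theta / 2.0
        converged = abs(new - theta) < tol
        theta = new
        if converged:
            break
    f_t = sum(theta**k / math.factorial(k) for k in range(t + 1))
    # pi_hat can exceed 1 if the sample has fewer zeros than the Poisson part predicts
    pi_hat = n_pos / (n * (1.0 - 1.0 / f_t))
    return pi_hat, theta

# Hypothetical counts for the values 0..5 (illustration only, not data from the article)
pi_hat, theta_hat = fit_zitpd([60, 15, 14, 7, 3, 1], t=5)
print(pi_hat, theta_hat)
```

The sketch needs the mean of the positive observations to lie strictly between 1 and t; otherwise no interior solution exists and the loop simply stops after `max_iter` iterations.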
In the following we find the elements of Fisher information matrix
Here we have
,
,
,
,
,
. …(3.6)
Now
,
,
,
…(3.7)
Further differentiating eq. (3.2) twice with respect to
, we get
.
Therefore,
.
Hence,
.
The asymptotic variance of
and
are
.
. … (3.8)
- Conditional likelihood function approach
The conditional likelihood function is given by
…(3.9)
The corresponding log likelihood function is given by
…(3.10)
The corresponding mle
is the solution to an equation
…(3.11)
Now consider,
. … (3.12)
Therefore, asymptotic variance of
is different than the asymptotic variance of estimate of
based on the standard likelihood approach. The same is given by
… (3.13)
- Moment estimator of ZITP distribution
Mean =
…(3.14)
say …(3.15)
…(3.16)
…(3.17)
Solving eq. (3.16) and eq. (3.17), we get moment estimators of
and
.
Tests for the parameter
of ZITP distribution
Suppose we want to test
vs
, (assuming
is unknown) [4]
- Test based on
…(3.18)
where
is defined in eq. (3.8).The test
rejects
, if
.
- Test based on
The test statistic here is
, …(3.19)
Where,
is as defined in eq. (3.13). The test
rejects
if
.
- Test based on sample mean
The test statistic
, …(3.20)
where
Power of the test is given by
where ,
,
and
, with
Asymptotic confidence interval for the parameter
Asymptotic confidence interval for
based on the test
is given by
…(3.21)
where,
is an estimate of asymptotic variance of
and asymptotic confidence interval for θ based on the test
is given by
…(3.22)
where
is an estimate of the asymptotic variance of
as given in the eq. (3.13) .
Asymptotic confidence interval for
based on the test
is given by
, …(3.23)
where
=
.
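A Wald-type interval of the kind described in this section can be sketched numerically as follows. The variance used here is the standard result for a power series family fitted to the positive observations, $\theta^2/\{m\,\mathrm{Var}(X \mid X>0)\}$ with $m$ the number of positive observations; it is an assumption of this sketch and may differ in form from the expression in eq. (3.13). The function names and the inputs are hypothetical.

```python
import math
from statistics import NormalDist

def positive_variance(theta, t):
    """Variance of X given X > 0 for a Poisson truncated to 0..t."""
    ks = range(1, t + 1)
    w = [theta**k / math.factorial(k) for k in ks]
    total = sum(w)
    mean = sum(k * wk for k, wk in zip(ks, w)) / total
    return sum(k * k * wk for k, wk in zip(ks, w)) / total - mean**2

def wald_interval(theta_hat, n_pos, t, alpha=0.05):
    """Two-sided asymptotic interval theta_hat +/- z * se, with se evaluated at theta_hat."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    se = theta_hat / math.sqrt(n_pos * positive_variance(theta_hat, t))
    return theta_hat - z * se, theta_hat + z * se

# Hypothetical inputs: estimate 2.0 from 43 positive observations, support 0..7
low, high = wald_interval(theta_hat=2.0, n_pos=43, t=7)
print(low, high)
```

Quadrupling the number of positive observations halves the width of the interval, as expected for an estimator with variance of order 1/m.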
Simulation study
A simulation study is carried out to investigate the power of the first two tests proposed in section 3.2. We generate 10000 samples of sizes 100 and 200 for different values of π, θ and truncation point t. For each generated sample, the test statistics are calculated. The percentage of times a test statistic exceeds $Z_{1-\alpha/2}$ is computed, which is an estimate of the power of the respective test. An R programme was developed to compute the power of the tests. The results for θ0 = 2 and 4, π = 0.3, 0.4, 0.5, 0.6, 0.7, α = 0.05 and truncation points t = 7 and 9 are presented in Table 1 and Table 2.
| π | θ | n=100, full likelihood test | n=100, conditional likelihood test | n=200, full likelihood test | n=200, conditional likelihood test |
|---|---|---|---|---|---|
| 0.3 | 2.0 | 6.57 | 4.28 | 6.57 | 4.63 |
| | 2.2 | 11.49 | 8.9 | 16.08 | 12.88 |
| | 2.4 | 26.85 | 22.24 | 45.08 | 39.93 |
| | 2.6 | 49.43 | 44.5 | 76.81 | 72.82 |
| | 2.8 | 71.27 | 66.99 | 93.49 | 91.72 |
| | 3 | 86.28 | 83.23 | 98.91 | 98.46 |
| | 3.2 | 94.64 | 93.18 | 99.77 | 99.73 |
| | 3.4 | 98.08 | 97.58 | 99.99 | 99.99 |
| | 3.6 | 99.44 | 99.08 | 100 | 100 |
| | 3.8 | 99.8 | 99.76 | 100 | 100 |
| | 4 | 99.95 | 99.94 | 100 | 100 |
| | 4.2 | 99.98 | 99.97 | 100 | 100 |
| | 4.4 | 100 | 100 | 100 | 100 |
| 0.4 | 2 | 6.44 | 4.24 | 6.29 | 4.46 |
| | 2.2 | 12.83 | 10.33 | 20.01 | 15.59 |
| | 2.4 | 33.16 | 28.94 | 56.94 | 50.24 |
| | 2.6 | 60.11 | 55.4 | 87.34 | 83.64 |
| | 2.8 | 81.87 | 78.72 | 97.82 | 97 |
| | 3 | 93.8 | 92.31 | 99.83 | 99.79 |
| | 3.2 | 98.35 | 97.87 | 100 | 100 |
| | 3.4 | 99.62 | 99.54 | 100 | 100 |
| | 3.6 | 99.98 | 99.96 | 100 | 100 |
| | 3.8 | 99.99 | 99.97 | 100 | 100 |
| | 3.8 | 100 | 100 | 100 | 100 |
| 0.5 | 2 | 6.17 | 4.46 | 6.25 | 4.2 |
| | 2.2 | 14.83 | 12.01 | 24.63 | 19.24 |
| | 2.4 | 40.31 | 34.76 | 66.99 | 60.39 |
| | 2.6 | 70.38 | 65.06 | 92.88 | 90.37 |
| | 2.8 | 90.13 | 87.37 | 99.52 | 99.14 |
| | 3 | 97.23 | 96.45 | 99.99 | 99.97 |
| | 3.2 | 99.55 | 99.36 | 100 | 100 |
| | 3.4 | 99.96 | 99.94 | 100 | 100 |
| | 3.6 | 100 | 99.99 | 100 | 100 |
| | 3.8 | 100 | 100 | 100 | 100 |
| 0.6 | 2 | 6.8 | 4.43 | 7.12 | 4.89 |
| | 2.2 | 18.04 | 13.52 | 28.35 | 21.91 |
| | 2.4 | 47.01 | 40.71 | 73.41 | 65.85 |
| | 2.6 | 77.65 | 72.17 | 96.33 | 94.68 |
| | 2.8 | 94.05 | 91.78 | 99.86 | 99.74 |
| | 3 | 99.01 | 98.48 | 99.99 | 99.99 |
| | 3.2 | 99.85 | 99.74 | 100 | 99.99 |
| | 3.4 | 99.99 | 99.97 | 100 | 100 |
| | 3.6 | 100 | 100 | 100 | 100 |
| 0.7 | 2 | 7.11 | 4.17 | 7.34 | 4.95 |
| | 2.2 | 19.69 | 14.15 | 32.46 | 24.21 |
| | 2.4 | 54.29 | 45.76 | 80.95 | 73.64 |
| | 2.6 | 84.17 | 78.59 | 98.35 | 97.26 |
| | 2.8 | 96.91 | 95.1 | 99.95 | 99.9 |
| | 3 | 99.65 | 99.28 | 100 | 100 |
| | 3.2 | 99.99 | 99.97 | 100 | 100 |
| | 3.4 | 100 | 100 | 100 | 100 |

Table 1 Power (in %) of the tests based on the full likelihood and conditional likelihood approaches for θ0 = 2, t = 7, n = 100 and 200, α = 0.05
| π | θ | n=100, full likelihood test | n=100, conditional likelihood test | n=200, full likelihood test | n=200, conditional likelihood test |
|---|---|---|---|---|---|
| 0.3 | 4 | 5.56 | 3.58 | 4.65 | 3.37 |
| | 4.2 | 9.33 | 4.71 | 12.38 | 5.58 |
| | 4.4 | 19.4 | 9.95 | 31.31 | 17.04 |
| | 4.6 | 33.8 | 20.81 | 56.28 | 38.36 |
| | 4.8 | 50.58 | 35.48 | 78.37 | 62.91 |
| | 5 | 68.14 | 53.07 | 92 | 82.84 |
| | 5.2 | 80.88 | 67.97 | 97.5 | 93.47 |
| | 5.4 | 89.97 | 81.06 | 99.45 | 98.4 |
| | 5.6 | 95.31 | 89.59 | 99.83 | 99.53 |
| | 5.8 | 97.77 | 94.74 | 99.97 | 99.94 |
| | 6 | 99.05 | 97.48 | 100 | 99.99 |
| | 6.2 | 99.6 | 98.72 | 100 | 100 |
| | 6.4 | 99.85 | 99.5 | 100 | 100 |
| 0.4 | 4 | 5.29 | 3.57 | 5.26 | 3.95 |
| | 4.2 | 10.24 | 4.86 | 13.8 | 5.8 |
| | 4.4 | 22.52 | 12.32 | 38.38 | 21.74 |
| | 4.6 | 41.49 | 26.14 | 68.09 | 49.91 |
| | 4.8 | 62.45 | 46.12 | 88.35 | 76.97 |
| | 5 | 78.69 | 65.52 | 97.45 | 92.41 |
| | 5.2 | 90.17 | 81.34 | 99.52 | 98.26 |
| | 5.4 | 95.75 | 90.75 | 99.96 | 99.67 |
| | 5.6 | 98.55 | 96.16 | 99.99 | 99.97 |
| | 5.8 | 99.53 | 98.56 | 100 | 100 |
| | 6 | 99.88 | 99.52 | 100 | 100 |
| | 6.2 | 99.94 | 99.78 | 100 | 100 |
| | 6.4 | 99.96 | 99.94 | 100 | 100 |
| 0.5 | 4 | 5.39 | 3.75 | 4.88 | 3.91 |
| | 4.2 | 11.78 | 5.51 | 15.75 | 6.94 |
| | 4.4 | 26.72 | 14.86 | 45.12 | 26.2 |
| | 4.6 | 49.72 | 33.44 | 76.69 | 59.34 |
| | 4.8 | 70.81 | 55 | 94.06 | 85.45 |
| | 5 | 86.58 | 75.44 | 98.94 | 96.71 |
| | 5.2 | 95.84 | 89.47 | 99.95 | 99.49 |
| | 5.4 | 98.59 | 95.86 | 99.98 | 99.95 |
| | 5.6 | 99.62 | 98.82 | 100 | 100 |
| | 5.8 | 99.93 | 99.79 | 100 | 100 |
| | 6 | 99.98 | 99.86 | 100 | 100 |
| | 6.2 | 99.99 | 99.97 | 100 | 100 |
| | 6.4 | 100 | 100 | 100 | 100 |
| 0.6 | 4 | 4.71 | 3.41 | 5.27 | 4.12 |
| | 4.2 | 13.45 | 5.88 | 20.35 | 8.06 |
| | 4.4 | 34.38 | 19.15 | 57.57 | 35.63 |
| | 4.6 | 62.82 | 45.15 | 89.27 | 74.97 |
| | 4.8 | 84.74 | 70.72 | 98.5 | 94.9 |
| | 5 | 95.58 | 88.95 | 99.96 | 99.56 |
| | 5.2 | 98.9 | 96.8 | 100 | 99.97 |
| | 5.4 | 99.77 | 99.19 | 100 | 100 |
| | 5.6 | 99.98 | 99.88 | 100 | 100 |
| | 5.8 | 100 | 100 | 100 | 100 |
| 0.7 | 4 | 4.71 | 3.41 | 5.27 | 4.12 |
| | 4.2 | 13.45 | 5.88 | 20.35 | 8.06 |
| | 4.4 | 34.38 | 19.15 | 57.57 | 35.63 |
| | 4.6 | 62.82 | 45.15 | 89.27 | 74.97 |
| | 4.8 | 84.74 | 70.72 | 98.5 | 94.9 |
| | 5 | 95.58 | 88.95 | 99.96 | 99.56 |
| | 5.2 | 98.9 | 96.8 | 100 | 99.97 |
| | 5.4 | 99.77 | 99.19 | 100 | 100 |
| | 5.6 | 99.98 | 99.88 | 100 | 100 |
| | 5.8 | 100 | 100 | 100 | 100 |

Table 2 Power (in %) of the tests based on the full likelihood and conditional likelihood approaches for θ0 = 4, t = 9, n = 100 and 200, α = 0.05
From the simulation study reported in Table 1 and Table 2, we observe that
- The test based on the full likelihood approach is more powerful than the one based on the conditional likelihood approach when θ is small. For large θ, both tests are equally good.
- The probability of Type I error of the former test is higher than that of the latter.
- Since both tests are equally good for large values of θ, we recommend the conditional likelihood approach when θ is large, from the computational point of view.
- If θ is large, the proportion of zeros arising from the Poisson distribution is relatively low. Hence these zeros can be ignored while making inference about θ. However, for smaller values of θ, ignoring them will affect the inference about θ.
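The simulation design of this section can be mimicked with a short Python sketch for the conditional likelihood test. It is not the author's R programme. It assumes a ZITPD with support 0..t and $\pi$ as the weight on the truncated Poisson part, uses the observed number of positive observations in the variance, and evaluates the variance at θ0; samples without an interior conditional MLE are counted as non-rejections. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def pos_moments(theta, t):
    """Mean and variance of X given X > 0 for a Poisson truncated to 0..t."""
    ks = range(1, t + 1)
    w = [theta**k / math.factorial(k) for k in ks]
    s = sum(w)
    mean = sum(k * wk for k, wk in zip(ks, w)) / s
    return mean, sum(k * k * wk for k, wk in zip(ks, w)) / s - mean**2

def conditional_mle(xbar_pos, t, theta=1.0):
    """Newton-Raphson solution of E(X | X > 0) = xbar_pos."""
    for _ in range(100):
        mean, var = pos_moments(theta, t)
        new = theta - (mean - xbar_pos) * theta / var
        new = new if new > 0 else theta / 2.0
        if abs(new - theta) < 1e-9:
            return new
        theta = new
    return theta

def estimated_power(pi, theta, theta0, t, n, reps, alpha=0.05, seed=1):
    """Fraction of simulated samples in which the two-sided test rejects theta = theta0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    base = [theta**k / math.factorial(k) for k in range(t + 1)]
    total = sum(base)
    weights = [pi * b / total for b in base]
    weights[0] += 1.0 - pi                      # extra mass at zero
    _, var0 = pos_moments(theta0, t)
    rejections = 0
    for _ in range(reps):
        sample = rng.choices(range(t + 1), weights=weights, k=n)
        positives = [x for x in sample if x > 0]
        if not positives or sum(positives) == len(positives):
            continue                            # no interior MLE: counted as non-rejection
        theta_hat = conditional_mle(sum(positives) / len(positives), t)
        se0 = theta0 / math.sqrt(len(positives) * var0)
        if abs(theta_hat - theta0) / se0 > z_crit:
            rejections += 1
    return rejections / reps

size = estimated_power(pi=0.5, theta=2.0, theta0=2.0, t=7, n=100, reps=400)
power = estimated_power(pi=0.5, theta=3.0, theta0=2.0, t=7, n=100, reps=400)
print(size, power)
```

With only 400 replications the estimates are noisy; the article uses 10000 replications per configuration, so the figures from this sketch should not be expected to reproduce the tables exactly.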
Illustrative example
Let us consider the traffic accident research data given by Kuan et al. [5]
The data from the department of motor vehicles master driver license file

| Traffic accidents | 0 | 1 | 2 | ≥3 |
|---|---|---|---|---|
| Number of drivers | 4499 | 766 | 136 | 21 |
From the data we see that there is an excess number of zero counts, and that the frequency of X greater than or equal to 3 is 21. Generally such data are modeled by the Poisson distribution, but the Poisson distribution does not fit these data well. We therefore fit the ZIPD to the above data. In the ZIPD there are two parameters
and
. In this problem
and estimated values of
and
. Using these values we fit the ZIPD to the above data. The calculated chi-square value is 0.4043, the table value of $\chi^2_{(1,\,0.05)}$ is 3.841459, and the P-value is 0.5249.
The same data are fitted to the ZITPD truncated at 4 and above. The parameters are
and
The calculated chi-square value is 0.4018, the table value of $\chi^2_{(1,\,0.05)}$ is 3.8415, and the P-value is 0.5262. When the same data are fitted to the ZITPD truncated at 5 and above, the parameters are
and
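. The calculated chi-square value is 0.4012, the table value of $\chi^2_{(1,\,0.05)}$ is 3.8415, and the P-value is 0.5265. Since the ZITPD fits give slightly smaller chi-square values (and larger P-values) than the ZIPD fit, we prefer the ZITPD to model the data.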
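The P-values quoted in this example can be checked from the reported chi-square statistics alone. The sketch below uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability is $\mathrm{erfc}(\sqrt{x/2})$. The helper name `chi2_sf_df1` is hypothetical, and only the statistics printed in the text are used.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Chi-square statistics reported in the text for the three fitted models
reported = {
    "ZIPD": 0.4043,
    "ZITPD truncated at 4 and above": 0.4018,
    "ZITPD truncated at 5 and above": 0.4012,
}
for model, stat in reported.items():
    print(model, round(chi2_sf_df1(stat), 4))
```

All three statistics are far below the 5% critical value of 3.8415, so none of the three fits is rejected, and the differences between them are small.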
Acknowledgments
Conflicts of interest
The author declares that there are no conflicts of interest.
References
- Gupta PL, Gupta RL, Tripathi RC. Inflated Modified Power Series Distributions with Applications. Commun Statist Theory Meth. 1995;24(9):2355‒2374.
- Murat M, Szynal D. Non-Zero-Inflated Modified Power Series Distributions. Commun Statist Theory Meth. 1998;27(12):3047‒3064.
- Patil MK, Shirke DT. Tests for equality of inflation parameters of two zero-inflated power series distributions. Commun Statist Theory Meth. 2011;40(14):2539‒2553.
- Patil MK, Shirke DT. Testing parameter of the power series distribution of a zero-inflated power series model. Statistical Methodology. 2007;4(4):393‒406.
- Kuan J, Peck RC, Janke MK. Statistical Methods for Traffic Accidents Research. In: Chao MT, Cheng PE, editors. Proceedings of the 1990 Taipei Symposium in Statistics, June 28-30, 1990. Taipei: Institute of Statistical Science; 1991.
©2016 Patil. This is an open access article distributed under the terms of the,
which
permits unrestricted use, distribution, and build upon your work non-commercially.