eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Research Article Volume 6 Issue 1

Modified maximum likelihood estimation in Poisson regression

Evrim Oral

Department of Biostatistics Program, LSUHSC School of Public Health, USA

Correspondence: Evrim Oral, Biostatistics Program, LSUHSC School of Public Health, 2020 Gravier Street, 3rd floor New Orleans, LA 70112, USA, Tel (504) 568-6094

Received: April 25, 2017 | Published: May 25, 2017

Citation: Oral E. Modified maximum likelihood estimation in Poisson regression. Biom Biostat Int J. 2017;6(1):249-254. DOI: 10.15406/bbij.2017.06.00154


Abstract

In Generalized Linear Models, likelihood equations are intractable and do not have explicit solutions; thus, they must be solved using Newton-type algorithms. Solving these equations by iteration, however, can be problematic: the iterations might converge to wrong values, or they might not converge at all. In this study, we derive the modified maximum likelihood estimators for the Poisson regression model and study their properties. We also investigate the robustness of these estimators when there are outliers in the covariates.

Keywords: count data, Poisson regression, modified maximum likelihood, Newton-type algorithms, Dixon’s outlier model

Introduction

Poisson regression is widely used for modeling count data, especially when there is no over- or under-dispersion.1 Since the likelihood equations from this model are intractable, solving them requires iterative methods such as Newton-Raphson or Fisher scoring. However, using iterative methods to find maximum likelihood estimators (MLEs) can be problematic in general, and time should be spent investigating the stability of such solutions.2-4 Specifically, the following difficulties can arise: the iterations might converge to wrong values if the likelihood equations have multiple roots, or the iterations might not converge at all. See5,6 for discussions of situations where one encounters these difficulties in solving for the MLEs. Even common software can be sensitive to these numerical issues; in Stata, for example, researchers have reported problems obtaining Poisson regression estimates with the “poisson” command, which can fail to locate the maximum and does not converge.7 Note that in Poisson regression modeling, additional problems might occur. The most common is over- or under-dispersion in the data, in which case a more flexible model such as negative binomial regression is more appropriate. The second is analogous to the complete separation or quasi-complete separation problem in binary regression: the MLEs may not exist for certain data configurations; see.7,8 In this study we do not consider either of these problems and focus only on the case where it is appropriate to model the data with Poisson regression.

Unlike the maximum likelihood (ML) technique, the modified maximum likelihood (MML) methodology produces explicit estimators. MML achieves explicit estimators by linearizing the intractable functions within the likelihood equations using order statistics.9 Asymptotically, the MML estimators (MMLEs) are known to be unbiased and to have minimum variances, i.e., they are fully efficient. For small sample sizes the MMLEs have negligible bias and variances only marginally bigger than the minimum variance bounds, i.e., they are highly efficient.10-15

In the GLM setting, Tiku and Vaughan3 used the MML methodology to extend the techniques of traditional logistic regression to non-logistic density functions. Oral and Gunay16 and Oral17 later extended the work in3 to the binary regression model with one stochastic covariate. Oral18 derived the MMLEs in general GLMs with a canonical link when there is only one risk factor. In this study, we derive the explicit MMLEs for the Poisson regression model, generalize the derivations to more than one covariate, and study their robustness properties via simulations.

Methods

The univariate Poisson regression model is given by

$$E(Y_i \mid X_i = x_i) = \mu_i = \exp(z_i), \quad (1)$$

where $z_i = \alpha + \beta x_i$ for $1 \le i \le n$, and the outcome $Y$ has the probability distribution

$$f_Y(y_i) = \frac{\exp(-\mu_i)\,\mu_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots \quad (2)$$

Note that in equation (2), $Y$ is presumed to increase with $X$, so that $\beta$ is a priori greater than zero. For the model given in (1)-(2), the log-likelihood function of the random sample $(y_i, x_i)$ can be written as

$$\ln L \propto \sum_{i=1}^{n} y_i z_i - \sum_{i=1}^{n} g(z_i), \quad (3)$$

where $g(z_i) = \exp(z_i)$. The likelihood equations for estimating $\alpha$ and $\beta$ do not have explicit solutions because of the nonlinear function $g(z_i)$, $1 \le i \le n$. To obtain the MMLEs, we first express the likelihood equations in terms of the ordered variates $z_{(1)} \le z_{(2)} \le \cdots \le z_{(n)}$. The likelihood equations can be re-written as

$$\frac{\partial \ln L}{\partial \alpha} = \sum_{i=1}^{n}\left\{ y_{[i]} - g(z_{(i)}) \right\} = 0 \quad (4)$$

and

$$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{n} x_{(i)}\left\{ y_{[i]} - g(z_{(i)}) \right\} = 0, \quad (5)$$

where $y_{[i]}$ is the concomitant of $x_{(i)}$ $(1 \le i \le n)$ and $g(z_{(i)}) = \exp(z_{(i)})$. Linearizing the intractable function $g(z_{(i)})$ by using the first two terms of its Taylor series expansion around the population quantiles $t_{(i)} = E(z_{(i)})$ $(1 \le i \le n)$, we find

$$g(z_{(i)}) \cong a_i + b_i z_{(i)}, \quad (6)$$

where $a_i = \exp(t_{(i)})\left\{1 - t_{(i)}\right\}$ and $b_i = \exp(t_{(i)})$ for $1 \le i \le n$. In order to calculate the $t_{(i)} = E(z_{(i)})$ values, we define a dummy random variable $U$ with the probability density function

$$f(u) = \exp(u), \quad u < 0, \quad (7)$$

and re-write the model (1) as

$$E(Y_i \mid X_i = x_i) = \mu_i = F(z_i), \quad (8)$$

where $F(u) = \exp(u)$, $u < 0$. Thus, the $t_{(i)}$ values can be obtained from the equation

$$t_{(i)} = \ln\left( i/(n+1) \right), \quad (9)$$

for $1 \le i \le n$; asymptotically $z_{(i)} - t_{(i)} \to 0$. Alternatively, when $n$ is large one can utilize the standard normal distribution:

$$t_{(i)} = \Phi^{-1}\left( i/(n+1) \right). \quad (10)$$

Incorporating (6) into (4)-(5) and solving the resulting modified likelihood equations yield the explicit MMLEs below:

$$\hat{\alpha} = \frac{\delta}{m} - \hat{\beta}\,\bar{x}_a \quad \text{and} \quad \hat{\beta} = \frac{\sum_{i=1}^{n} \delta_i \left( x_{(i)} - \bar{x}_a \right)}{\sum_{i=1}^{n} b_i \left( x_{(i)} - \bar{x}_a \right)^2}, \quad (11)$$

where

$$\delta = \sum_{i=1}^{n} \delta_i, \quad \delta_i = y_{[i]} - a_i, \quad m = \sum_{i=1}^{n} b_i, \quad (12)$$

and

$$\bar{x}_a = \left( \sum_{i=1}^{n} b_i x_{(i)} \right) \Big/ m. \quad (13)$$

The MMLEs derived above are asymptotically equivalent to their corresponding MLEs, which gives them the same attractive asymptotic properties. One can nevertheless refine the estimates by re-calculating the $a_i$ and $b_i$ values, replacing the theoretical population quantiles with their estimated values $t_{(i)} = \hat{\alpha} + \hat{\beta} x_{(i)}$ $(1 \le i \le n)$. This process may be repeated until a desired convergence criterion is met; stabilization is generally reached within a few iterations.
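For concreteness, the closed-form estimators (11)-(13) and the quantile refinement step can be coded in a few lines. The sketch below is illustrative, assuming NumPy; the function and argument names (mml_poisson, n_refine) are ours, not from the paper, whose original implementation was in FORTRAN.

```python
# Minimal sketch of the univariate MMLEs (11)-(13) with quantile refinement.
import numpy as np

def mml_poisson(x, y, n_refine=10, tol=1e-8):
    """Explicit MML estimates of (alpha, beta) in E(Y|x) = exp(alpha + beta x)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    order = np.argsort(x)
    x_ord, y_conc = x[order], y[order]     # ordered x_(i) and concomitants y_[i]

    i = np.arange(1, n + 1)
    t = np.log(i / (n + 1.0))              # initial quantiles, equation (9)

    alpha = beta = 0.0
    for _ in range(n_refine):
        b = np.exp(t)                      # b_i = exp(t_(i))
        a = b * (1.0 - t)                  # a_i = exp(t_(i)){1 - t_(i)}
        delta_i = y_conc - a               # delta_i = y_[i] - a_i, equation (12)
        m = b.sum()
        x_a = (b * x_ord).sum() / m        # weighted mean x-bar_a, equation (13)
        beta_new = (delta_i * (x_ord - x_a)).sum() / (b * (x_ord - x_a) ** 2).sum()
        alpha_new = delta_i.sum() / m - beta_new * x_a   # equation (11)
        done = abs(alpha_new - alpha) < tol and abs(beta_new - beta) < tol
        alpha, beta = alpha_new, beta_new
        if done:
            break
        t = alpha + beta * x_ord           # revised quantiles t_(i) = alpha-hat + beta-hat x_(i)
    return alpha, beta
```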

Asymptotic variances and covariances

Vaughan and Tiku15 proved rigorously that the MMLEs are asymptotically unbiased and that their variances and covariances are exactly the same as those of the MLEs. In the present situation, therefore, the asymptotic variances and the covariance of $\hat{\alpha}$ and $\hat{\beta}$ are given by $I^{-1}(\alpha, \beta)$, where $I$ is the Fisher information matrix consisting of the elements $-E(\partial^2 \ln L^*/\partial \alpha^2)$, $-E(\partial^2 \ln L^*/\partial \alpha \partial \beta)$, and $-E(\partial^2 \ln L^*/\partial \beta^2)$. From the modified likelihood equations

$$\frac{\partial \ln L}{\partial \alpha} \cong \frac{\partial \ln L^*}{\partial \alpha} = \sum_{i=1}^{n}\left\{ y_i - (a_i + b_i z_i) \right\} = 0,$$

$$\frac{\partial \ln L}{\partial \beta} \cong \frac{\partial \ln L^*}{\partial \beta} = \sum_{i=1}^{n} x_i \left\{ y_i - (a_i + b_i z_i) \right\} = 0,$$

the Fisher information matrix can easily be obtained as

$$V = I^{-1}(\alpha, \beta) = \begin{bmatrix} \sum_{i=1}^{n} Q_i & \sum_{i=1}^{n} Q_i x_i \\ \sum_{i=1}^{n} Q_i x_i & \sum_{i=1}^{n} Q_i x_i^2 \end{bmatrix}^{-1}, \quad (14)$$

where $Q_i = \exp(z_i)$. $V$ is estimated by replacing $Q_i$ with its estimate $\hat{Q}_i = \exp(\hat{z}_i)$, $\hat{z}_i = \hat{\alpha} + \hat{\beta} x_i$ $(1 \le i \le n)$. Since the $z_i$ values converge to the $t_i$ values as $n$ tends to infinity, the $a_i$ and $b_i$ values are treated as constant coefficients for large $n$; see also.3 Hence, the asymptotic variances can be estimated by

$$\mathrm{Var}(\hat{\alpha}) = \sum_{i=1}^{n} \hat{Q}_i x_i^2 \Big/ \left\{ \sum_{i=1}^{n} \hat{Q}_i \sum_{i=1}^{n} \hat{Q}_i x_i^2 - \left( \sum_{i=1}^{n} \hat{Q}_i x_i \right)^2 \right\}, \quad (15)$$

$$\mathrm{Var}(\hat{\beta}) = \sum_{i=1}^{n} \hat{Q}_i \Big/ \left\{ \sum_{i=1}^{n} \hat{Q}_i \sum_{i=1}^{n} \hat{Q}_i x_i^2 - \left( \sum_{i=1}^{n} \hat{Q}_i x_i \right)^2 \right\}. \quad (16)$$
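The estimated asymptotic standard errors follow directly from (15)-(16); a sketch (again assuming NumPy, with names of our choosing) is:

```python
# Estimated asymptotic standard errors from equations (15)-(16).
import numpy as np

def mml_se(x, alpha_hat, beta_hat):
    x = np.asarray(x, float)
    Q = np.exp(alpha_hat + beta_hat * x)          # Q_i-hat = exp(z_i-hat)
    det = Q.sum() * (Q * x**2).sum() - (Q * x).sum() ** 2
    se_alpha = np.sqrt((Q * x**2).sum() / det)    # square root of equation (15)
    se_beta = np.sqrt(Q.sum() / det)              # square root of equation (16)
    return se_alpha, se_beta
```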

Hypothesis testing

Testing the null hypothesis $H_0: \beta = 0$ is of great practical importance in Poisson regression modeling. The likelihood ratio statistic for testing $H_0$ is $LR = -2(L_0 - L_1)$, where $L_0$ and $L_1$ denote the maximized log-modified-likelihood functions under the null and alternative hypotheses, respectively. The null distribution of $LR$ is asymptotically chi-square with 1 degree of freedom, and large values of $LR$ lead to the rejection of $H_0$. Alternatively, the Wald statistic $W$ (the ratio of $\hat{\beta}$ to its standard error) might be used. Since $\hat{\beta}$ is asymptotically equivalent to the MLE, the null distribution of $W$ is asymptotically standard normal, $N(0, 1)$; large values of $|W|$ lead to the rejection of $H_0$.
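As an illustration of the two tests, the sketch below computes $W$ and $LR$ from the Poisson log-likelihood kernel in (3), assuming SciPy. It uses the standard fact that under $H_0: \beta = 0$ the intercept-only MLE is $\hat{\alpha}_0 = \ln(\bar{y})$; plugging the MML estimates into the ordinary log-likelihood is our simplification here, not a prescription from the paper.

```python
# Hedged sketch of the Wald and likelihood-ratio tests for H0: beta = 0.
import numpy as np
from scipy import stats

def loglik_kernel(alpha, beta, x, y):
    z = alpha + beta * x
    return np.sum(y * z - np.exp(z))    # kernel of equation (3); ln(y!) terms cancel in LR

def wald_lr(x, y, alpha_hat, beta_hat, se_beta):
    x, y = np.asarray(x, float), np.asarray(y, float)
    W = beta_hat / se_beta                             # asymptotically N(0, 1) under H0
    l0 = loglik_kernel(np.log(y.mean()), 0.0, x, y)    # under H0 the MLE of alpha is ln(y-bar)
    l1 = loglik_kernel(alpha_hat, beta_hat, x, y)      # fit under the alternative
    LR = 2.0 * (l1 - l0)                               # asymptotically chi-square(1) under H0
    return W, 2.0 * stats.norm.sf(abs(W)), LR, stats.chi2.sf(LR, df=1)
```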

Numerical example

To compare the ML and MML estimates numerically, we analyzed the data given on page 82 of Agresti.19 The data come from a study of nesting horseshoe crabs, where the response $Y$ is the number of satellites that each female crab has, and the covariate $X$ is the carapace width of each of the 173 crabs. The study investigates the relationship between $Y$ and $X$. We calculated the MMLEs from equations (11)-(13) and their approximate standard errors from (15)-(16). The FORTRAN code written to carry out the calculations can be obtained from the author. Our results are completely consistent with those given in,19 as expected; see Table 1.

Method  Coefficient  Estimate  SE      W      LR
ML      α            -3.3048   0.5422
        β             0.164    0.02    8.2    64.9
MML     α            -3.3047   0.5423
        β             0.164    0.0199  8.241  64.91

Table 1 MLEs and MMLEs along with their standard errors for the horseshoe crab data

Remark: In solving (11)-(13), Oral18 proposed calculating the initial $t_{(i)}$ values from the least squares estimators (LSEs), an approach different from using equation (9) (Approach 1) or equation (10) (Approach 2). Since $t_{(i)} = E(\alpha + \beta x_{(i)})$, the $t_{(i)}$ values can be approximated by $\tilde{t}_{(i)} = \tilde{\alpha} + \tilde{\beta} x_{(i)}$, where

$$\tilde{\alpha} = \bar{y} - \tilde{\beta}\bar{x} \quad \text{and} \quad \tilde{\beta} = \sum_{i=1}^{n} (x_i - \bar{x})\, y_i \Big/ \sum_{i=1}^{n} (x_i - \bar{x})^2$$

are the LSEs; see also.3,20 When using this approach (say, Approach 3), the $t_{(i)}$ values need to be revised after the first iteration with their estimated values $t_{(i)} = \hat{\alpha} + \hat{\beta} x_{(i)}$ $(1 \le i \le n)$, as described above. Estimating the population quantiles from the LSEs changes neither the derivations/solutions (11)-(13) nor the results. However, the total number of revisions needed for stabilization differs across approaches; see also3 page 889. To compare the performance of these three approaches, we conducted a simulation study in which we calculated the bias values and variances of the resulting MMLEs as well as the coverage probabilities. We also provide the average number of revisions needed for stabilization under each approach. We set $\alpha$ to zero and considered various values of $\beta$ and the sample size $n$. Our results from 10,000 Monte Carlo runs are given in Table 2.

As can be seen from Table 2, all approaches yield the same biases and variances after stabilization, as expected. The resulting coverage probabilities from all approaches are close to 0.95, and as the sample size increases, both the bias values and the variances decrease, also as expected. For a given $(\alpha, \beta)$ value, the fastest stabilization is achieved by Approach 2 (i.e., equation (10)). Thus, although all three approaches yield the same results, we suggest calculating the initial $t_{(i)}$ values $(1 \le i \le n)$ from equation (10), because stabilization under this approach is the fastest.
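The three initializations compared in Table 2 differ only in how the starting $t_{(i)}$ values are produced; a sketch (illustrative names, assuming NumPy/SciPy) is:

```python
# Sketch of the three starting-quantile choices compared in Table 2.
import numpy as np
from scipy.stats import norm

def initial_quantiles(x, y, approach):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    p = np.arange(1, n + 1) / (n + 1.0)
    if approach == 1:
        return np.log(p)                  # equation (9)
    if approach == 2:
        return norm.ppf(p)                # equation (10)
    # Approach 3: t_(i) approximated by the LSE line at the ordered x-values
    beta_ls = ((x - x.mean()) * y).sum() / ((x - x.mean()) ** 2).sum()
    alpha_ls = y.mean() - beta_ls * x.mean()
    return alpha_ls + beta_ls * np.sort(x)
```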

 

 

                                      β = 0.1            β = 0.5            β = 1.0
n    Approach    Statistic            α̂       β̂         α̂       β̂         α̂       β̂
30   Approach 1  Bias                 0.0355  0.0006     0.0341  0.0067     0.0278  0.0078
                 Variance             0.0373  0.0389     0.0403  0.0369     0.0451  0.0328
                 Coverage prob.       0.9521  0.9537     0.9522  0.9541     0.9525  0.9488
                 No. of revisions     4.67               3.45               6.10
     Approach 2  Bias                 0.0355  0.0006     0.0341  0.0067     0.0279  0.0079
                 Variance             0.0373  0.0388     0.0403  0.0369     0.0451  0.0328
                 Coverage prob.       0.9521  0.9537     0.9522  0.9541     0.9525  0.9488
                 No. of revisions     2.65               2.00               1.81
     Approach 3  Bias                 0.0355  0.0006     0.0341  0.0067     0.0278  0.0079
                 Variance             0.0372  0.0389     0.0402  0.0368     0.0451  0.0328
                 Coverage prob.       0.9521  0.9537     0.9523  0.9540     0.9525  0.9488
                 No. of revisions     2.98               3.04               4.63
100  Approach 1  Bias                 0.0098  0.0007     0.0079  0.0002     0.0081  0.0017
                 Variance             0.0104  0.0104     0.0113  0.0094     0.0126  0.0074
                 Coverage prob.       0.9504  0.9518     0.9470  0.9494     0.9538  0.9495
                 No. of revisions     4.78               3.24               6.59
     Approach 2  Bias                 0.0098  0.0007     0.0079  0.0001     0.0081  0.0016
                 Variance             0.0103  0.0104     0.0114  0.0095     0.0126  0.0073
                 Coverage prob.       0.9504  0.9518     0.9470  0.9494     0.9538  0.9496
                 No. of revisions     2.91               2.01               1.32
     Approach 3  Bias                 0.0098  0.0007     0.0078  0.0002     0.0082  0.0017
                 Variance             0.0104  0.0104     0.0113  0.0094     0.0126  0.0073
                 Coverage prob.       0.9505  0.9518     0.9470  0.9494     0.9539  0.9495
                 No. of revisions     2.99               3.00               4.69
250  Approach 1  Bias                 0.0041  0.0002     0.0041  0.0000     0.0013  0.0007
                 Variance             0.0041  0.0040     0.0044  0.0036     0.0049  0.0026
                 Coverage prob.       0.9506  0.9452     0.9500  0.9479     0.9507  0.9513
                 No. of revisions     4.78               3.11               6.87
     Approach 2  Bias                 0.0041  0.0002     0.0040  0.0000     0.0012  0.0007
                 Variance             0.0040  0.0041     0.0044  0.0036     0.0049  0.0026
                 Coverage prob.       0.9506  0.9452     0.9500  0.9479     0.9507  0.9513
                 No. of revisions     2.90               2.00               1.12
     Approach 3  Bias                 0.0041  0.0002     0.0041  0.0001     0.0013  0.0007
                 Variance             0.0041  0.0041     0.0044  0.0036     0.0049  0.0026
                 Coverage prob.       0.9506  0.9452     0.9500  0.9479     0.9507  0.9513
                 No. of revisions     2.99               3.00               4.76

Table 2 Bias values, variances, coverage probabilities and average numbers of revisions using three different approaches to calculate the $t_{(i)}$ values $(1 \le i \le n)$

Generalization to the multivariable case

Now consider $k$ $(k \ge 2)$ covariates and assume, without loss of generality, that all of them take positive values. The Poisson regression model with $k$ covariates can be written as

$$E(Y_i \mid x_{i1}, x_{i2}, \ldots, x_{ik}) = F(z_i), \quad (17)$$

where $F(z_i) = \exp(z_i)$ and

$$z_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}, \quad (18)$$

for $1 \le i \le n$, $1 \le j \le k$. Following the same lines as,3 in order to rank the $z$-values we can assume that all covariates are equally effective in increasing the response $Y$; i.e., we initially take the $\beta_j$'s all equal and order the $z$-values according to the ordered $x$-values, where $x_i = x_{i1} + x_{i2} + \cdots + x_{ik}$ $(1 \le i \le n)$. In other words, the ordered $z$-values become

$$z_{(i)} = \beta_0 + \beta_1 x^*_{i1} + \cdots + \beta_k x^*_{ik}, \quad (19)$$

where the vector $[1 \;\; x^*_{i1} \;\; x^*_{i2} \;\; \cdots \;\; x^*_{ik}]$ is the $i$th row of the matrix

$$X^* = \begin{bmatrix} 1 & x^*_{11} & x^*_{12} & \cdots & x^*_{1k} \\ 1 & x^*_{21} & x^*_{22} & \cdots & x^*_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x^*_{n1} & x^*_{n2} & \cdots & x^*_{nk} \end{bmatrix}, \quad (20)$$

which is constructed by arranging the rows of the matrix

$$X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}$$

so as to correspond to the ordered $x_{(i)}$ values $(1 \le i \le n)$. The MMLEs can be obtained along the same lines as in the univariate case:

$$\hat{\Gamma} = \left( X^{*T} M X^* \right)^{-1} X^{*T} \Delta, \quad (21)$$

where $\Delta = [\delta_1 \;\; \delta_2 \;\; \cdots \;\; \delta_n]^T$, $\delta_i$ is given by (12), and $M$ is the $n \times n$ diagonal matrix

$$M = \begin{bmatrix} b_1 & 0 & \cdots & 0 \\ 0 & b_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & b_n \end{bmatrix}.$$

The asymptotic variance-covariance matrix $V$ of the estimators can be derived from the Fisher information matrix as $V = I^{-1}(\beta_0, \beta_1, \ldots, \beta_k)$, given below:

$$V = \begin{bmatrix} \sum Q_i & \sum Q_i x_{1i} & \cdots & \sum Q_i x_{ki} \\ \sum Q_i x_{1i} & \sum Q_i x_{1i}^2 & \cdots & \sum Q_i x_{ki} x_{1i} \\ \vdots & \vdots & & \vdots \\ \sum Q_i x_{ki} & \sum Q_i x_{ki} x_{1i} & \cdots & \sum Q_i x_{ki}^2 \end{bmatrix}^{-1}, \quad (22)$$

where $Q_i = \exp(z_i)$. $V$ is estimated by replacing $Q_i$ by $\hat{Q}_i = \exp(\hat{z}_i)$, where $\hat{z}_i = \hat{\beta}_0 + \hat{\beta}_1 x^*_{i1} + \cdots + \hat{\beta}_k x^*_{ik}$ $(1 \le i \le n)$.
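In matrix form the computation is a single weighted least-squares solve per refinement. The sketch below is a minimal illustration of (21), assuming NumPy and the ranking of rows by $x_i = x_{i1} + \cdots + x_{ik}$ described above; the names are ours.

```python
# Minimal sketch of the multivariable MMLE, equation (21).
import numpy as np

def mml_poisson_multi(X, y, n_refine=10):
    """X: (n, k) covariate matrix (positive entries assumed), y: counts."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n = X.shape[0]
    order = np.argsort(X.sum(axis=1))              # rank rows by x_i = sum_j x_ij
    Xs = np.column_stack([np.ones(n), X[order]])   # X* with a leading column of ones
    ys = y[order]                                  # concomitant responses

    i = np.arange(1, n + 1)
    t = np.log(i / (n + 1.0))                      # starting quantiles, equation (9)
    for _ in range(n_refine):
        b = np.exp(t)                              # b_i = exp(t_(i))
        delta = ys - b * (1.0 - t)                 # delta_i = y_[i] - a_i
        # Gamma-hat = (X*' M X*)^{-1} X*' Delta with M = diag(b_1, ..., b_n)
        gamma = np.linalg.solve(Xs.T @ (b[:, None] * Xs), Xs.T @ delta)
        t = Xs @ gamma                             # revised quantiles
    return gamma                                   # (beta_0, beta_1, ..., beta_k)
```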

Robustness

Measures of influence considered in linear regression models, such as high leverage values, have analogues in the GLM framework. Large leverage values typically indicate that there are outliers in the covariates. When outliers are present in the data, inferences based on the MLEs become unreliable; in fact, it has been shown that the MLEs are not robust in GLMs.21 In the Poisson regression setting, outliers in a continuous covariate can therefore influence the estimates. Thus, we also studied the robustness properties of the derived MMLEs under several outlier models. For simplicity, we considered the univariate model given by (1). We assumed that $\alpha = 0$ and performed a Monte Carlo study for different values of $\beta$, where $(n - r)$ of the observations $X_1, X_2, \ldots, X_n$ (we do not know which) come from the standard normal distribution with $\sigma = 1$ and the remaining $r$ observations come from a normal distribution with scale $c\sigma$, where $c$ is a positive constant. We calculated the value of $r$ from the equation $r = [0.1n + 0.5]$ (Dixon's outlier model). The outlier models considered are:

(a) (n-r) come from N(0,1) and r come from N(0,1) (No outlier situation),

(b) (n-r) come from N(0,1) and r come from N(0,1.5),

(c) (n-r) come from N(0,1) and r come from N(0,2),

(d) (n-r) come from N(0,1) and r come from N(0,4).

Note that model (a) above does not involve outliers and is included for the sake of comparison. In order to make direct comparisons, after generating the $X$ values we divided them by the standard deviation of the corresponding distribution for each model. We then calculated $z_i = \alpha + \beta x_i$ and $\mu_i = \exp(z_i)$ for $1 \le i \le n$ and generated the $Y_i$ values from a Poisson distribution with mean $\mu_i$ $(1 \le i \le n)$. The values obtained from 5,000 runs are given in Table 3.
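One replicate of this design can be sketched as follows (assuming NumPy; scaling by the mixture's standard deviation is our reading of the comparability adjustment described above, and the function name is illustrative):

```python
# Sketch of one Monte Carlo replicate under Dixon's outlier model.
import numpy as np

def simulate_dixon(n, beta, c, alpha=0.0, rng=None):
    rng = np.random.default_rng(rng)
    r = int(0.1 * n + 0.5)                             # r = [0.1n + 0.5]
    x = np.concatenate([rng.normal(0.0, 1.0, n - r),   # clean observations
                        rng.normal(0.0, c, r)])        # contaminated observations
    rng.shuffle(x)                                     # we do not know which r are outliers
    sd_mix = np.sqrt(((n - r) + r * c**2) / n)         # sd of the mixture distribution
    x /= sd_mix                                        # divide by the sd, as in the text
    mu = np.exp(alpha + beta * x)                      # mu_i = exp(z_i)
    y = rng.poisson(mu)                                # Poisson counts with mean mu_i
    return x, y

# e.g., model (d): x, y = simulate_dixon(n=50, beta=0.2, c=4.0)
```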

 

 

                    Model (a): No outlier                     Model (b)
β    n     Bias(α̂)  Bias(β̂)  Var(α̂)  Var(β̂)      Bias(α̂)  Bias(β̂)  Var(α̂)  Var(β̂)
0.1  30    0.0326   0.0075   0.0371   0.0385       0.0330   0.0021   0.0377   0.0388
     50    0.0195   0.0004   0.0214   0.0219       0.0211   0.0033   0.0214   0.0218
     100   0.0082   0.0039   0.0103   0.0102       0.0094   0.0008   0.0103   0.0104
0.2  30    0.0316   0.0020   0.0377   0.0388       0.0302   0.0034   0.0388   0.0386
     50    0.0202   0.0029   0.0217   0.0215       0.0200   0.0007   0.0219   0.0212
     100   0.0108   0.0000   0.0104   0.0102       0.0115   0.0002   0.0108   0.0103
0.4  30    0.0323   0.0033   0.0391   0.0376       0.0346   0.0012   0.0407   0.0385
     50    0.0207   0.0005   0.0224   0.0209       0.0165   0.0004   0.0230   0.0209
     100   0.0086   0.0004   0.0109   0.0097       0.0089   0.0009   0.0113   0.0099

                    Model (c)                                 Model (d)
β    n     Bias(α̂)  Bias(β̂)  Var(α̂)  Var(β̂)      Bias(α̂)  Bias(β̂)  Var(α̂)  Var(β̂)
0.1  30    0.02827  0.0003   0.0381   0.0413       0.0385   0.0026   0.0372   0.0477
     50    0.0192   0.0013   0.0220   0.0220       0.0211   0.0004   0.0213   0.0246
     100   0.0095   0.0001   0.0105   0.0104       0.0099   0.0002   0.0108   0.0107
0.2  30    0.0362   0.0036   0.0396   0.0419       0.0350   0.0080   0.0375   0.0467
     50    0.0163   0.0014   0.0208   0.0211       0.0189   0.0057   0.0221   0.0242
     100   0.0111   0.0010   0.0106   0.0105       0.0109   0.0013   0.0099   0.0103
0.4  30    0.0313   0.0015   0.0396   0.0370       0.0314   0.0017   0.0379   0.0461
     50    0.0178   0.0030   0.0226   0.0207       0.0166   0.0001   0.0218   0.0223
     100   0.0079   0.0003   0.0110   0.0093       0.0074   0.0039   0.0111   0.0088

Table 3 Empirical biases and variances from Dixon’s outlier models

As can be seen from Table 3, the biases in the estimates are negligible for all models. For a given $n$, the variances $\mathrm{Var}(\hat{\beta})$ (and hence the Wald statistics $W$) are almost the same across models (a), (b), (c) and (d), which means that the MMLEs are robust to outliers in the covariate. Note that the MML methodology achieves this robustness through the $t_{(i)}$ $(1 \le i \le n)$ values.

Conclusion

Poisson regression serves as a useful technique for modeling count data. The MLEs in Poisson regression are obtained via Newton-type algorithms; however, these algorithms might not converge, or might converge to inaccurate values. In this study we derived the explicit MMLEs for Poisson regression. We also considered the case where there are outliers in a (continuous) covariate, which is often the case in real-life applications, and investigated the properties of the derived MMLEs under several outlier models. Although the scope of the simulations reported here is limited, we can conclude that the MML methodology provides robust estimation in Poisson regression.

Acknowledgement

We are grateful to the reviewer for the helpful comments, which improved this manuscript.

Conflicts of interest

None.

References

  1. Cameron AC, Trivedi PK. Regression Analysis of Count Data. Cambridge University Press, New York, USA. 1998.
  2. Casella G, Berger R. Statistical Inference. 2nd edn. Thomson Learning, Pacific Grove, CA, USA. 2002.
  3. Tiku ML, Vaughan DC. Logistic and nonlogistic density functions in binary regression with nonstochastic covariates. Biometrical Journal. 1997;39(8):883‒898.
  4. Vaughan DC. The generalized secant hyperbolic distribution and its properties. Communications in Statistics ‒ Theory and Methods. 2002;31(2):219‒238.
  5. Barnett VD. Evaluation of the maximum likelihood estimator when the likelihood equation has multiple roots. Biometrika. 1966;53(1):151‒165.
  6. Lee KR, Kapadia CH, Dwight BB. On estimating the scale parameter of the Rayleigh distribution from censored samples. Statistische Hefte. 1980;21(1):14‒20.
  7. Silva JMC, Tenreyro S. Poisson: some convergence issues. The Stata Journal. 2011;11(2):207‒212.
  8. Silva JMC, Tenreyro S. On the existence of the maximum likelihood estimates in Poisson regression. Economics Letters. 2010;107:310‒312.
  9. Tiku ML. Estimating the mean and standard deviation from a censored normal sample. Biometrika. 1967;54(1):155‒165.
  10. Bhattacharyya GK. The asymptotics of maximum likelihood and related estimators based on type II censored data. Journal of the American Statistical Association. 1985;80:398‒404.
  11. Tan WY. On Tiku’s robust procedure‒a Bayesian insight. Journal of Statistical Planning and Inference. 1985;11(3):329‒340.
  12. Tiku ML, Tan WY, Balakrishnan N. Robust Inference. Marcel Dekker, New York, USA. 1986.
  13. Tiku ML, Suresh RP. A new method of estimation for location and scale parameters. Journal of Statistical Planning and Inference. 1992;30:281‒292.
  14. Vaughan DC. On the Tiku‒Suresh method of estimation. Communications in Statistics ‒ Theory and Methods. 1992;21(2):451‒469.
  15. Vaughan DC, Tiku ML. Estimation and hypothesis testing for nonnormal bivariate distribution with applications. Mathematical and Computer Modelling. 2000;32:53‒67.
  16. Oral E, Gunay S. Stochastic covariates in binary regression. Hacettepe Journal of Mathematics and Statistics. 2004;33:97‒109.
  17. Oral E. Binary regression with stochastic covariates. Communications in Statistics ‒ Theory and Methods. 2006;35:1429‒1447.
  18. Oral E. Parameter estimation in generalized linear models through modified maximum likelihood. International Statistical Institute, Proceedings of the 58th World Statistical Congress. 2011.
  19. Agresti A. Categorical Data Analysis. John Wiley and Sons, New York, USA. 1996.
  20. Lee KR, Kapadia CH, Dwight BB. On estimating the scale parameter of the Rayleigh distribution from doubly censored samples. Statistische Hefte. 1980;21(1):14‒29.
  21. Cantoni E, Ronchetti E. Robust inference for generalized linear models. Journal of the American Statistical Association. 2001;96(455):1022‒1030.