Research Article Volume 6 Issue 1
Department of Biostatistics Program, LSUHSC School of Public Health, USA
Correspondence: Evrim Oral, Biostatistics Program, LSUHSC School of Public Health, 2020 Gravier Street, 3rd floor New Orleans, LA 70112, USA, Tel (504) 5686094
Received: April 25, 2017  Published: May 25, 2017
Citation: Oral E (2017) Modified Maximum Likelihood Estimation in Poisson Regression. Biom Biostat Int J 6(1): 00154. DOI: 10.15406/bbij.2017.06.00154
In Generalized Linear Models, likelihood equations are intractable and do not have explicit solutions; thus, they must be solved by using Newtontype algorithms. Solving these equations by iterations, however, can be problematic: the iterations might converge to wrong values or the iterations might not converge at all. In this study, we derive the modified maximum likelihood estimators for Poisson regression model and study their properties. We also search the robustness of these estimators when there are outliers in the covariates.
Keywords: Count data; Poisson regression; Modified maximum likelihood; Newtontype algorithms; Dixon’s outlier model
Poisson regression is widely used for modeling count data, especially when there is no over or under dispersion [1]. Since the likelihood equations from this model are intractable, solving these equations requires using iterative methods, such as Newton Raphson or Fisher scoring. However, using iterative methods to find maximum likelihood estimators (MLEs) can generally be problematic and time should be spent to investigate the stability of such solutions [24]. Specifically the following difficulties can arise: the iterations might converge to wrong values if the likelihood equations have multiple roots, or the iterations might not converge at all. See [5] and [6] for a discussion about situations where one encounters these difficulties in solving MLEs. The common software, Stata for example, is known to be very sensitive to numerical iterations. Researchers have reported problems in getting Poisson regression estimates with the “poisson” command, which encounters problems in locating the maximum and does not converge [7]. Note that in Poisson regression modeling, additional problems might occur. The most common problem is the over or under dispersion in data, in which case using a more flexible model such as negative binomial regression is more appropriate. The second problem is analogous to the complete separation or quasicomplete separation problem in binary regression: the MLEs may not exist for certain data configurations, see [7,8]. In this study we do not consider either of these problems and focus only on the case where it is appropriate to model the data with Poisson regression.
Unlike the maximum likelihood (ML) technique, modified maximum likelihood (MML) methodology produces explicit estimates. MML achieves explicit estimates by linearizing intractable functions within the likelihood equations using the ordered statistics [9]. Asymptotically, MML estimators (MMLEs) are known to be unbiased and have minimum variances, i.e. they are fully efficient. For small sample sizes MMLEs have negligible bias and their variances are only marginally bigger than the minimum variance bounds, i.e. they are highly efficient [1015].
In the GLM setting, Tiku and Vaughan [3] used the MML methodology to extend the techniques of traditional logistic regression to nonlogistic density functions. Oral and Gunay [16] and Oral [17] later extended the work in [3] to the binary regression model with one stochastic covariate. Oral [18] derived the MMLEs in general GLMs which use canonical link when there is only one risk factor. In this study, we derive the explicit MMLEs of Poisson regression model, generalize the derivations to more than one covariate, and study their robustness properties via simulations.
The Univariate Poisson regression model is given by
$E({Y}_{i}{X}_{i}={x}_{i})={\mu}_{i}$
$=\mathrm{exp}({z}_{i})$ (1)
where ${z}_{i}=\alpha +\beta {x}_{i}$ , for $1\le i\le n$ , and the outcome $Y$ has the probability distribution
${f}_{y}({y}_{i})=\frac{\mathrm{exp}({\mu}_{i})\text{\hspace{0.17em}}{\mu}^{{y}_{i}}}{{y}_{i}!},\text{\hspace{0.17em}}\text{\hspace{0.17em}}{y}_{i}=0,\text{\hspace{0.17em}}1,\text{\hspace{0.17em}}2,\text{\hspace{0.17em}}\mathrm{...}$ (2)
Note that in equation (2), $Y$ is presumed to increase with X so that b is a priori greater than zero. For the model given in (1)(2), the loglikelihood function of the random sample $({y}_{i}\text{\hspace{0.17em}},\text{\hspace{0.17em}}{x}_{i})$ can be written as
$\mathrm{ln}L\propto {\displaystyle \sum _{i=1}^{n}{y}_{i}{z}_{i}}{\displaystyle \sum _{i=1}^{n}g({z}_{i})}$ , (3)
where $g({z}_{i})=\mathrm{exp}({z}_{i})$ . The likelihood equations for estimating a and b do not have explicit solutions because of the nonlinear function $g({z}_{i})$ , $1\le i\le n$ . To obtain the MMLEs, we first express the likelihood equations in terms of the ordered variates ${z}_{(1)}\le {z}_{(2)}\le \mathrm{...}\le {z}_{(n)}$ . The likelihood equations can be rewritten as
$\frac{\partial \mathrm{ln}L}{\partial \alpha}={\displaystyle \sum _{i=1}^{n}\left\{{y}_{[i]}g({z}_{(i)})\right\}}=0$ (4)
and
$\frac{\partial \mathrm{ln}L}{\partial \beta}={\displaystyle \sum _{i=1}^{n}{x}_{(i)}\left\{{y}_{[i]}g({z}_{(i)})\right\}}=0$ (5)
Where ${y}_{[i]}$ is the concomitant of ${x}_{(i)}$ $(1\le i\le n)$ and $g({z}_{(i)})=\mathrm{exp}({z}_{(i)})$ . Linearizing the intractable function $g({z}_{i})$ by using the first two terms of its Taylor series expansion around the population quantiles ${t}_{(i)}=E({z}_{(i)})$ ( $1\le i\le n$ ), we find
$g\text{\hspace{0.17em}}({z}_{(i)})\cong {a}_{i}+{b}_{i}\text{\hspace{0.17em}}{z}_{(i)}$ , (6)
where ${a}_{i}=\mathrm{exp}({t}_{(i)})\left\{\text{\hspace{0.17em}}1{t}_{(i)}\right\}$ and ${b}_{i}=\mathrm{exp}({t}_{(i)})$ for $1\le i\le n$ . In order to calculate ${t}_{(i)}=E({z}_{(i)})$ values, we define a dummy random variable U with the probability density function
$f(u)=\mathrm{exp}(u),\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}u<0$ (7)
and rewrite the model (1) as
$E({Y}_{i}{X}_{i}={x}_{i})={\mu}_{i}=F\left({z}_{i}\right)$ , (8)
where $F(u)=\mathrm{exp}(u)$ , $u<0$ . Thus, the ${t}_{(i)}$ values can be obtained from the equation
${t}_{(i)}=\mathrm{ln}\left(i/\left(n+1\right)\right)$ , (9)
For $1\le i\le n$ ; asymptotically ${z}_{(i)}{t}_{(i)}\cong 0.$ Alternatively, when n is large one can utilize the standard normal distribution
${t}_{(i)}={\Phi}^{1}\left(i/\left(n+1\right)\right)$ . (10)
Incorporating (6) into (4)(5) and solving the resulting modified likelihood equations yield the explicit MMLEs below:
$\widehat{\alpha}=\frac{\delta}{m}\widehat{\beta}\text{\hspace{0.17em}}{\overline{x}}_{a}$ and $\widehat{\beta}=\frac{{\displaystyle \sum _{i=1}^{n}{\delta}_{i}({x}_{(i)}{\overline{x}}_{a}})}{{\displaystyle \sum _{i=1}^{n}{b}_{i}{({x}_{(i)}{\overline{x}}_{a})}^{2}}}$ , (10)
where
$\delta ={\displaystyle \sum _{i=1}^{n}{\delta}_{i}}$ , ${\delta}_{i}={y}_{[i]}{a}_{i}$ , $m={\displaystyle \sum _{i=1}^{n}{b}_{i}}$ , (11)
and
${\overline{x}}_{a}=\left({\displaystyle \sum _{i=1}^{n}b{}_{i}x{}_{(i)}}\right)/m$ . (12)
The MMLEs derived above are asymptotically equivalent to their corresponding MLEs, giving them the same attractive asymptotic properties; however, one can refine the estimates by recalculating ${a}_{i}$ and ${b}_{i}$ values by replacing the theoretical population quantiles with their estimated values ${t}_{(i)}=\widehat{\alpha}+\widehat{\beta}\text{\hspace{0.17em}}{x}_{(i)}$ $(\text{\hspace{0.17em}}1\le i\le n)$ . This process might be repeated until a desired convergence is met. The stabilization generally is reached within a few iterations.
Asymptotic variances and Covariances
Vaughan and Tiku [15] proved rigorously that the MMLEs are asymptotically unbiased and their variances and covariances are exactly the same as those of the MLEs. In the present situation, therefore, the asymptotic variances and the covariance of $\widehat{\alpha}$ and $\widehat{\beta}$ are given by ${I}^{1}\left(\alpha ,\beta \right)$ , where I is the Fisher Information matrix consisting of the elements $E\left({\partial}^{2}\mathrm{ln}{L}^{*}/\partial {\alpha}^{2}\right)\text{\hspace{0.17em}},$ $E\left({\partial}^{2}\mathrm{ln}{L}^{*}/\partial \alpha \text{\hspace{0.17em}}\partial \beta \right)$ , and $E\left({\partial}^{2}\mathrm{ln}{L}^{*}/\partial {\beta}^{2}\right)$ . From the modified likelihood equations
$\frac{\partial \mathrm{ln}L}{\partial \alpha}\propto \frac{\partial \mathrm{ln}{L}^{*}}{\partial \alpha}={\displaystyle \sum _{i=1}^{n}\left\{{y}_{i}({a}_{i}+{b}_{i}{z}_{i})\right\}}=0$ ,
$\frac{\partial \mathrm{ln}L}{\partial \beta}\propto \frac{\partial \mathrm{ln}{L}^{*}}{\partial \beta}={\displaystyle \sum _{i=1}^{n}{x}_{i}\left\{{y}_{i}({a}_{i}+{b}_{i}{z}_{i})\right\}}=0$ ,
the Fisher Information matrix can be easily obtained as
$V={I}^{1}({\gamma}_{0},{\gamma}_{1})={\left[\begin{array}{cc}{\displaystyle \sum _{i=1}^{n}{Q}_{i}}& {\displaystyle \sum _{i=1}^{n}{Q}_{i}{x}_{i}}\\ {\displaystyle \sum _{i=1}^{n}{Q}_{i}{x}_{i}}& {\displaystyle \sum _{i=1}^{n}{Q}_{i}{x}_{i}^{2}}\end{array}\right]}^{1}$ , (13)
where ${Q}_{i}=\mathrm{exp}({z}_{i})$ . V is estimated by replacing ${Q}_{i}$ with its estimate ${\widehat{Q}}_{i}=\mathrm{exp}({\widehat{z}}_{i})$ , ${\widehat{z}}_{i}=\widehat{\alpha}+\widehat{\beta}\text{\hspace{0.17em}}{x}_{i}$ ( $1\le i\le n$ ). Since ${z}_{i}$ values converge to ${t}_{i}$ values as n tends to infinity, ${a}_{i}$ and ${b}_{i}$ values are treated as constant coefficients for large n, see also [3]. Hence, the asymptotic variances can be estimated by
$\text{Var}(\widehat{\alpha})={\displaystyle \sum _{i=1}^{n}{\widehat{Q}}_{i}{x}_{i}^{2}}/\left\{{\displaystyle \sum _{i=1}^{n}{\widehat{Q}}_{i}}{\displaystyle \sum _{i=1}^{n}{\widehat{Q}}_{i}{x}_{i}^{2}{\left({\displaystyle \sum _{i=1}^{n}{\widehat{Q}}_{i}{x}_{i}}\right)}^{2}}\right\}$ , (14)
$\text{Var}(\widehat{\beta})={\displaystyle \sum _{i=1}^{n}{\widehat{Q}}_{i}}/\left\{{\displaystyle \sum _{i=1}^{n}{\widehat{Q}}_{i}}{\displaystyle \sum _{i=1}^{n}{\widehat{Q}}_{i}{x}_{i}^{2}{\left({\displaystyle \sum _{i=1}^{n}{\widehat{Q}}_{i}{x}_{i}}\right)}^{2}}\right\}$ . (15)
Hypothesis Testing
Testing the null hypothesis ${H}_{0}:\beta =0$ is of great practical importance in Poisson regression modelling. The likelihood ratio statistic for testing ${H}_{0}$ is $LR=2({L}_{0}{L}_{1})$ , where ${L}_{0}$ and ${L}_{1}$ denote the maximized logmodified likelihood functions under the null and alternative hypotheses, respectively. The null distribution of LR is asymptotically a chisquare with 1 degree of freedom. Large values of LR lead the rejection of ${H}_{0}$ . Alternatively, the Wald statistic W (the ratio of $\widehat{\beta}$ to its standard error) might be used. Since $\widehat{\beta}$ is asymptotically equivalent to the MLE, the null distribution of W is asymptotically normal N(0,1). Large values of W lead to the rejection of ${H}_{0}$ .
Numerical Example
To compare ML and MML estimates numerically, we analyzed the data given on page 82 of Agresti [19]. The data is from a study of nesting horseshoe crabs where the response Y is the number of satellites that each female crab has, and the corresponding values of the covariate X is the carapace width of 173 crabs. The study investigates the relationship between Y and X. We calculated the MMLEs from equations (10)(12) and their approximate standard errors from (14)(15). The FORTRAN code written to carry out the calculations can be obtained from the author. Our results are completely consistent with those given in [19], which is expected; see Table 1.
Coefficient 
Estimate 
SE 
W 
LR 

ML 
a 
3.3048 
0.5422 

b 
0.164 
0.02 
8.2 
64.9 

Coefficient 
Estimate 
SE 
W 
LR 

MML 
a 
3.3047 
0.5423 

b 
0.164 
0.0199 
8.241 
64.91 
Table 1: MLEs and MMLEs along with their standard errors for horseshoe crab data.
Remark: In solving (10)(12), Oral [18] proposed to calculate the initial ${t}_{(i)}$ values from the least squares estimators (LSEs), which is a different approach than using equation (9) (Approach 1) or equation (10) (Approach 2). Since ${t}_{(i)}=E(\alpha +\beta \text{\hspace{0.17em}}{x}_{(i)})$ , ${t}_{(i)}$ values can be approximated by ${\tilde{t}}_{(i)}=\tilde{\alpha}+\tilde{\beta}\text{\hspace{0.17em}}{x}_{(i)}$ , where
$\tilde{\alpha}=\overline{y}\tilde{\beta}\text{\hspace{0.17em}}\overline{x}$ and $\tilde{\beta}={\displaystyle \sum _{i=1}^{n}\left({x}_{i}\overline{x}\right){y}_{i}}/{\displaystyle \sum _{i=1}^{n}{\left({x}_{i}\overline{x}\right)}^{2}}$
are the LSEs; see also [3] and [20]. When using this approach (say, Approach 3), the ${t}_{(i)}$ values need to be revised after the first iteration with their estimated values ${t}_{(i)}=\widehat{\alpha}+\widehat{\beta}\text{\hspace{0.17em}}{x}_{(i)}$ $(\text{\hspace{0.17em}}1\le i\le n)$ as described above. Estimating population quantiles from the LSEs changes neither the derivations/solutions (10)(12) nor the results. However, the total revision number needed for stabilization under different approaches is not the same, see also [3] page 889. To compare the performance of these three approaches, we conducted a simulation study where we calculated the bias values and variances of the resulting MLEs as well as the coverage probabilities. We also provided average revision numbers needed for stabilization from each approach. We set $\alpha $ to zero and considered various values for $\beta $ and sample size n. Our results from 10,000 Monte Carlo runs are given in Table 2.
As can be seen from Table 2, all approaches provide same biases and variances after stabilization, which is expected. The resulting coverage probabilities from all approaches are close to 0.95, and as sample size increases, both bias values and variances decrease, also as expected. For a given $(\alpha ,\beta )$ value, it can be seen that the fastest stabilization is achieved by Approach 2 (i.e. equation (10)). Thus, although all three approaches yield the same results, we suggest to calculate initial ${t}_{(i)}$ values $(\text{\hspace{0.17em}}1\le i\le n)$ from equation (10) because the stabilization from this approach is the fastest one.


$\beta $ 
0.1 

0.5 

1.0 

n 


$\widehat{\alpha}$ 
$\widehat{\beta}$ 
$\widehat{\alpha}$ 
$\widehat{\beta}$ 
$\widehat{\beta}$ 

30 
Approach 1 
Bias 
0.0355 
0.0006 
0.0341 
0.0067 
0.0278 
0.0078 

Variance 
0.0373 
0.0389 
0.0403 
0.0369 
0.0451 
0.0328 


Coverage prob. 
0.9521 
0.9537 
0.9522 
0.9541 
0.9525 
0.9488 


No of revisions 
4.67 

3.45 

6.10 


Approach 2 
Bias 
0.0355 
0.0006 
0.0341 
0.0067 
0.0279 
0.0079 


Variance 
0.0373 
0.0388 
0.0403 
0.0369 
0.0451 
0.0328 


Coverage prob. 
0.9521 
0.9537 
0.9522 
0.9541 
0.9525 
0.9488 


No of revisions 
2.65 

2.00 

1.81 


Approach 3 
Bias 
0.0355 
0.0006 
0.0341 
0.0067 
0.0278 
0.0079 


Variance 
0.0372 
0.0389 
0.0402 
0.0368 
0.0451 
0.0328 


Coverage prob. 
0.9521 
0.9537 
0.9523 
0.9540 
0.9525 
0.9488 


No of revisions 
2.98 

3.04 

4.63 


100 
Approach 1 
Bias 
0.0098 
0.0007 
0.0079 
0.0002 
0.0081 
0.0017 

Variance 
0.0104 
0.0104 
0.0113 
0.0094 
0.0126 
0.0074 


Coverage prob. 
0.9504 
0.9518 
0.9470 
0.9494 
0.9538 
0.9495 


No of revisions 
4.78 

3.24 

6.59 


Approach 2 
Bias 
0.0098 
0.0007 
0.0079 
0.0001 
0.0081 
0.0016 


Variance 
0.0103 
0.0104 
0.0114 
0.0095 
0.0126 
0.0073 


Coverage prob. 
0.9504 
0.9518 
0.9470 
0.9494 
0.9538 
0.9496 


No of revisions 
2.91 

2.01 

1.32 


Approach 3 
Bias 
0.0098 
0.0007 
0.0078 
0.0002 
0.0082 
0.0017 


Variance 
0.0104 
0.0104 
0.0113 
0.0094 
0.0126 
0.0073 


Coverage prob. 
0.9505 
0.9518 
0.9470 
0.9494 
0.9539 
0.9495 


No of revisions 
2.99 

3.00 

4.69 


250 
Approach 1 
Bias 
0.0041 
0.0002 
0.0041 
0.0000 
0.0013 
0.0007 

Variance 
0.0041 
0.0040 
0.0044 
0.0036 
0.0049 
0.0026 


Coverage prob. 
0.9506 
0.9452 
0.9500 
0.9479 
0.9507 
0.9513 


No of revisions 
4.78 

3.11 

6.87 


Approach 2 
Bias 
0.0041 
0.0002 
0.0040 
0.0000 
0.0012 
0.0007 


Variance 
0.0040 
0.0041 
0.0044 
0.0036 
0.0049 
0.0026 


Coverage prob. 
0.9506 
0.9452 
0.9500 
0.9479 
0.9507 
0.9513 


No of revisions 
2.90 

2.00 

1.12 


Approach 3 
Bias 
0.0041 
0.0002 
0.0041 
0.0001 
0.0013 
0.0007 


Variance 
0.0041 
0.0041 
0.0044 
0.0036 
0.0049 
0.0026 


Coverage prob. 
0.9506 
0.9452 
0.9500 
0.9479 
0.9507 
0.9513 


No of revisions 
2.99 

3.00 

4.76 

Table 2: Bias values, variances, convergence probabilities and average revision numbers using three different approaches to calculate ${t}_{(i)}$ values $(\text{\hspace{0.17em}}1\le i\le n)$ .
Generalization to Multivariable Case
Now consider k $(k\ge 2)$ covariates and assume all of them take positive values without loss of generality. The Poisson regression model with k covariates can be written as
$E({Y}_{i}{x}_{i1},{x}_{i2},.\text{\hspace{0.17em}}.\text{\hspace{0.17em}}.,{x}_{ik})=F\left({z}_{i}\right)$ (16)
where $F\left({z}_{i}\right)=\mathrm{exp}({z}_{i})$ and
${z}_{i}={\beta}_{0}+{\displaystyle \sum _{j=1}^{k}{\beta}_{j}{x}_{ij}}$ , (17)
for $1\le i\le n.$ , $1\le j\le k$ . Following the same lines of [3], in order to rank the zvalues we can assume that all covariates are equally effective in increasing the response Y, i.e. we initially take ${\beta}_{j}$ ’s all equal, and order the zvalues that would correspond to the ordered xvalues, where ${x}_{i}={x}_{i1}+{x}_{i2}+\mathrm{...}+{x}_{ik}$ $(1\le i\le n)$ . In other words, the ordered zvalues become
${z}_{(i)}={\beta}_{0}+{\beta}_{1}{x}_{i1}^{*}+.\text{\hspace{0.17em}}.\text{\hspace{0.17em}}.+{\beta}_{k}{x}_{ik}^{*}$ , (18)
where the vector ${\left[\begin{array}{cccccc}1& {x}_{i1}^{*}& {x}_{i2}^{*}& .& .& {x}_{ik}^{*}\end{array}\right]}^{}$ is the ith row of the matrix
${X}^{*}=\left[\begin{array}{ccccc}1& {x}_{11}^{*}& {x}_{12}^{*}& \mathrm{...}& {x}_{1k}^{*}\\ 1& {x}_{21}^{*}& {x}_{22}^{*}& \mathrm{...}& {x}_{2k}^{*}\\ .& .& .& \mathrm{...}& .\\ .& .& .& \mathrm{...}& .\\ 1& {x}_{n1}^{*}& {x}_{n2}^{*}& \mathrm{...}& {x}_{nk}^{*}\end{array}\right]$ , (19)
which is constructed by arranging the rows of the X matrix
$X=\left[\begin{array}{ccccc}1& {x}_{11}& {x}_{12}& \mathrm{...}& {x}_{1k}\\ 1& {x}_{21}& {x}_{22}& \mathrm{...}& {x}_{2k}\\ .& .& .& \mathrm{...}& .\\ .& .& .& \mathrm{...}& .\\ 1& {x}_{n1}& {x}_{n2}& \mathrm{...}& {x}_{nk}\end{array}\right]$ ,
so as to correspond to the ordered ${x}_{(i)}$ value $(1\le i\le n)$ . The MMLEs can be obtained along the same lines as in the Univariate case:
$\widehat{\Gamma}={\left({X}^{*}{}^{T}M\text{\hspace{0.17em}}\text{\hspace{0.17em}}{X}^{*}\right)}^{1}{X}^{*}{}^{T}\Delta $ (20)
where $\Delta ={\left[\begin{array}{ccccc}{\delta}_{1}& {\delta}_{2}& .& .& {\delta}_{n}\end{array}\right]}^{T}$ , ${\delta}_{i}$ is given by (11) and M is the nxn diagonal matrix
$M=\left[\begin{array}{cccc}{b}_{1}& 0& \mathrm{...}& 0\\ 0& {b}_{2}& \mathrm{...}& 0\\ \mathrm{...}& \mathrm{..}& \mathrm{...}& \mathrm{..}\\ 0& 0& \mathrm{...}& {b}_{n}\end{array}\right]$ .
The asymptotic variancecovariance matrix V of the estimators can be derived from the Fisher information matrix $V={I}^{1}\left({\beta}_{0},{\beta}_{1},\mathrm{...},{\beta}_{k}\right)$ as given below
$V={\left[\begin{array}{llll}{\displaystyle \sum {Q}_{i}}\hfill & {\displaystyle \sum {Q}_{i}}{x}_{1i}\hfill & \mathrm{...}\hfill & {\displaystyle \sum {Q}_{i}}{x}_{ki}\hfill \\ {\displaystyle \sum {Q}_{i}}{x}_{1i}\hfill & {\displaystyle \sum {Q}_{i}}{x}^{2}{}_{1i}\hfill & \mathrm{...}\hfill & {\displaystyle \sum {Q}_{i}}{x}_{ki}{x}_{1i}\hfill \\ \mathrm{...}\hfill & \mathrm{...}\hfill & \mathrm{...}\hfill & \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{...}\hfill \\ {\displaystyle \sum {Q}_{i}}{x}_{ki}\hfill & {\displaystyle \sum {Q}_{i}}{x}_{ki}{x}_{1i}\hfill & \mathrm{...}\hfill & {\displaystyle \sum {Q}_{i}}{x}^{2}{}_{ki}\hfill \end{array}\right]}^{1}$ (21)
where ${Q}_{i}=\mathrm{exp}({z}_{i})$ . V is estimated by replacing ${Q}_{i}$ by
${\widehat{Q}}_{i}=\mathrm{exp}({\widehat{z}}_{i})$ , ${\widehat{z}}_{(i)}={\widehat{\beta}}_{0}+{\widehat{\beta}}_{1}{x}_{i1}^{*}+.\text{\hspace{0.17em}}.\text{\hspace{0.17em}}.+{\widehat{\beta}}_{k}{x}_{ik}^{*}$ $(1\le i\le n)$ .
Robustness
Measures of influence considered in linear regression models, such as high leverage values, are analogous in the GLM framework. Large leverage values typically mean that there are outliers in covariates. When outliers present in the data, inferences based on MLEs becomes unreliable. In fact, it has been showed that MLEs are not robust in GLMs [21]. In Poisson regression setting, if there are outliers in the continuous covariates, the estimates can be influenced. Thus, we also studied the robustness properties of the derived MMLEs under several outlier models. We considered the Univariate model given by (1) for simplicity. We assumed that $\alpha =0$ and performed a MonteCarlo study for different values of $\beta $ , where (nr) of the observations ${X}_{1},\text{\hspace{0.17em}}{X}_{2},\text{\hspace{0.17em}}.\text{\hspace{0.17em}}.\text{\hspace{0.17em}}.,{X}_{n}$ (we don’t know which) come from the Standard Normal Distribution with $\sigma =1$ and the remaining r observations come from the Normal distribution with a scale $c\sigma $ where c is a positive constant. We calculated the value of r from the equation $r=\left[0.1\text{\hspace{0.17em}}n+0.5\right]$ (Dixon’s outlier model). The outlier models considered are:
(a) (nr) come from N(0,1) and r come from N(0,1) (No outlier situation),
(b) (nr) come from N(0,1) and r come from N(0,1.5),
(c) (nr) come from N(0,1) and r come from N(0,2),
(d) (nr) come from N(0,1) and r come from N(0,4).
Note that the model (a) above does not involve outliers and is given for the sake of comparisons. In order to be able to make direct comparisons, after generating the X values we divided them by the standard deviation of the distribution for each model. After generating the X values, we calculated ${z}_{i}=\alpha +\beta {x}_{i}\text{\hspace{0.17em}}$ and ${\mu}_{i}=\mathrm{exp}({z}_{i})$ for $1\le i\le n$ to generate ${Y}_{i}$ values from Poisson distribution with mean ${\mu}_{i}$ ( $1\le i\le n$ ). The values obtained from 5000 runs are given in (Table 3).


Model (a): No outlier 
Model (b) 

$\beta $ 
n 
Bias( $\widehat{\alpha}$ ) 
Bias( $\widehat{\alpha}$ ) 
Var( $\widehat{\alpha}$ ) 
Var( $\widehat{\alpha}$ ) 
Bias( $\widehat{\alpha}$ ) 
Bias( $\widehat{\beta}$ ) 
Var( $\widehat{\alpha}$ ) 
Var( $\widehat{\beta}$ ) 
0.1 
30 
0.0326 
0.0075 
0.0371 
0.0385 
0.0330 
0.0021 
0.0377 
0.0388 

50 
0.0195 
0.0004 
0.0214 
0.0219 
0.0211 
0.0033 
0.0214 
0.0218 

100 
0.0082 
0.0039 
0.0103 
0.0102 
0.0094 
0.0008 
0.0103 
0.0104 
0.2 
30 
0.0316 
0.0020 
0.0377 
0.0388 
0.0302 
0.0034 
0.0388 
0.0386 

50 
0.0202 
0.0029 
0.0217 
0.0215 
0.0200 
0.0007 
0.0219 
0.0212 

100 
0.0108 
0.0000 
0.0104 
0.0102 
0.0115 
0.0002 
0.0108 
0.0103 
0.4 
30 
0.0323 
0.0033 
0.0391 
0.0376 
0.0346 
0.0012 
0.0407 
0.0385 

50 
0.0207 
0.0005 
0.0224 
0.0209 
0.0165 
0.0004 
0.0230 
0.0209 

100 
0.0086 
0.0004 
0.0109 
0.0097 
0.0089 
0.0009 
0.0113 
0.0099 


Model (c) 
Model (d) 

$\beta $ 
n 
Bias( $\widehat{\alpha}$ ) 
Bias( $\widehat{\alpha}$ ) 
Var( $\widehat{\alpha}$ ) 
Var( $\widehat{\alpha}$ ) 
Bias( $\widehat{\alpha}$ ) 
Bias( $\widehat{\beta}$ ) 
Var( $\widehat{\alpha}$ ) 
Var( $\widehat{\beta}$ ) 
0.1 
30 
0.02827 
0.0003 
0.0381 
0.0413 
0.0385 
0.0026 
0.0372 
0.0477 

50 
0.0192 
0.0013 
0.0220 
0.0220 
0.0211 
0.0004 
0.0213 
0.0246 

100 
0.0095 
0.0001 
0.0105 
0.0104 
0.0099 
0.0002 
0.0108 
0.0107 
0.2 
30 
0.0362 
0.0036 
0.0396 
0.0419 
0.0350 
0.0080 
0.0375 
0.0467 

50 
0.0163 
0.0014 
0.0208 
0.0211 
0.0189 
0.0057 
0.0221 
0.0242 

100 
0.0111 
0.0010 
0.0106 
0.0105 
0.0109 
0.0013 
0.0099 
0.0103 
0.4 
30 
0.0313 
0.0015 
0.0396 
0.0370 
0.0314 
0.0017 
0.0379 
0.0461 

50 
0.0178 
0.0030 
0.0226 
0.0207 
0.0166 
0.0001 
0.0218 
0.0223 

100 
0.0079 
0.0003 
0.0110 
0.0093 
0.0074 
0.0039 
0.0111 
0.0088 
Table 3: Empirical biases and variances from Dixon’s outlier models.
As can be seen from the table, the biases in the estimates are negligible for all models. The variances $\text{Var}(\widehat{\beta})$ (hence the Wald statistics W) are almost the same for a given n for the models (a), (b), (c) and (d), which means that the MMLEs are robust to outliers in the covariate. Note that the MML methodology achieves robustness through the ${t}_{(i)}$ ( $1\le i\le n$ ) values.
Poisson regression serves as a useful technique to model count data. The MLEs in Poisson regression are obtained via Newtontype algorithms; however these algorithms might not converge or converge to inaccurate values. In this study we derived the explicit MMLEs for Poisson regression. We also considered the case where there are outliers in the (continuous) covariate, which generally is the case in real life applications, and searched the properties of the derived MMLEs under several data violations. Although the scope of the simulations reported here is limited, we can conclude that MML methodology provides robust estimation in Poisson regression.
We are grateful to the reviewer for the helpful comment which helped to improve this manuscript.
©2017 Oral. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and build upon your work noncommercially.