eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Review Article Volume 5 Issue 1

Predictive influence of variables on the odds ratio and in the logistic model

S K Bhattacharjee,1 Atanu Biswas,2 Ganesh Dutta,3 S Rao Jammalamadaka,4 M Masoom Ali5

1Indian Statistical Institute, North-East Centre, Tezpur, Assam-0, India
2Indian Statistical Institute, India
3Basanti Devi College, India
4Department of Statistics and Applied Probability, University of California, USA
5Department of Mathematical Sciences, Ball State University, USA

Correspondence: S Rao Jammalamadaka, Department of Statistics and Applied Probability, University of California, USA

Received: October 01, 2016 | Published: February 1, 2017

Citation: Bhattacharjee SK, Biswas A, Dutta G, et al. Predictive influence of variables on the odds ratio and in the logistic model. Biom Biostat Int J. 2017;5(1):25-37. DOI: 10.15406/bbij.2017.05.00125


Abstract

We study the influence of explanatory variables in prediction by looking at the distribution of the log-odds ratio. We also consider the predictive influence of a subset of unobserved future variables on the distribution of log-odds ratio as well as in a logistic model, via the Bayesian predictive density of a future observation. This problem is considered for dichotomous, as well as continuous explanatory variables.

AMS subject classification: Primary 62J12, Secondary 62B10, 62F15

Keywords: predictive density/probability, log-odds ratio, logistic model, predictive influence, missing/unobserved variable, Kullback-Leibler divergence

Introduction

The odds ratio (OR) is perhaps the most popular measure of treatment difference for binary outcomes and is used extensively in the analysis of 2×2 tables in biomedical studies and clinical trials. The distribution of the log of the sample OR is often approximated by a normal distribution with the true log OR as its mean and with variance estimated by the sum of the reciprocals of the four cell frequencies in the 2×2 table (Breslow1). Böhning et al.2 provide a detailed book-length discussion of the OR. In logistic regression, ORs enable one to examine the effect of explanatory variables on the response.

The logistic link is perhaps the most popular way to model the success probabilities of a binary variable. Pregibon,3 Cook and Weisberg4 and Johnson5 have considered the problem of the influence of observations for logistic regression models. Several measures have been suggested to identify observations in the data set which are influential relative to the estimation of the vector of regression coefficients, the deviance, the determination of predictive probabilities and the classification of future observations.

Bhattacharjee & Dunsmore6 considered the effect on the predictive probability of a future observation of the omission of subsets of the explanatory variables. Mercier et al.7 used logistic regression to determine whether age and/or gender were a factor influencing severity of injuries suffered in head-on automobile collisions on rural highways. Zellner et al.8 considered the problem of variable selection in logistic regression to compare the performance of stepwise selection procedures with a bagging method.

In the present paper, our aim is to measure the predictive influence of a subset of explanatory variables in log-odds ratio of a logistic model using a Bayesian approach. We are also interested in studying the effect of missing future explanatory variables on Bayes prediction, on a logistic model as well as on the log-odds ratio.

In Section 2, we derive the predictive densities of a future log-odds ratio for both the full model and a subset-deleted model. In Section 3, we derive the predictive density of the log-odds ratio when a subset of future explanatory variables is missing. To derive the predictive densities we assume that the future explanatory variables xf are distributed as multivariate normal, both when the components of xf are independent and when they are dependent. In Section 4, we discuss the influence of future missing explanatory variables by considering the predictive probability of a future response in a logistic model. This is done by assuming that the future explanatory variables xf are multivariate normal for the continuous case. The dichotomous case is also considered. Since the predictive probabilities are not mathematically tractable for the logistic model, we use several approximations.

In Sections 2 and 3 we employ the Kullback-Leibler9 directed measure of divergence $D_{KL}$ to assess the influence of variables and also the influence of future missing variables on the log-odds ratio. The form of the Kullback-Leibler9 measure used here is given by

$$D_{KL} = \int f(a'w_f \mid \cdot)\,\log\!\left(\frac{f(a'w_f \mid \cdot)}{f^{(r+s)}(a'w_f \mid \cdot)}\right) d(a'w_f).$$

To assess the influence of missing future variables on the predictive probability in a logistic model, we use the absolute difference of the two predictive probabilities.

Influence of variables in log-odds ratio

Consider a phase III clinical trial with two competing treatments, say A and B, having binary responses. Suppose n patients are randomly allocated, with nA and nB patients assigned to treatments A and B respectively. The patient responses are influenced by a covariate vector x (of dimension p×1), one component of which may be 1 to accommodate the constant term. Let (Yi, Zi, xi) be the data corresponding to the ith patient, where Yi is the indicator of response (Yi = 1 or 0 for a success or failure), Zi is the indicator of the treatment assignment (Zi = 1 or 0 according as treatment A or B is applied to the ith patient), and xi is the covariate vector. We assume a logit model for the responses:

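$$\Pr(Y_i = 1 \mid Z_i, x_i) = \frac{\exp(\Delta Z_i + x_i\beta)}{1 + \exp(\Delta Z_i + x_i\beta)}, \quad i = 1, 2, \ldots, n. \qquad \text{(i)}$$

Then the odds for treatments A and B with covariate vector $x_i$ are respectively

$$O_A = \frac{\Pr(Y_i = 1 \mid Z_i = 1, x_i)}{\Pr(Y_i = 0 \mid Z_i = 1, x_i)} = \exp(\Delta + x_i\beta), \qquad O_B = \frac{\Pr(Y_i = 1 \mid Z_i = 0, x_i)}{\Pr(Y_i = 0 \mid Z_i = 0, x_i)} = \exp(x_i\beta),$$

and hence the log-odds ratio is

$$\log OR = \log O_A - \log O_B = \Delta.$$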
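As a quick numerical check of this identity, the following minimal Python sketch evaluates the logit model for both treatments and confirms that the log-odds ratio equals Δ whatever the covariate vector is. The function names and the numerical values are illustrative assumptions, not part of the paper.

```python
import math

def success_prob(delta, z, x, beta):
    """Pr(Y = 1 | Z = z, x) under the logit model with treatment effect delta."""
    eta = delta * z + sum(xj * bj for xj, bj in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-eta))

def log_odds_ratio(delta, x, beta):
    """log O_A - log O_B for a patient with covariate vector x."""
    p_a = success_prob(delta, 1, x, beta)  # treatment A (Z = 1)
    p_b = success_prob(delta, 0, x, beta)  # treatment B (Z = 0)
    return math.log(p_a / (1.0 - p_a)) - math.log(p_b / (1.0 - p_b))

# Hypothetical values: the result is delta regardless of x and beta.
print(log_odds_ratio(0.7, [1.0, 2.0], [0.3, -0.1]))
```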

Let us partition

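$$x\beta = x_A\beta_A + x_B\beta_B + x_{AB}\beta_{AB},$$

where $x_A$ denotes the variables used in treatment A only, $x_B$ those used in treatment B only, and $x_{AB}$ those used in both treatments A and B. Then the model can be partitioned for treatments A and B as:

$$\log O_A = u = \Delta + x_A\beta_A + x_{AB}\beta_{AB} = x_{(A)}\beta_{(A)}, \qquad \text{(ii)}$$

$$\log O_B = v = x_B\beta_B + x_{AB}\beta_{AB} = x_{(B)}\beta_{(B)}. \qquad \text{(iii)}$$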

The predictive density of the future log-odds for A, $u_f$, under a non-informative (vague) prior with normal or any spherically symmetric errors is of Student form (Jammalamadaka et al.10) and is given by

$$f(u_f \mid x_{f(A)}, \mathrm{data}) \sim St\!\left(n-k,\; x_{f(A)}\hat\beta_{(A)},\; s^2_{(A)}\left(1 + x'_{f(A)}(x'_{(A)}x_{(A)})^{-1}x_{f(A)}\right)\right),$$

where $\hat\beta_{(A)}$ is the MLE of $\beta_{(A)}$, $s^2_{(A)}$ is the MLE of $\sigma^2_A$ and $k$ is the number of parameters in the model (ii). See Bhattacharjee et al.11 in this context. If the sample size is large then this predictive density can be well approximated by its asymptotic normal form

$$N\!\left(x_{f(A)}\hat\beta_{(A)},\; s^2_{(A)}\left(1 + x'_{f(A)}(x'_{(A)}x_{(A)})^{-1}x_{f(A)}\right)\frac{n-k}{n-k-2}\right).$$

Similarly one can find the same for treatment B, vf .

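Let us define $w_f = (u_f, v_f)'$ and $a = (1, -1)'$. Then the predictive density of the future log-odds ratio $a'w_f$ is given by

$$f(a'w_f \mid x_{f(A)}, x_{f(B)}, \mathrm{data}) \sim N(\theta, \delta^2), \qquad \text{(iv)}$$

where

$$\theta = x_{f(A)}\hat\beta_{(A)} - x_{f(B)}\hat\beta_{(B)}$$

and

$$\delta^2 = s^2_{(A)}\left(1 + x'_{f(A)}(x'_{(A)}x_{(A)})^{-1}x_{f(A)}\right)\frac{n-k}{n-k-2} + s^2_{(B)}\left(1 + x'_{f(B)}(x'_{(B)}x_{(B)})^{-1}x_{f(B)}\right)\frac{n-q}{n-q-2}.$$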
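The mean and variance of the normal approximation (iv) can be computed directly from the two design matrices. The sketch below is only illustrative: it assumes that the observed log-odds for each treatment are available as response vectors `u` and `v` of a linear model, that the two future log-odds are treated as independent so that their variances add, and that each group's own sample size is used in the degrees-of-freedom factor. All function and variable names are hypothetical.

```python
import numpy as np

def predictive_moments(X, y, x_f):
    """Mean and variance of the normal approximation to the Student predictive density."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y           # least-squares estimate (MLE under normal errors)
    resid = y - X @ beta_hat
    s2 = resid @ resid / n                 # MLE of the error variance
    scale2 = s2 * (1.0 + x_f @ xtx_inv @ x_f)
    mean = x_f @ beta_hat
    var = scale2 * (n - k) / (n - k - 2)   # variance of a Student t with n - k d.f.
    return mean, var

def log_odds_ratio_predictive(XA, u, xfA, XB, v, xfB):
    """theta and delta^2 of the predictive density of a'w_f = u_f - v_f."""
    mean_a, var_a = predictive_moments(XA, u, xfA)
    mean_b, var_b = predictive_moments(XB, v, xfB)
    return mean_a - mean_b, var_a + var_b  # assumes u_f and v_f are independent
```

Calling `log_odds_ratio_predictive` once with the full design matrices and once with the columns of the deleted variables removed gives the two pairs of parameters that are compared in the rest of this section.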

Our interest is to measure the influence of explanatory variables in the predictive density (iv) for the following cases:

Case 1: Influence of $r$ explanatory variables $x^r_A$ of $x_A$ in treatment A.

Case 2: Influence of $r$ explanatory variables $x^r_B$ of $x_B$ in treatment B.

Case 3: Influence of $s$ explanatory variables $x^s_{AB}$ of $x_{AB}$ in treatment A.

Case 4: Influence of $s$ explanatory variables $x^s_{AB}$ of $x_{AB}$ in treatment B.

Case 5: Joint influence of $r$ explanatory variables $x^r_A$ of $x_A$ and $s$ explanatory variables $x^s_{AB}$ of $x_{AB}$ in treatment A.

Case 6: Joint influence of $r$ explanatory variables $x^r_B$ of $x_B$ and $s$ explanatory variables $x^s_{AB}$ of $x_{AB}$ in treatment B.

To see the influence of explanatory variables in log-odds ratio, we construct a reduced log-odds model deleting a subset of explanatory variables. Then we derive the predictive density of future log-odds ratio for reduced model and compare it with the predictive density (iv) for full model. It is enough to consider Case 5 for illustration. We construct the reduced model by deleting variables xrA of xA  and xsAB  of xAB in (ii) as

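$$u = \Delta + x^*_A\beta^*_A + x^*_{AB}\beta^*_{AB} = x^*_{(A)}\beta^*_{(A)}.$$

Then the predictive density of $u_f$ is given by

$$f(u_f \mid x^*_{f(A)}, \mathrm{data}) = St\!\left(n-k+r+s,\; x^*_{f(A)}\hat\beta^*_{(A)},\; s^{*2}_{(A)}\left(1 + x^{*\prime}_{f(A)}(x^{*\prime}_{(A)}x^*_{(A)})^{-1}x^*_{f(A)}\right)\right).$$

The normal approximation of the predictive density is

$$N\!\left(x^*_{f(A)}\hat\beta^*_{(A)},\; s^{*2}_{(A)}\left(1 + x^{*\prime}_{f(A)}(x^{*\prime}_{(A)}x^*_{(A)})^{-1}x^*_{f(A)}\right)\frac{n-k+r+s}{n-k+r+s-2}\right).$$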

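Since no variable is missing in $v = \log O_B$, the predictive density of $v_f$ is unaltered along with its normal approximation. Hence the predictive density of the log-odds ratio $a'w_f$ under Case 5 is given by

$$f^{(r+s)}(a'w_f \mid x^*_{f(A)}, x_{f(B)}, \mathrm{data}) \sim N(\theta^*, \delta^{*2}), \qquad \text{(v)}$$

where

$$\theta^* = x^*_{f(A)}\hat\beta^*_{(A)} - x_{f(B)}\hat\beta_{(B)}$$

and

$$\delta^{*2} = s^{*2}_{(A)}\left(1 + x^{*\prime}_{f(A)}(x^{*\prime}_{(A)}x^*_{(A)})^{-1}x^*_{f(A)}\right)\frac{n-k+r+s}{n-k+r+s-2} + s^2_{(B)}\left(1 + x'_{f(B)}(x'_{(B)}x_{(B)})^{-1}x_{f(B)}\right)\frac{n-q}{n-q-2}.$$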

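To assess the influence of the deleted variables we employ the Kullback-Leibler9 directed measure of divergence $D_{KL}$ between the predictive densities of $a'w_f$ for the full model (iv) and the reduced model (v). The form of the K-L measure used here is given by

$$D_{KL} = \int f^{(r+s)}(a'w_f \mid \cdot)\,\log\!\left(\frac{f^{(r+s)}(a'w_f \mid \cdot)}{f(a'w_f \mid \cdot)}\right) d(a'w_f).$$

The discrepancy measure $D_{KL}$ between the predictive densities (iv) and (v) reduces to

$$D_{KL} = \frac{(\theta - \theta^*)^2}{2\delta^2} + \frac{1}{2}\left(\frac{\delta^{*2}}{\delta^2} - \log\!\left(\frac{\delta^{*2}}{\delta^2}\right) - 1\right).$$

Here $L = \frac{(\theta - \theta^*)^2}{2\delta^2}$ is due to the difference of the location parameters and $S = \frac{1}{2}\left(\frac{\delta^{*2}}{\delta^2} - \log\!\left(\frac{\delta^{*2}}{\delta^2}\right) - 1\right)$ is due to the difference of the scale parameters of the two predictive densities (iv) and (v).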
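The closed form above is the standard Kullback-Leibler divergence between two univariate normal densities, so it is straightforward to evaluate once the two means and variances are known. The following Python sketch returns the location part L, the scale part S and their sum; the function name and the numerical values in the usage line are illustrative assumptions.

```python
import math

def kl_reduced_vs_full(theta_star, delta2_star, theta, delta2):
    """D_KL of N(theta_star, delta2_star) (reduced model) from N(theta, delta2) (full model).

    Returns the location component L, the scale component S and D_KL = L + S.
    """
    ratio = delta2_star / delta2
    loc = (theta - theta_star) ** 2 / (2.0 * delta2)
    scale = 0.5 * (ratio - math.log(ratio) - 1.0)
    return loc, scale, loc + scale

# Hypothetical values: a shift of one unit in location and a doubled variance.
L, S, D = kl_reduced_vs_full(theta_star=1.0, delta2_star=2.0, theta=0.0, delta2=1.0)
print(L, S, D)
```

Both components are non-negative, and each vanishes exactly when the corresponding parameters of the two predictive densities coincide.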

Example 1: Here we consider the flu shot data (Pregibon3). A local health clinic sent fliers to its clients to encourage everyone, but especially older persons at high risk of complications, to get a flu shot for protection against an expected flu epidemic. In a pilot follow-up study, 159 clients were randomly selected and asked whether they actually received a flu shot. A client who received a flu shot was coded Y=1, and a client who did not receive a flu shot was coded Y=0. In addition, data were collected on their age (x1) and their health awareness (x2). Also included in the data was client gender (x3), with males coded x3=1 and females coded x3=0. We divided the whole data set into two groups A and B on the basis of gender: group A corresponds to the males and group B to the females. We computed DKL to measure the influence of the deleted variable x1 in groups A and B separately, and the discrepancies are plotted in Figure 1. A similar figure can be obtained by deleting x2. The figure shows that the discrepancy is smaller around the mean of the deleted variable.

Example 2: This is a simulation exercise. We drew a sample of size 159 from a bivariate normal distribution, using the means, variances and correlation coefficient of x1 and x2 in the flu shot data above. Using these x1 and x2 we generated the responses Y, and from the whole generated data set we computed DKL. We repeated the whole process 1000 times and computed the means of the DKL values. The mean discrepancies are shown in Figure 2. Here we reach the same conclusion as in the data example.

Influence of missing future explanatory variables in log-odds ratio

Here the aim is to detect the predictive influence of a set of missing future explanatory variables on the log-odds ratio of the logistic model (i), in each of the six cases listed in Section 2. In treatment A, let the $r$ future variables missing from $x_{fA}$ and the $s$ future variables missing from $x_{fAB}$ be denoted by $x^{(r+s)}_{f(A)}$. Similarly, in treatment B, let the $r$ future variables missing from $x_{fB}$ and the $s$ future variables missing from $x_{fAB}$ be denoted by $x^{(r+s)}_{f(B)}$. We assume that the errors of models (ii) and (iii) are normally distributed with zero means and variances $\tau^{-1}_{(A)}$ and $\tau^{-1}_{(B)}$, respectively. We also assume that the conditional density of $x^{(r+s)}_{f(A)}$ given $x^*_{f(A)}$ is independent of $\beta_{(A)}$ and $\tau_{(A)}$, and that the conditional density of $x^{(r+s)}_{f(B)}$ given $x^*_{f(B)}$ is independent of $\beta_{(B)}$ and $\tau_{(B)}$, i.e.,

$$f(x^{(r+s)}_{f(\cdot)} \mid x^*_{f(\cdot)}, \beta_{(\cdot)}, \tau_{(\cdot)}) = f(x^{(r+s)}_{f(\cdot)} \mid x^*_{f(\cdot)}),$$

where $x^*_{f(\cdot)}$ denotes the future explanatory variables $x_{f(\cdot)}$ without $x^{(r+s)}_{f(\cdot)}$.

Explanatory variables are continuous

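We assume that the $x_{f,i}$'s are dependent and that the distribution of $x_{f(A)}$ is $(k-1)$-dimensional multivariate normal, i.e. $f(x_{f(A)}) \sim N_{k-1}(\eta, \psi)$.

The conditional density of $x^{(r+s)}_{f(A)}$ given $x^*_{f(A)}$ is given by

$$f(x^{(r+s)}_{f(A)} \mid x^*_{f(A)}) \sim N_{r+s}\!\left(\eta^*_{(r+s)}, \psi^*_{(r+s)}\right),$$

where

$$\eta = (\eta^*, \eta_{r+s}), \quad x_{f(A)} = \left(x^*_{f(A)}, x^{(r+s)}_{f(A)}\right), \quad \psi = \begin{pmatrix}\psi_{11} & \psi_{12}\\ \psi_{21} & \psi_{22}\end{pmatrix}, \quad \eta^*_{(r+s)} = \eta_{r+s} + \psi_{21}\psi_{11}^{-1}\left(x^*_{f(A)} - \eta^*\right)$$

and $\psi^*_{(r+s)} = \psi_{22} - \psi_{21}\psi_{11}^{-1}\psi_{12}$.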
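The conditional mean and covariance above are the usual partitioned-normal formulas. A minimal Python sketch is given below, assuming the observed and missing components are identified by index lists; the function name and argument layout are illustrative choices, not taken from the paper.

```python
import numpy as np

def conditional_normal(eta, psi, observed_idx, missing_idx, x_obs):
    """Mean and covariance of the missing block given the observed block of a normal vector."""
    eta = np.asarray(eta, dtype=float)
    psi = np.asarray(psi, dtype=float)
    o = np.asarray(observed_idx)
    m = np.asarray(missing_idx)
    psi11 = psi[np.ix_(o, o)]  # covariance of the observed variables
    psi12 = psi[np.ix_(o, m)]
    psi21 = psi[np.ix_(m, o)]
    psi22 = psi[np.ix_(m, m)]  # covariance of the missing variables
    # Solve linear systems rather than forming psi11^{-1} explicitly.
    shift = np.linalg.solve(psi11, np.asarray(x_obs, dtype=float) - eta[o])
    cond_mean = eta[m] + psi21 @ shift
    cond_cov = psi22 - psi21 @ np.linalg.solve(psi11, psi12)
    return cond_mean, cond_cov
```

When the future variables are independent, the off-diagonal blocks are zero and the function returns the marginal mean and covariance of the missing block, which is the special case treated later in this section.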

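As earlier, it is enough to consider Case 5 to see the joint influence of $r$ missing future explanatory variables $x^r_{fA}$ of $x_{fA}$ and $s$ missing future explanatory variables $x^s_{fAB}$ of $x_{fAB}$ in treatment A. The density of $u_f$ when $x^{(r+s)}_{f(A)}$ is missing is given by

$$f(u_f \mid x^*_{f(A)}, \beta_{(A)}, \tau_{(A)}) = \int f(u_f \mid x_{f(A)}, \beta_{(A)}, \tau_{(A)})\, f(x^{(r+s)}_{f(A)} \mid x^*_{f(A)})\, dx^{(r+s)}_{f(A)} \sim N\!\left(\sum_{i=0}^{k-r-s-1} x_{f(A)i}\hat\beta_{(A)i} + \sum_{i=k-r-s}^{k-1} \eta^*_i\hat\beta_{(A)i},\; \sum_{i,j=k-r-s}^{k-1} \hat\beta_{(A)i}\hat\beta_{(A)j}\psi^*_{ij} + \tau^{-1}_{(A)}\right),$$

where $\eta^*_i$ is the $i$th component of $\eta^*_{(r+s)}$ and $\psi^*_{ij}$ is the $(i,j)$th component of $\psi^*_{(r+s)}$.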

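See Bhattacharjee et al.11 in this context. Using Taylor's expansion and improper prior densities for both $\beta_{(A)}$ and $\tau_{(A)}$, the approximate predictive density of $u_f$ when $x^{(r+s)}_{f(A)}$ is missing is given by

$$f^{(r+s)}(u_f \mid x^*_{f(A)}, \mathrm{data}) \sim N\!\left(\sum_{i=0}^{k-r-s-1} x_{f(A)i}\hat\beta_{(A)i} + \sum_{i=k-r-s}^{k-1} \eta^*_i\hat\beta_{(A)i},\; \sum_{i,j=k-r-s}^{k-1} \hat\beta_{(A)i}\hat\beta_{(A)j}\psi^*_{ij} + s^2_{(A)}\gamma^*\right),$$

evaluated at $\hat\beta_{(A)}$ and $s^2_{(A)}$, where

$$\gamma^* = 1 + \frac{1}{2}\sum_{i,j=0}^{k-1} Q^*_{ij}(\beta_{(A)}, \tau_{(A)})\,\mathrm{Cov}(\beta_{(A)i}, \beta_{(A)j}) + \frac{1}{2}Q_{\tau^2_{(A)}}(\beta_{(A)}, \tau_{(A)})\,\mathrm{Var}(\tau_{(A)})$$

is the multiplicative factor for the second-order Taylor approximation. If the $x_{f(A)}$'s are independent, the corresponding approximate predictive density of $u_f$ is

$$f^{(r+s)}(u_f \mid x^*_{f(A)}, \mathrm{data}) \sim N\!\left(\sum_{i=0}^{k-r-s-1} x_{f(A)i}\hat\beta_{(A)i} + \sum_{i=k-r-s}^{k-1} \eta_i\hat\beta_{(A)i},\; \sum_{i=k-r-s}^{k-1} \hat\beta^2_{(A)i}\psi^2_i + s^2_{(A)}\gamma\right),$$

evaluated at $\hat\beta_{(A)}$ and $s^2_{(A)}$, where $\eta_i$ and $\psi^2_i$ are the mean and variance of the $i$th missing variable and

$$\gamma = 1 + \frac{1}{2}\sum_{i,j=0}^{k-1} Q_{ij}(\beta_{(A)}, \tau_{(A)})\,\mathrm{Cov}(\beta_{(A)i}, \beta_{(A)j}) + \frac{1}{2}Q_{\tau^2_{(A)}}(\beta_{(A)}, \tau_{(A)})\,\mathrm{Var}(\tau_{(A)}).$$

Since no future variable is missing in $v$, the approximate predictive density of $v_f$ is the same as that obtained in Section 2. Thus, when the $x_{f(A)}$'s are dependent, the approximate predictive density of the log-odds ratio $a'w_f$ when $x^{(r+s)}_{f(A)}$ is missing is given by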

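$$f^{(r+s)}(a'w_f \mid x^*_{f(A)}, x_{f(B)}, \mathrm{data}) \sim \gamma^* N(\xi, \omega^2), \qquad \text{(vi)}$$

where

$$\xi = \sum_{i=0}^{k-r-s-1} x_{f(A)i}\hat\beta_{(A)i} + \sum_{i=k-r-s}^{k-1} \eta^*_i\hat\beta_{(A)i} - x_{f(B)}\hat\beta_{(B)}$$

and

$$\omega^2 = \left(\sum_{i,j=k-r-s}^{k-1} \hat\beta_{(A)i}\hat\beta_{(A)j}\psi^*_{ij} + s^2_{(A)}\right) + s^2_{(B)}\left(1 + x_{f(B)}(X'_{(B)}X_{(B)})^{-1}x'_{f(B)}\right)\frac{n-q}{n-q-2}.$$

The Kullback-Leibler9 directed measure of divergence between the predictive density (iv), when no variable is missing, and the predictive density (vi), when $x^{(r+s)}_{f(A)}$ is missing, is

$$D_{KL} = \int f(a'w_f \mid x_{f(A)}, x_{f(B)}, \mathrm{data})\,\log\!\left(\frac{f(a'w_f \mid x_{f(A)}, x_{f(B)}, \mathrm{data})}{f^{(r+s)}(a'w_f \mid x^*_{f(A)}, x_{f(B)}, \mathrm{data})}\right) d(a'w_f)$$

$$= \frac{(\theta - \xi)^2}{2\omega^2} + \frac{1}{2}\left(\frac{\delta^2}{\omega^2} - \log\!\left(\frac{\delta^2}{\omega^2}\right) - 1\right) - \frac{1}{2}\sum_{i,j=0}^{k-1} E\!\left(Q^*_{ij}(\beta_{(A)}, \tau_{(A)})\,\mathrm{Cov}(\beta_{(A)i}, \beta_{(A)j})\right) - \frac{1}{2}E\!\left(Q_{\tau^2_{(A)}}(\beta_{(A)}, \tau_{(A)})\,\mathrm{Var}(\tau_{(A)})\right). \qquad \text{(vii)}$$

If the $x_{f(A)}$'s are independent, the predictive density of $a'w_f$ when $(r+s)$ future variables are missing has the same form as (vi) and the corresponding Kullback-Leibler9 measure $D_{KL}$ has the same form as (vii), with $\eta^*_i$ replaced by $\eta_i$ in $\xi$, $\hat\beta_{(A)i}\hat\beta_{(A)j}\psi^*_{ij}$ replaced by $\hat\beta^2_{(A)i}\psi^2_i$ in $\omega^2$, and $Q^*_{ij}(\beta_{(A)}, \tau_{(A)})$ replaced by $Q_{ij}(\beta_{(A)}, \tau_{(A)})$ in $\gamma^*$, where $\eta_i$ and $\psi^2_i$ are the mean and variance of the $i$th missing variable.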

Explanatory variables are dichotomous

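Here we assume that all the explanatory variables are dichotomous and independent. We assume that the errors of models (ii) and (iii) are normally distributed with zero means and variances $\tau^{-1}_{(A)}$ and $\tau^{-1}_{(B)}$ respectively. To assess the influence of the missing variables in treatment A, we take $x_{f(A)i}$ to be distributed as

$$\Pr(X_{f(A)i} = x_{f(A)i}) = \theta_{(A)i}^{x_{f(A)i}}(1 - \theta_{(A)i})^{1 - x_{f(A)i}}, \quad x_{f(A)i} = 0, 1, \quad i = 1, 2, \ldots, k-1.$$

The density of a future $u_f$ is

$$f(u_f \mid x_{f(A)}, \beta_{(A)}, \tau_{(A)}) \sim N\!\left(\sum_{i=0}^{k-1} x_{f(A)i}\beta_{(A)i},\; \tau^{-1}_{(A)}\right).$$

If the future variables $x^{(r)}_{f(A)}$ are missing in treatment A, then the density of a future $u_f$ is given by

$$f(u_f \mid x^*_{f(A)}, \beta_{(A)}, \tau^{-1}_{(A)}) = \sum_{x_{f(A)k-r}=0}^{1} \cdots \sum_{x_{f(A)k-1}=0}^{1} N\!\left(\sum_{i=0}^{k-1} x_{f(A)i}\beta_{(A)i},\; \tau^{-1}_{(A)}\right) \prod_{i=k-r}^{k-1} \theta_{(A)i}^{x_{f(A)i}}(1 - \theta_{(A)i})^{1 - x_{f(A)i}}.$$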

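The predictive density of $u_f$ when $x^{(r)}_{f(A)}$ is missing is given by

$$f(u_f \mid x^*_{f(A)}, \mathrm{data}) = \int f(u_f \mid x^*_{f(A)}, \beta_{(A)}, \tau^{-1}_{(A)})\, f(\beta_{(A)} \mid \mathrm{data})\, d\beta_{(A)}, \qquad \text{(viii)}$$

which is not mathematically tractable. For vague prior densities for $\beta_{(A)}$ and $\tau_{(A)}$, and using Taylor's expansion, the approximate predictive density of (viii) is

$$f(u_f \mid x^*_{f(A)}, \mathrm{data}) = \sum_{x_{f(A)k-r}=0}^{1} \cdots \sum_{x_{f(A)k-1}=0}^{1} N\!\left(\sum_{i=0}^{k-1} x_{f(A)i}\hat\beta_{(A)i},\; s^2_{(A)}\right) \prod_{i=k-r}^{k-1} \theta_{(A)i}^{x_{f(A)i}}(1 - \theta_{(A)i})^{1 - x_{f(A)i}} \left(1 + \sum_{i,j=0}^{k-1} \frac{Q_{ij}(\hat\beta_{(A)}, s^2_{(A)})\,\mathrm{cov}(\beta_{(A)i}, \beta_{(A)j})}{2} + \frac{Q_{\tau^2_{(A)}}(\hat\beta_{(A)}, s^2_{(A)})\,\mathrm{var}(\tau_{(A)})}{2}\right).$$

Since there are no missing variables in $v_f$, the density of $v_f$ is the same as that obtained in Section 2. Then the predictive density of $a'w_f$ is given by

$$f(a'w_f \mid x^*_{f(A)}, x_{f(B)}, \mathrm{data}) = \sum_{x_{f(A)k-r}=0}^{1} \cdots \sum_{x_{f(A)k-1}=0}^{1} N\!\left(\sum_{i=0}^{k-1}\left(x_{f(A)i}\hat\beta_{(A)i} - x_{f(B)i}\hat\beta_{(B)i}\right),\; s^2_{(A)} + s^2_{(B)}\left(1 + x_{f(B)}(X'_{(B)}X_{(B)})^{-1}x'_{f(B)}\right)\right) \prod_{i=k-r}^{k-1} \theta_{(A)i}^{x_{f(A)i}}(1 - \theta_{(A)i})^{1 - x_{f(A)i}} \left(1 + \sum_{i,j=0}^{k-1} \frac{Q_{ij}(\hat\beta_{(A)}, s^2_{(A)})\,\mathrm{cov}(\beta_{(A)i}, \beta_{(A)j})}{2} + \frac{Q_{\tau^2_{(A)}}(\hat\beta_{(A)}, s^2_{(A)})\,\mathrm{var}(\tau_{(A)})}{2}\right). \qquad \text{(ix)}$$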

An analytical expression for $D_{KL}$ between the predictive densities (iv) and (ix) is very difficult to obtain, but a numerical solution can be obtained. In some situations, some of the explanatory variables are dichotomous and some are continuous. Among the $k-1$ explanatory variables, without loss of generality we assume that the first $l$ are dichotomous and the remaining $k-l-1$ are continuous. We also assume that the last $d$ of the $l$ dichotomous future variables are missing and that the last $g$ of the $(k-l-1)$ continuous future variables are missing. Then the predictive density of the future log-odds ratio $a'w_f$ when $d$ dichotomous and $g$ continuous variables are missing is given by

$$f(a'w_f \mid x^*_{f(A)}, x^*_{f(B)}, \mathrm{data}) = \left(\sum_{x_{f(A)l-d+1}=0}^{1} \cdots \sum_{x_{f(A)l}=0}^{1} N\!\left(\sum_{i=0}^{k-g-1} x_{f(A)i}\hat\beta_{(A)i} + \sum_{i=k-g}^{k-1} \eta_i\hat\beta_{(A)i} - \sum_{i=0}^{k-1} x_{f(B)i}\hat\beta_{(B)i},\; \sum_{i=k-g}^{k-1} \hat\beta^2_{(A)i}\psi^2_i + s^2_{(A)} + s^2_{(B)}\left(1 + x_{f(B)}(X'_{(B)}X_{(B)})^{-1}x'_{f(B)}\right)\right) \prod_{i=l-d+1}^{l} \theta_i^{x_{f(A)i}}(1 - \theta_i)^{1 - x_{f(A)i}}\right) \left(1 + \sum_{i,j=0}^{k-1} \frac{Q_{ij}(\hat\beta_{(A)}, s^2_{(A)})\,\mathrm{cov}(\beta_{(A)i}, \beta_{(A)j})}{2} + \frac{Q_{\tau^2_{(A)}}(\hat\beta_{(A)}, s^2_{(A)})\,\mathrm{var}(\tau_{(A)})}{2}\right). \qquad \text{(x)}$$

Again, an analytical expression for $D_{KL}$ between the predictive densities (iv) and (x) is very difficult to obtain, but a numerical solution can be found. In a similar way we can derive the predictive density of the future log-odds ratio when some future variables are missing in treatment B.
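One way to obtain such a numerical solution is to evaluate both densities on a grid and integrate with the trapezoidal rule. The Python sketch below does this for a normal density against a Bernoulli-weighted mixture of normals of the kind appearing in (ix). It is a simplified illustration only: the Taylor correction factor is omitted, and the function names, grid and parameter values are assumptions introduced here.

```python
import itertools
import numpy as np

def mixture_density(t, base_mean, beta_missing, theta_missing, var):
    """Density at t of a mixture of normals over all 0/1 patterns of the missing variables."""
    t = np.asarray(t, dtype=float)
    beta_missing = np.asarray(beta_missing, dtype=float)
    theta_missing = np.asarray(theta_missing, dtype=float)
    dens = np.zeros_like(t)
    for pattern in itertools.product([0, 1], repeat=len(beta_missing)):
        c = np.array(pattern)
        # Bernoulli probability of this pattern of missing values.
        weight = np.prod(theta_missing ** c * (1.0 - theta_missing) ** (1 - c))
        mean = base_mean + c @ beta_missing
        dens += weight * np.exp(-(t - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    return dens

def kl_numeric(p, q, grid):
    """Trapezoidal approximation of the integral of p log(p / q) over the grid.

    Assumes q is positive wherever p is positive on the grid.
    """
    pv, qv = p(grid), q(grid)
    integrand = np.zeros_like(pv)
    mask = pv > 0
    integrand[mask] = pv[mask] * np.log(pv[mask] / qv[mask])
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid)))

# Hypothetical example: full-model density N(0, 1) against a two-variable mixture.
grid = np.linspace(-12.0, 12.0, 20001)
full = lambda t: np.exp(-t ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
reduced = lambda t: mixture_density(t, 0.0, [1.0, -0.5], [0.3, 0.6], 1.2)
print(kl_numeric(full, reduced, grid))
```

The number of mixture components doubles with each missing dichotomous variable, so this direct enumeration is only practical when few variables are missing.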

Example 1 revisited: This example is based on the flu shot data of Example 1. From Figure 3 we observe, as in Examples 1 and 2, that the discrepancies are smaller around the mean of the missing variable. Moreover, comparing Figures 1 and 3, the discrepancies for the missing variable are smaller than those for the deleted variable.

Example 2 revisited: This example is based on the simulated data of Example 2, and here we reach the same conclusion as in Example 1 revisited (Figures 2 & 4).

Figure 1 Three-dimensional scatter plots of DKL based on real data when x1 is deleted (panels: Group A, Group B).

Figure 2 Three-dimensional scatter plots of DKL based on simulated data when x1 is deleted (panels: Group A, Group B).

Figure 3 Three-dimensional scatter plots of DKL based on real data when xf1 is missing (panels: Group A, Group B).

Figure 4 Three-dimensional scatter plots of DKL based on simulated data when xf1 is missing (panels: Group A, Group B).

Examples 1 and 2 revisited: Here we use the DKL values for the real data to draw box plots for each case (deleted and missing). From Figure 5, we observe that x2 is more influential than x1. Moreover, the discrepancies are much smaller in the missing case than in the deleted case. The simulation study gives the same result, as illustrated in Figure 6.

Figure 5 Box plot for DKL based on real data (panels: Treatment A, Treatment B).

Figure 6 Box plot for DKL based on simulated data (panels: Treatment A, Treatment B).

Evaluation of predictive probability of a logistic model

We consider the logistic model as

$$\Pr(y = 1 \mid x, \beta) = \frac{\exp(x\beta)}{1 + \exp(x\beta)}.$$

The probability that a future response $y_f$ will be a success is given by

$$\Pr(y_f = 1 \mid x_f, \beta) = \frac{\exp(x_f\beta)}{1 + \exp(x_f\beta)}.$$

We assume that the conditional density of $x_{f(r)}$ given $x^*_f$ is independent of $\beta$, where $x^*_f$ denotes the future explanatory variables without the variables $x_{f(r)}$. Then the predictive probabilities that $y_f$ will be a success under the two models are given by

$$\Pr(y_f = 1 \mid x_f, \mathrm{data}) = \int \Pr(y_f = 1 \mid x_f, \beta)\, f(\beta \mid \mathrm{data})\, d\beta$$

and

$$\Pr(y_f = 1 \mid x^*_f, \mathrm{data}) = \int \Pr(y_f = 1 \mid x^*_f, \beta)\, f(\beta \mid \mathrm{data})\, d\beta,$$

respectively. Simple analytically tractable priors are not available here. Numerical integration techniques might be used for some specified priors to approximate $\Pr(y_f = 1 \mid x_f, \mathrm{data})$ and $\Pr(y_f = 1 \mid x^*_f, \mathrm{data})$, respectively.

Normal approximation for the posterior density

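Let us suppose that the sample size is large. Lindley12 stated that the posterior density $f(\beta \mid \mathrm{data})$ may then be well approximated by its asymptotic normal form as

$$f(\beta \mid \mathrm{data}) \sim N_p(\hat\beta, \Sigma),$$

where $\hat\beta$ is the maximum likelihood estimate of $\beta$, $\Sigma = (-H)^{-1}$ and $H$ is the Hessian of $\log L(\beta)$ evaluated at $\hat\beta$.

For the logistic model (xi), the Hessian $H = (h_{jl}(\hat\beta))$ evaluated at $\hat\beta$ is given by

$$h_{jl}(\hat\beta) = -\sum_{i=1}^{n} x_{ij}x_{il}\frac{\exp(x_i\hat\beta)}{\left(1 + \exp(x_i\hat\beta)\right)^2}, \quad j, l = 0, 1, \ldots, k,$$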
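The Hessian and the resulting covariance matrix of the normal approximation are simple to compute once the design matrix and the maximum likelihood estimate are available. The sketch below is a minimal illustration in Python; the function names are hypothetical, and the design matrix `X` is assumed to already contain the column of ones for the constant term.

```python
import numpy as np

def logistic_hessian(X, beta_hat):
    """Hessian of the logistic log-likelihood evaluated at beta_hat."""
    eta = X @ beta_hat
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)               # equals exp(eta) / (1 + exp(eta))**2
    return -(X * w[:, None]).T @ X  # h_jl = -sum_i x_ij x_il w_i

def posterior_covariance(X, beta_hat):
    """Sigma = (-H)^{-1}, the covariance of the normal approximation to the posterior."""
    return np.linalg.inv(-logistic_hessian(X, beta_hat))
```

For a future covariate vector `x_f`, the quantity `x_f @ posterior_covariance(X, beta_hat) @ x_f` then gives the posterior variance of the linear predictor.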

where $x_{ij}$ is the $j$th component of $x_i$ with $x_{i0} = 1$. For given $x_f$, $z = x_f\beta$ will have approximately a posteriori a normal distribution with mean $b_{x_f} = x_f\hat\beta$ and variance $d^2_{x_f} = x_f\Sigma x_f'$, and with probability density function $\phi(z \mid b_{x_f}, d^2_{x_f})$. Using this transformation we can approximate the predictive probability by

$$\Pr(y_f = 1 \mid x_f, \mathrm{data}) \approx \int \frac{\exp(z)}{1 + \exp(z)}\,\phi(z \mid b_{x_f}, d^2_{x_f})\, dz.$$

Analytical evaluation of this integral is very difficult. We can however evaluate it by numerical integration techniques, viz. Gauss-Hermite quadrature (Abramowitz and Stegun13), normal approximation (Cox14) and Laplace's approximation (de Bruijn15).
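As an illustration of the quadrature route, the following Python sketch evaluates the integral of the logistic function against a normal density by Gauss-Hermite quadrature, using the substitution z = b + sqrt(2 d²) t. The function name and the default number of nodes are assumptions made for this example.

```python
import numpy as np

def predictive_prob_gh(b, d2, n_nodes=40):
    """Gauss-Hermite approximation of E[exp(z) / (1 + exp(z))] for z ~ N(b, d2)."""
    # Nodes and weights for integrals of the form  int exp(-t^2) g(t) dt.
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    z = b + np.sqrt(2.0 * d2) * nodes
    return float(np.sum(weights / (1.0 + np.exp(-z))) / np.sqrt(np.pi))

# Hypothetical posterior mean and variance of the linear predictor.
print(predictive_prob_gh(b=1.3, d2=2.0))
```

Because the logistic function is smooth and bounded, a moderate number of nodes is usually enough; increasing `n_nodes` gives a simple check on the accuracy.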

If the sample size is small, the posterior normality assumption may not be accurate. Therefore, we consider the flat prior approximation (Tierney and Kadane16) as an alternative approach, using Laplace's method for integrals.

Effect of the variables xf

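Here we assume that the future variables $x_f$ are dependent and that the density of $x_f$ is $p$-dimensional multivariate normal, i.e.

$$f(x_f) \sim N_p(\eta, \psi).$$

The conditional density of $x_{f(r)}$ for given $x^*_f$ is

$$f(x_{f(r)} \mid x^*_f) \sim N_r\!\left(\eta^*_{(r)}, \psi^*_{(r)}\right).$$

The probability of $y_f$ being a success when $x_{f(r)}$ is missing is given by

$$\Pr(y_f = 1 \mid x^*_f, \beta) = \int \frac{\exp(x_f\beta)}{1 + \exp(x_f\beta)}\, f(x_{f(r)} \mid x^*_f)\, dx_{f(r)}$$

$$\approx \Phi\!\left(\frac{\sum_{i=0}^{k-r} x_{fi}\beta_i + \sum_{i=k-r+1}^{k} \eta^*_i\beta_i}{\left(k^2 + \sum_{i,j=k-r+1}^{k} \beta_i\beta_j\psi^*_{ij}\right)^{1/2}}\right)$$

$$= g^*(\beta) \text{ (say)}.$$

Then the predictive probability of $y_f$ being a success when $x_{f(r)}$ is missing is given by

$$\Pr(y_f = 1 \mid x^*_f, \mathrm{data}) = \int g^*(\beta)\, f(\beta \mid \mathrm{data})\, d\beta. \qquad \text{(xii)}$$

The integral in (xii) can be evaluated in the same way as the integral in (xi), using Taylor's and Laplace's approximations.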

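If, instead, the future variables $x_{f1}, \ldots, x_{fk}$ are independently and normally distributed with mean $\eta_i$ and variance $\psi^2_i$ $(i = 1, 2, \ldots, k)$, then the conditional density of $x_{f(r)}$ is

$$f(x_{f(r)} \mid x^*_f) = f(x_{f(r)}).$$

Consequently, we get

$$\Pr(y_f = 1 \mid x^*_f, \beta) = \int \frac{\exp(x_f\beta)}{1 + \exp(x_f\beta)}\, f(x_{f(r)})\, dx_{f(r)}$$

$$\approx \Phi\!\left(\frac{\sum_{i=0}^{k-r} x_{fi}\beta_i + \sum_{i=k-r+1}^{k} \eta_i\beta_i}{\left(k^2 + \sum_{i=k-r+1}^{k} \beta^2_i\psi^2_i\right)^{1/2}}\right)$$

$$= g(\beta) \text{ (say)}.$$

See Aitchison and Begg17 in this context. Again,

$$\Pr(y_f = 1 \mid x^*_f, \mathrm{data}) = \int g(\beta)\, f(\beta \mid \mathrm{data})\, d\beta.$$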
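The quality of this kind of normal-distribution-function approximation can be checked against direct quadrature, since for independent normal missing variables the linear predictor is itself normal given the observed variables. The Python sketch below makes that comparison. The text does not state the value of the constant in the denominator, so the sketch assumes the widely used scaling 1.702, for which the logistic function is close to Φ(z/1.702); this choice, the function names and the numerical inputs are all assumptions of the example.

```python
import math
import numpy as np

C = 1.702  # assumed scaling constant: logistic(z) is approximately Phi(z / 1.702)

def norm_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def _moments(x_obs, beta_obs, eta_mis, psi2_mis, beta_mis):
    """Mean and variance of the linear predictor given the observed variables."""
    mean = float(np.dot(x_obs, beta_obs) + np.dot(eta_mis, beta_mis))
    var = float(np.dot(np.square(beta_mis), psi2_mis))
    return mean, var

def g_approx(x_obs, beta_obs, eta_mis, psi2_mis, beta_mis):
    """Closed-form approximation to Pr(y_f = 1 | observed variables, beta)."""
    mean, var = _moments(x_obs, beta_obs, eta_mis, psi2_mis, beta_mis)
    return norm_cdf(mean / math.sqrt(C ** 2 + var))

def g_quadrature(x_obs, beta_obs, eta_mis, psi2_mis, beta_mis, n_nodes=60):
    """Gauss-Hermite evaluation of the same probability, for comparison."""
    mean, var = _moments(x_obs, beta_obs, eta_mis, psi2_mis, beta_mis)
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    z = mean + math.sqrt(2.0 * var) * nodes
    return float(np.sum(weights / (1.0 + np.exp(-z))) / math.sqrt(math.pi))

# Hypothetical coefficients; one missing variable with mean 33.35 and variance 65.39.
args = ([1.0, 0.5], [-0.4, 0.8], [33.35], [65.39], [0.05])
print(g_approx(*args), g_quadrature(*args))
```

With a different constant in place of `C` the closed form changes accordingly, while the quadrature value is unaffected.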

Variables xf are dichotomous

Here we assume that the variables $x_f$ are independent and that they can take only the two values 0 or 1. We also assume that $x_{fi}$ is distributed as

$$\Pr(X_{fi} = x_{fi}) = \theta_i^{x_{fi}}(1 - \theta_i)^{1 - x_{fi}}.$$

If $x_{f(r)}$ is missing, the probability of $y_f$ being a success is given by

$$\Pr(y_f = 1 \mid x^*_f, \beta) = \sum_{x_{fk-r+1}=0}^{1} \cdots \sum_{x_{fk}=0}^{1} \frac{\exp(x_f\beta)}{1 + \exp(x_f\beta)} \prod_{i=k-r+1}^{k} \theta_i^{x_{fi}}(1 - \theta_i)^{1 - x_{fi}} = h(\beta) \text{ (say)}.$$

The predictive probability of $y_f$ being a success when $x_{f(r)}$ is missing is given by

$$\Pr(y_f = 1 \mid x^*_f, \mathrm{data}) = \int h(\beta)\, f(\beta \mid \mathrm{data})\, d\beta. \qquad \text{(xiii)}$$

If the sample size is large, assuming normality of the posterior density, we can approximate (xiii) using Taylor's theorem, Laplace's method and the normal approximation.
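For a fixed value of β, the function h(β) is a finite sum over all 0/1 patterns of the missing variables and can be evaluated by direct enumeration. The Python sketch below assumes that the coefficient vector lists the observed variables first (including the constant term) and the missing dichotomous variables last; the function name and this ordering are illustrative assumptions.

```python
import itertools
import math

def success_prob_missing_binary(beta, x_obs, theta_mis):
    """h(beta): success probability averaged over the missing 0/1 variables."""
    n_obs = len(x_obs)
    base = sum(b * x for b, x in zip(beta[:n_obs], x_obs))
    total = 0.0
    for pattern in itertools.product((0, 1), repeat=len(theta_mis)):
        weight = 1.0
        eta = base
        for x, theta, b in zip(pattern, theta_mis, beta[n_obs:]):
            weight *= theta if x == 1 else 1.0 - theta  # Bernoulli probability of x
            eta += b * x
        total += weight / (1.0 + math.exp(-eta))
    return total

# Hypothetical example: constant term observed, one binary variable missing.
print(success_prob_missing_binary([0.5, 1.0], [1.0], [0.3]))
```

Averaging this function over draws of β from the approximating normal posterior would give a Monte Carlo alternative to the Taylor and Laplace approximations mentioned above.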

Example: one variable case

Here we consider two different logistic models, each based on a single variable, either x1 or x2. We want to measure the discrepancies between the predictive probability $\hat p_i$, based on a single variable xi when xfi is known, and the predictive probability $\hat p_0$, based on xi alone when xfi is missing, to assess the influence of the missing variable xfi, i = 1, 2. The predictive probability $\hat p_i$ is determined using the quadrature approximation and the predictive probability $\hat p_0$ is determined using the second-order Taylor approximation.

We assume that the marginal densities of the future variables xf1 and xf2 are normal with means 33.35, 78.24 and variances 65.39, 1827.0 respectively, where the means and variances are the sample means and sample variances estimated from the observed data. We employ the absolute difference of probabilities and the Kullback-Leibler divergence measure to assess the influence of the missing variable. The discrepancies are plotted in Figure 7. Here we see that the discrepancies due to missing xf1 in the predictive probability based on x1 are very large compared to the discrepancies due to missing xf2 in the predictive probability based on x2. The discrepancies are smaller around the mean of the missing variable.

Figure 7 Absolute difference $|\hat p_i - \hat p_0|$, i = 1, 2, and Kullback-Leibler directed divergence DKL (panels: xf1 is missing, xf2 is missing).

Example: two-variable case

Now let $\hat p_{12}$ denote the predictive probability based on the two variables x1 and x2 when both xf1 and xf2 are known, and let $\hat p_{ij}$, $i = 0, 1$, $j = 0, 2$, $(i, j) \neq (1, 2)$, denote the predictive probability based on x1 and x2 when a future variable is missing, where "0" indicates a missing variable. Here also the predictive probability $\hat p_{12}$ is determined using the quadrature approximation, and the predictive probabilities $\hat p_{10}$, $\hat p_{02}$ and $\hat p_{00}$ are determined using the second-order Taylor approximation. We assume that the joint density of xf1 and xf2 is bivariate normal with correlation coefficient 0.33, which is the sample correlation coefficient estimated from the observed data. The absolute differences of the two predictive probabilities $\hat p_{12}$ and $\hat p_{02}$ when xf1 is missing, and the absolute differences of the two predictive probabilities $\hat p_{12}$ and $\hat p_{10}$ when xf2 is missing, are plotted in Figure 8. The Kullback-Leibler directed divergences DKL are plotted in Figure 9. The discrepancies when xf1 is missing, for different given values of the other variable, are close together in both cases, since the correlation between xf1 and xf2 is very small. The discrepancies due to missing xf1 are very large compared to those due to missing xf2, except near the mean of the missing variable. The discrepancies when both xf1 and xf2 are missing are plotted in Figure 10. These discrepancies are very similar to the discrepancies due to missing xf1 alone in the predictive probability based on x1 and x2, since the contribution of x2 is negligible.

Figure 8 Absolute difference $|\hat p_{12} - \hat p_{10}|$ (panels: xf1 is missing, xf2 is missing).

Figure 9 Kullback-Leibler directed divergence DKL (panels: xf1 is missing, xf2 is missing).

Figure 10 xf1 and xf2 are both missing (panels: absolute difference $|\hat p_{12} - \hat p_{00}|$, Kullback-Leibler directed divergence DKL).

Concluding remarks

In the present study we have observed that the discrepancies are smallest around the mean of the deleted variables as well as around the mean of the missing future variables, in both the logistic model and the log-odds ratio; the discrepancies are larger if the deleted or missing variables are more influential; and the discrepancies in the deleted case are larger than those in the missing case.

In this paper we studied the important problem of the predictive influence of variables on the log-odds ratio in a Bayesian setup. The treatment difference

$$\Pr(Y_i = 1 \mid Z_i = 1, x_i) - \Pr(Y_i = 1 \mid Z_i = 0, x_i)$$

or the risk ratio

$$\Pr(Y_i = 1 \mid Z_i = 1, x_i)\,/\,\Pr(Y_i = 1 \mid Z_i = 0, x_i)$$

can also be studied along the same lines.

We have also considered the influence of missing future explanatory variables in a logistic model. The influence of missing future explanatory variables in probit and complementary log-log models can be studied in a similar fashion.

Acknowledgments

None.

Conflicts of interest

None.

References

  1. Breslow N. Odds ratio estimators when the data are sparse. Biometrika. 1981;68:73–84.
  2. Böhning D, Kuhnert, Rattanasiri S, et al. Meta–analysis of binary data using profile likelihood. 1st ed. A Chapman and Hall CRC Interdisciplinary Statistic. 2008.
  3. Pregibon D. Logistic regression diagnostics. Annals of Statistics. 1981;9:705–724.
  4. Cook RD, Weisberg S. Residuals and Influence in Regression. USA: New York: Chapman and Hall. 1982.
  5. Johnson W. Influence measures for logistic regression: Another point of view. Biometrika. 1985;72(1):59–65.
  6. Bhattacharjee SK, Dunsmore IR. The influence of variables in a logistic model. Biometrika. 1991;78(4):851–856.
  7. Mercier C, Shelley MC, Rimkus J, et al.  Age and Gender as Predictors of Injury Severity in Head–on Highway Vehicular Collisions. The 76th Annual Meeting , Transportation Research Board, Washington, USA. 1997.
  8. Zellner D, Keller F, Zellner GE. Variable selection in logistic regression models. Communications in Statistics. 2004;33(3):787–805.
  9. Kullback S, Leibler R A. On information and sufficiency. Ann Math Statist. 1951;22:79–86.
  10. Jammalamadaka SR, Tiwari RC, Chib S. Bayes prediction in the linear model with spherically symmetric errors. Economics Letters. 1987;24:39–44.
  11. Bhattacharjee SK, Shamiri A, Sabiruzzaman Md, et al. Predictive Influence of Unavailable Values of Future Explanatory Variables in a Linear Model. Communications in Statistics – Theory and Methods. 2011;40:4458–4466.
  12. Lindley D V. The use of prior probability distributions in statistical inference and decisions. Proc. 4th Berkeley Symp. 1961;1:453–468.
  13. Abramowitz M, Stegun I A . Handbook of Mathematical Functions. USA: National Bureau of Standards. 1966.
  14. Cox DR. Binary regression. UK: London: Chapman and Hall. 1970.
  15. De Bruijn N C. Asymptotic Methods in Analysis. Amsterdam, North–Holland. 1961.
  16. Tierney L, Kadane JB. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association. 1986;81(393):82–86.
  17. Aitchison J, Begg CB. Statistical diagnosis when basic cases are not classified with certainty. Biometrika. 1976;63:1–12.
  18. Logistic Regression Example with Grouped Data. Regression FluShots, University of North Florida.
  19. Bhattacharjee SK, Dunsmore IR. The predictive influence of variables in a normal regression model. J Inform Optimiz Sci. 1995;16(2):327–334.
©2017 Bhattacharjee, et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.