Review Article Volume 5 Issue 1
1Indian Statistical Institute, North-East Centre, Tezpur, Assam-0, India
2Indian Statistical Institute, India
3Basanti Devi College, India
4Department of Statistics and Applied Probability, University of California, USA
5Department of Mathematical Sciences, Ball State University, USA
Correspondence: S Rao Jammalamadaka, Department of Statistics and Applied Probability, University of California, USA
Received: October 01, 2016 | Published: February 1, 2017
Citation: Bhattacharjee SK, Biswas A, Dutta G, et al. Predictive influence of variables on the odds ratio and in the logistic model. Biom Biostat Int J. 2017;5(1):25-37. DOI: 10.15406/bbij.2017.05.00125
We study the influence of explanatory variables in prediction by looking at the distribution of the log-odds ratio. We also consider the predictive influence of a subset of unobserved future variables on the distribution of log-odds ratio as well as in a logistic model, via the Bayesian predictive density of a future observation. This problem is considered for dichotomous, as well as continuous explanatory variables.
AMS subject classification: Primary 62J12, Secondary 62B10, 62F15
Keywords: predictive density/probability, log-odds ratio, logistic model, predictive influence, missing/unobserved variable, Kullback-Leibler divergence
The odds ratio (OR) is perhaps the most popular measure of treatment difference for binary outcomes and is extensively used in dealing with 2×2 tables in biomedical studies and clinical trials. The distribution of the log of the sample OR is often approximated by a normal distribution with the true log OR as the mean and with variance estimated by the sum of the reciprocals of the four cell frequencies in the 2×2 table (Breslow1). Böhning et al.2 provide a detailed book-length discussion of the OR. For logistic regression, ORs enable one to examine the effect of explanatory variables in that relationship.
The logistic link is perhaps the most popular way to model the success probabilities of a binary variable. Pregibon,3 Cook and Weisberg4 and Johnson5 have considered the problem of the influence of observations for logistic regression models. Several measures have been suggested to identify observations in the data set which are influential relative to the estimation of the vector of regression coefficients, the deviance, the determination of predictive probabilities and the classification of future observations.
Bhattacharjee & Dunsmore6 considered the effect on the predictive probability of a future observation of the omission of subsets of the explanatory variables. Mercier et al.7 used logistic regression to determine whether age and/or gender were a factor influencing severity of injuries suffered in head-on automobile collisions on rural highways. Zellner et al.8 considered the problem of variable selection in logistic regression to compare the performance of stepwise selection procedures with a bagging method.
In the present paper, our aim is to measure the predictive influence of a subset of explanatory variables on the log-odds ratio of a logistic model using a Bayesian approach. We are also interested in studying the effect of missing future explanatory variables on Bayes prediction, both in a logistic model and for the log-odds ratio.
In Section 2, we derive the predictive densities of a future log-odds ratio for both the full model and a subset deleted model. We derive the predictive density of log-odds ratio in Section 3, when a subset of future explanatory variables is missing. To derive the predictive densities we assume that the future explanatory variables xf are distributed as multivariate normal, both when these xf's are independent or dependent. In Section 4, we discuss the influence of future missing explanatory variables by considering the predictive probability of a future response in a logistic model. This is done by assuming that the future explanatory variables xf are multivariate normal for the continuous case. Also considered is the dichotomous case. Since the predictive probabilities are not mathematically tractable for the logistic model, we use several approximations.
In Sections 2 and 3 we employ the Kullback-Leibler9 directed measure of divergence DKL to assess the influence of variables and also the influence of future missing variables on the log-odds ratio. The form of the Kullback-Leibler9 measure used here is given by
$$D_{KL}=\int f(a'w_f\mid\cdot)\,\log\!\left(\frac{f(a'w_f\mid\cdot)}{f^{(r+s)}(a'w_f\mid\cdot)}\right)d(a'w_f).$$
To assess the influence of missing future variables or to measure the predictive probability in a logistic model we use the absolute difference of the two predictive probabilities.
Consider a phase III clinical trial with two competing treatments, say A and B, having binary responses. Suppose n patients are randomly allocated, with $n_A$ and $n_B$ patients assigned to treatments A and B respectively. The patient responses are influenced by a $p\times 1$ covariate vector $x$, where one component of $x$ may be 1 (which covers the constant term). Let $(Y_i, Z_i, x_i)$ be the data corresponding to the ith patient, where $Y_i$ is the indicator of response ($Y_i=1$ or 0 for a success or failure), $Z_i$ is the indicator of the treatment assignment ($Z_i=1$ or 0 according as treatment A or B is applied to the ith patient), and $x_i$ is the covariate vector. We assume a logit model for the responses:
$$\Pr(Y_i=1\mid Z_i,x_i)=\frac{\exp(\Delta Z_i+x_i\beta)}{1+\exp(\Delta Z_i+x_i\beta)},\qquad i=1,2,\ldots,n. \tag{i}$$
Then the odds for treatments A and B with covariate vector xi are respectively
$$O_A=\frac{\Pr(Y_i=1\mid Z_i=1,x_i)}{\Pr(Y_i=0\mid Z_i=1,x_i)}=\exp(\Delta+x_i\beta),\qquad O_B=\frac{\Pr(Y_i=1\mid Z_i=0,x_i)}{\Pr(Y_i=0\mid Z_i=0,x_i)}=\exp(x_i\beta),$$
and hence the log-odds ratio is
$$\log OR=\log O_A-\log O_B=\Delta.$$
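As a quick numerical check of this identity, the following minimal sketch evaluates the logit model (i) for two different covariate vectors and confirms that the log-odds ratio equals Δ in both cases. The values of `delta`, `beta` and the covariate vectors are hypothetical and chosen only for illustration.

```python
import math

def success_prob(delta, z, x, beta):
    """Pr(Y = 1 | Z = z, x) under the logit model (i)."""
    eta = delta * z + sum(xj * bj for xj, bj in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-eta))

def log_odds_ratio(delta, x, beta):
    """log O_A - log O_B for a patient with covariate vector x."""
    p_a = success_prob(delta, 1, x, beta)
    p_b = success_prob(delta, 0, x, beta)
    return math.log(p_a / (1.0 - p_a)) - math.log(p_b / (1.0 - p_b))

# Hypothetical parameter values (not taken from the paper).
delta = 0.7
beta = [-1.0, 0.05, 0.3]          # first component multiplies the constant 1

lor_1 = log_odds_ratio(delta, [1.0, 40.0, 0.0], beta)
lor_2 = log_odds_ratio(delta, [1.0, 65.0, 1.0], beta)
print(lor_1, lor_2)               # both agree with delta up to rounding
```

The covariates cancel in the difference, which is why the partition of xβ introduced next is needed before individual variables can influence the predicted log-odds ratio.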
Let us partition
$$x\beta=x_A\beta_A+x_B\beta_B+x_{AB}\beta_{AB},$$
where $x_A$ indicates the variables used in treatment A only, $x_B$ those used in treatment B only, and $x_{AB}$ those used in both treatments A and B. Then the model can be partitioned for treatments A and B as:
$$\log O_A=u=\Delta+x_A\beta_A+x_{AB}\beta_{AB}=x_{(A)}\beta_{(A)}, \tag{ii}$$
$$\log O_B=v=x_B\beta_B+x_{AB}\beta_{AB}=x_{(B)}\beta_{(B)}. \tag{iii}$$
The predictive density of the future log-odds for A, $u_f$, under a non-informative (vague) prior with normal or any spherically symmetric errors is of Student form (Jammalamadaka et al.10) and is given by
$$f(u_f\mid x_{f(A)},\text{data})\equiv St\!\left(n-k,\;x_{f(A)}\hat\beta_{(A)},\;s^2_{(A)}\bigl(1+x'_{f(A)}(X'_{(A)}X_{(A)})^{-1}x_{f(A)}\bigr)\right),$$
where $\hat\beta_{(A)}$ is the MLE of $\beta_{(A)}$, $s^2_{(A)}$ is the MLE of $\sigma^2_A$, $X_{(A)}$ is the design matrix of model (ii) and k is the number of parameters in model (ii). See Bhattacharjee et al.11 in this context. If the sample size is large, this predictive density can be well approximated by its asymptotic normal form
$$N\!\left(x_{f(A)}\hat\beta_{(A)},\;s^2_{(A)}\bigl(1+x'_{f(A)}(X'_{(A)}X_{(A)})^{-1}x_{f(A)}\bigr)\frac{n-k}{n-k-2}\right).$$
Similarly one can find the corresponding predictive density for treatment B, $v_f$, with q denoting the number of parameters in model (iii).
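Let us define $w_f=(u_f,v_f)'$ and $a=(1,-1)'$. Then the predictive density of the future log-odds ratio $a'w_f$ is given by
$$f(a'w_f\mid x_{f(A)},x_{f(B)},\text{data})\approx N(\theta,\delta^2), \tag{iv}$$
where
$$\theta=x_{f(A)}\hat\beta_{(A)}-x_{f(B)}\hat\beta_{(B)}$$
and
$$\delta^2=s^2_{(A)}\bigl(1+x'_{f(A)}(X'_{(A)}X_{(A)})^{-1}x_{f(A)}\bigr)\frac{n-k}{n-k-2}+s^2_{(B)}\bigl(1+x'_{f(B)}(X'_{(B)}X_{(B)})^{-1}x_{f(B)}\bigr)\frac{n-q}{n-q-2}.$$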
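The arithmetic behind θ and δ² can be sketched as follows. This is only an illustration under stated assumptions: the fitted means, residual variances and leverage terms (the quadratic forms in the future covariates) are taken as already computed, and all numerical inputs are hypothetical rather than values from the paper.

```python
def normal_approx_variance(s2, leverage, n, n_params):
    """Variance of the normal approximation to St(n - n_params, ., s2 * (1 + leverage))."""
    dof = n - n_params
    if dof <= 2:
        raise ValueError("the approximation needs more than 2 degrees of freedom")
    return s2 * (1.0 + leverage) * dof / (dof - 2)

def log_odds_ratio_predictive(mean_a, s2_a, lev_a, k, mean_b, s2_b, lev_b, q, n):
    """Return (theta, delta2) of the normal approximation (iv)."""
    theta = mean_a - mean_b
    delta2 = (normal_approx_variance(s2_a, lev_a, n, k)
              + normal_approx_variance(s2_b, lev_b, n, q))
    return theta, delta2

# Hypothetical inputs: fitted future log-odds, residual variances, leverages.
theta, delta2 = log_odds_ratio_predictive(
    mean_a=1.2, s2_a=0.5, lev_a=0.10, k=4,
    mean_b=0.4, s2_b=0.6, lev_b=0.05, q=3, n=54)
print(theta, delta2)
```

The factor dof / (dof - 2) is the variance inflation of a Student distribution relative to its scale, so it tends to 1 as n grows.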
Our interest is to measure the influence of explanatory variables in the predictive density (iv) for the following cases:
Case 1: Influence of r explanatory variables xrA of xA in treatment A.
Case 2: Influence of r explanatory variables xrB of xB in treatment B.
Case 3: Influence of s explanatory variables xsAB of xAB in treatment A.
Case 4: Influence of s explanatory variables xsAB of xAB in treatment B.
Case 5: Joint influence of r explanatory variables xrA of xA and s explanatory variables xsAB of xAB in treatment A.
Case 6: Joint influence of r explanatory variables xrB of xB and s explanatory variables xsAB of xAB in treatment B.
To see the influence of explanatory variables in log-odds ratio, we construct a reduced log-odds model deleting a subset of explanatory variables. Then we derive the predictive density of future log-odds ratio for reduced model and compare it with the predictive density (iv) for full model. It is enough to consider Case 5 for illustration. We construct the reduced model by deleting variables xrA of xA and xsAB of xAB in (ii) as
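$$u=\Delta+x^*_A\beta^*_A+x^*_{AB}\beta^*_{AB}=x^*_{(A)}\beta^*_{(A)}.$$
Then the predictive density of $u_f$ is given by
$$f(u_f\mid x^*_{f(A)},\text{data})=St\!\left(n-k+r+s,\;x^*_{f(A)}\hat\beta^*_{(A)},\;s^{*2}_{(A)}\bigl(1+x^{*\prime}_{f(A)}(X^{*\prime}_{(A)}X^*_{(A)})^{-1}x^*_{f(A)}\bigr)\right).$$
The normal approximation of the predictive density is
$$N\!\left(x^*_{f(A)}\hat\beta^*_{(A)},\;s^{*2}_{(A)}\bigl(1+x^{*\prime}_{f(A)}(X^{*\prime}_{(A)}X^*_{(A)})^{-1}x^*_{f(A)}\bigr)\frac{n-k+r+s}{n-k+r+s-2}\right).$$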
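Since no variable is deleted in $v=\log O_B$, the predictive density of $v_f$ is unaltered, along with its normal approximation. Hence the predictive density of the log-odds ratio $a'w_f$ under Case 5 is given by
$$f^{(r+s)}(a'w_f\mid x^*_{f(A)},x_{f(B)},\text{data})\approx N(\theta^*,\delta^{*2}), \tag{v}$$
where
$$\theta^*=x^*_{f(A)}\hat\beta^*_{(A)}-x_{f(B)}\hat\beta_{(B)}$$
and
$$\delta^{*2}=s^{*2}_{(A)}\bigl(1+x^{*\prime}_{f(A)}(X^{*\prime}_{(A)}X^*_{(A)})^{-1}x^*_{f(A)}\bigr)\frac{n-k+r+s}{n-k+r+s-2}+s^2_{(B)}\bigl(1+x'_{f(B)}(X'_{(B)}X_{(B)})^{-1}x_{f(B)}\bigr)\frac{n-q}{n-q-2}.$$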
To assess the influence of the deleted variables we employ the Kullback-Leibler9 directed measure of divergence DKL between the predictive densities of $a'w_f$ for the full model (iv) and the reduced model (v). The form of the K-L measure used here is given by
$$D_{KL}=\int f^{(r+s)}(a'w_f\mid\cdot)\,\log\!\left(\frac{f^{(r+s)}(a'w_f\mid\cdot)}{f(a'w_f\mid\cdot)}\right)d(a'w_f).$$
The discrepancy measure DKL between the predictive densities (iv) and (v) reduces to
$$D_{KL}=\frac{(\theta-\theta^*)^2}{2\delta^2}+\frac12\left(\frac{\delta^{*2}}{\delta^2}-\log\frac{\delta^{*2}}{\delta^2}-1\right).$$
Here $L=\frac{(\theta-\theta^*)^2}{2\delta^2}$ is due to the difference of the location parameters and $S=\frac12\left(\frac{\delta^{*2}}{\delta^2}-\log\frac{\delta^{*2}}{\delta^2}-1\right)$ is due to the difference of the scale parameters of the two predictive densities (iv) and (v).
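The split of DKL into a location part L and a scale part S is easy to compute once the two normal approximations are available. The sketch below does exactly that; the four input values are hypothetical and serve only to exercise the formula.

```python
import math

def kl_components(theta_star, delta2_star, theta, delta2):
    """Location and scale parts of KL( N(theta_star, delta2_star) || N(theta, delta2) )."""
    ratio = delta2_star / delta2
    location = (theta - theta_star) ** 2 / (2.0 * delta2)
    scale = 0.5 * (ratio - math.log(ratio) - 1.0)
    return location, scale

# Hypothetical reduced-model (starred) and full-model parameters.
L, S = kl_components(theta_star=0.5, delta2_star=1.5, theta=0.8, delta2=1.2)
d_kl = L + S
print(L, S, d_kl)
```

Both parts are non-negative, and both vanish when the reduced and full predictive densities coincide, so a large L or S indicates which aspect of the prediction the deleted variables affect.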
Example 1: Here we consider the flu shot data of Pregibon.3 A local health clinic sent fliers to its clients to encourage everyone, but especially older persons at high risk of complications, to get a flu shot for protection against an expected flu epidemic. In a pilot follow-up study, 159 clients were randomly selected and asked whether they actually received a flu shot. A client who received a flu shot was coded Y=1, and a client who did not receive a flu shot was coded Y=0. In addition, data were collected on their age (x1) and their health awareness (x2). Also included in the data was client gender (x3), with males coded x3=1 and females coded x3=0. We divide the whole data set into two groups, A and B, on the basis of gender: group A corresponds to males and group B to females. We compute DKL to measure the influence of the deleted variable x1 in groups A and B separately; the discrepancies are shown in Figure 1.
Example 2: This is a simulation exercise. We draw a sample of size 159 from a bivariate normal distribution whose means, variances and correlation coefficient are those of x1 and x2 in the flu shot data above. Using these x1 and x2 we generate the response values Y, and from the whole generated data set we compute DKL. We repeat the whole process 1000 times and compute the means of the DKL values. The mean discrepancies are shown in Figure 2. We reach the same conclusion as in the data example.
Here the aim is to detect the predictive influence of a set of missing future explanatory variables on the log-odds ratio of the logistic model (i), in the six cases pointed out in Section 2. In treatment A, let the r future variables missing from $x_{fA}$ and the s future variables missing from $x_{fAB}$ be denoted by $x^{(r+s)}_{f(A)}$. Similarly, in treatment B, let the r future variables missing from $x_{fB}$ and the s future variables missing from $x_{fAB}$ be denoted by $x^{(r+s)}_{f(B)}$. We assume that the errors of models (ii) and (iii) are normally distributed with zero means and variances $\tau^{-1}_{(A)}$ and $\tau^{-1}_{(B)}$, respectively. We also assume that the conditional density of $x^{(r+s)}_{f(A)}$ given $x^*_{f(A)}$ is independent of $\beta_{(A)}$ and $\tau_{(A)}$, and that the conditional density of $x^{(r+s)}_{f(B)}$ given $x^*_{f(B)}$ is independent of $\beta_{(B)}$ and $\tau_{(B)}$, i.e.,
$$f(x^{(r+s)}_{f(\cdot)}\mid x^*_{f(\cdot)},\beta_{(\cdot)},\tau_{(\cdot)})=f(x^{(r+s)}_{f(\cdot)}\mid x^*_{f(\cdot)}),$$
where $x^*_{f(\cdot)}$ denotes the future explanatory variables $x_{f(\cdot)}$ without $x^{(r+s)}_{f(\cdot)}$.
Explanatory variables are continuous
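We assume that the $x_{f,i}$'s are dependent and that the distribution of $x_{f(A)}$ is $(k-1)$-dimensional multivariate normal, i.e. $f(x_{f(A)})\equiv N_{k-1}(\eta,\psi)$.
The conditional density of $x^{(r+s)}_{f(A)}$ given $x^*_{f(A)}$ is given by
$$f(x^{(r+s)}_{f(A)}\mid x^*_{f(A)})\equiv N_{r+s}\bigl(\eta^*_{(r+s)},\psi^*_{(r+s)}\bigr),$$
where
$$\eta=(\eta^*,\eta_{r+s}),\quad x_{f(A)}=(x^*_{f(A)},x^{(r+s)}_{f(A)}),\quad \psi=\begin{pmatrix}\psi_{11}&\psi_{12}\\ \psi_{21}&\psi_{22}\end{pmatrix},\quad \eta^*_{(r+s)}=\eta_{r+s}+\psi_{21}\psi_{11}^{-1}(x^*_{f(A)}-\eta^*)$$
and $\psi^*_{(r+s)}=\psi_{22}-\psi_{21}\psi_{11}^{-1}\psi_{12}$.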
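For intuition, the conditional mean and variance can be written out for the simplest case of one observed and one missing future variable, where the matrix blocks reduce to scalars. The sketch below assumes this bivariate special case; the means, variances and correlation are the values quoted in the paper's later one- and two-variable examples, while the observed value 45.0 is hypothetical.

```python
def conditional_normal_bivariate(x_obs, eta_obs, eta_mis, var_obs, var_mis, corr):
    """Mean and variance of the missing variable given the observed one (bivariate normal)."""
    cov = corr * (var_obs * var_mis) ** 0.5            # psi_21 in the block notation
    cond_mean = eta_mis + cov / var_obs * (x_obs - eta_obs)
    cond_var = var_mis - cov * cov / var_obs           # psi_22 - psi_21 psi_11^{-1} psi_12
    return cond_mean, cond_var

# Means 33.35 and 78.24, variances 65.39 and 1827.0, correlation 0.33.
mean_at_centre, var_missing = conditional_normal_bivariate(
    x_obs=33.35, eta_obs=33.35, eta_mis=78.24,
    var_obs=65.39, var_mis=1827.0, corr=0.33)
mean_shifted, _ = conditional_normal_bivariate(
    x_obs=45.0, eta_obs=33.35, eta_mis=78.24,
    var_obs=65.39, var_mis=1827.0, corr=0.33)
print(mean_at_centre, mean_shifted, var_missing)
```

With a correlation of 0.33 the conditional variance is only about 11 percent smaller than the marginal variance, so observing the other variable carries little information about the missing one.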
As earlier, it is enough to consider Case 5 to see the joint influence of the r missing future explanatory variables $x^r_{fA}$ of $x_{fA}$ and the s missing future explanatory variables $x^s_{fAB}$ of $x_{fAB}$ in treatment A. The density of $u_f$ when $x^{(r+s)}_{f(A)}$ is missing is given by
$$f(u_f\mid x^*_{f(A)},\beta_{(A)},\tau_{(A)})=\int f(u_f\mid x_{f(A)},\beta_{(A)},\tau_{(A)})\,f(x^{(r+s)}_{f(A)}\mid x^*_{f(A)})\,dx^{(r+s)}_{f(A)}\equiv N\!\left(\sum_{i=0}^{k-r-s-1}x_{f(A)i}\beta_{(A)i}+\sum_{i=k-r-s}^{k-1}\eta^*_i\beta_{(A)i},\;\sum_{i,j=k-r-s}^{k-1}\beta_{(A)i}\beta_{(A)j}\psi^*_{ij}+\tau^{-1}_{(A)}\right),$$
where $\eta^*_i$ is the ith component of $\eta^*_{(r+s)}$ and $\psi^*_{ij}$ is the (i, j)th component of $\psi^*_{(r+s)}$.
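See Bhattacharjee et al.11 in this context. Using Taylor's expansion and improper prior densities for both $\beta_{(A)}$ and $\tau_{(A)}$, the approximate predictive density of $u_f$ when $x^{(r+s)}_{f(A)}$ is missing is given by
$$f^{(r+s)}(u_f\mid x^*_{f(A)},\text{data})\equiv\gamma^*\,N\!\left(\sum_{i=0}^{k-r-s-1}x_{f(A)i}\hat\beta_{(A)i}+\sum_{i=k-r-s}^{k-1}\eta^*_i\hat\beta_{(A)i},\;\sum_{i,j=k-r-s}^{k-1}\hat\beta_{(A)i}\hat\beta_{(A)j}\psi^*_{ij}+s^2_{(A)}\right),$$
evaluated at $\hat\beta_{(A)}$ and $s^2_{(A)}$, where
$$\gamma^*=1+\frac12\sum_{i,j=0}^{k-1}Q^*_{ij}(\beta_{(A)},\tau_{(A)})\,\mathrm{Cov}(\beta_{(A)i},\beta_{(A)j})+\frac12Q_{\tau^2_{(A)}}(\beta_{(A)},\tau_{(A)})\,\mathrm{Var}(\tau_{(A)})$$
is the multiplicative factor for the second-order Taylor approximation. If the $x_{f(A)}$'s are independent, the corresponding approximate predictive density of $u_f$ is
$$f^{(r+s)}(u_f\mid x^*_{f(A)},\text{data})\equiv\gamma\,N\!\left(\sum_{i=0}^{k-r-s-1}x_{f(A)i}\hat\beta_{(A)i}+\sum_{i=k-r-s}^{k-1}\eta_i\hat\beta_{(A)i},\;\sum_{i=k-r-s}^{k-1}\hat\beta^2_{(A)i}\psi^2_i+s^2_{(A)}\right),$$
evaluated at $\hat\beta_{(A)}$ and $s^2_{(A)}$, where $\eta_i$ and $\psi^2_i$ are the mean and variance of the ith missing variable and
$$\gamma=1+\frac12\sum_{i,j=0}^{k-1}Q_{ij}(\beta_{(A)},\tau_{(A)})\,\mathrm{Cov}(\beta_{(A)i},\beta_{(A)j})+\frac12Q_{\tau^2_{(A)}}(\beta_{(A)},\tau_{(A)})\,\mathrm{Var}(\tau_{(A)}).$$
Since no future variable is missing in $v$, the approximate predictive density of $v_f$ is the same as obtained in Section 2. Thus, when the $x_{f(A)}$'s are dependent, the approximate predictive density of the log-odds ratio $a'w_f$ for $x^{(r+s)}_{f(A)}$ missing is given by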
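$$f^{(r+s)}(a'w_f\mid x^*_{f(A)},x_{f(B)},\text{data})\equiv\gamma^*N(\xi,\omega^2), \tag{vi}$$
where
$$\xi=\sum_{i=0}^{k-r-s-1}x_{f(A)i}\hat\beta_{(A)i}+\sum_{i=k-r-s}^{k-1}\eta^*_i\hat\beta_{(A)i}-x_{f(B)}\hat\beta_{(B)}$$
and
$$\omega^2=\Bigl(\sum_{i,j=k-r-s}^{k-1}\hat\beta_{(A)i}\hat\beta_{(A)j}\psi^*_{ij}+s^2_{(A)}\Bigr)+s^2_{(B)}\bigl(1+x_{f(B)}(X'_{(B)}X_{(B)})^{-1}x'_{f(B)}\bigr)\frac{n-q}{n-q-2}.$$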
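The Kullback-Leibler9 directed measure of divergence between the predictive density (iv) when no variable is missing and the predictive density (vi) when $x^{(r+s)}_{f(A)}$ is missing is
$$D_{KL}=\int f(a'w_f\mid x_{f(A)},x_{f(B)},\text{data})\,\log\!\left(\frac{f(a'w_f\mid x_{f(A)},x_{f(B)},\text{data})}{f^{(r+s)}(a'w_f\mid x^*_{f(A)},x_{f(B)},\text{data})}\right)d(a'w_f)$$
$$=\frac{1}{2\omega^2}(\theta-\xi)^2+\frac12\left(\frac{\delta^2}{\omega^2}-\log\frac{\delta^2}{\omega^2}-1\right)-\frac12\sum_{i,j=0}^{k-1}E\bigl(Q^*_{ij}(\beta_{(A)},\tau_{(A)})\,\mathrm{Cov}(\beta_{(A)i},\beta_{(A)j})\bigr)-\frac12E\bigl(Q_{\tau^2_{(A)}}(\beta_{(A)},\tau_{(A)})\,\mathrm{Var}(\tau_{(A)})\bigr). \tag{vii}$$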
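If the $x_{f(A)}$'s are independent, the predictive density of $a'w_f$ when $(r+s)$ future variables are missing has the same form as (vi), and the corresponding Kullback-Leibler9 measure DKL has the same form as (vii), with $\eta^*_i$ replaced by $\eta_i$ in $\xi$, $\hat\beta_{(A)i}\hat\beta_{(A)j}\psi^*_{ij}$ replaced by $\hat\beta^2_{(A)i}\psi^2_i$ in $\omega^2$, and $Q^*_{ij}(\beta_{(A)},\tau_{(A)})$ replaced by $Q_{ij}(\beta_{(A)},\tau_{(A)})$ in $\gamma^*$, where $\eta_i$ and $\psi^2_i$ are the mean and variance of the ith missing variable.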
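The independent case lends itself to a short numerical sketch: each missing future variable is replaced by its mean in the location, and contributes its squared coefficient times its variance to the scale. The code below computes ξ, ω² and the first two terms of DKL only; it deliberately ignores the Taylor correction terms involving the Q functions, and every numerical value (coefficients, means, variances, the treatment B contribution) is hypothetical.

```python
import math

def moments_with_missing(x_known, beta_known, eta_missing, psi2_missing, beta_missing, s2):
    """Mean and variance of u_f when independent future covariates are missing."""
    mean = sum(x * b for x, b in zip(x_known, beta_known))
    mean += sum(e * b for e, b in zip(eta_missing, beta_missing))
    var = s2 + sum(b * b * p for b, p in zip(beta_missing, psi2_missing))
    return mean, var

def kl_main_terms(theta, delta2, xi, omega2):
    """Location and scale terms of D_KL, without the Taylor correction terms."""
    ratio = delta2 / omega2
    return (theta - xi) ** 2 / (2.0 * omega2) + 0.5 * (ratio - math.log(ratio) - 1.0)

# Hypothetical treatment A fit: constant, one observed covariate, one missing covariate.
mean_a, var_a = moments_with_missing(
    x_known=[1.0, 40.0], beta_known=[0.2, 0.03],
    eta_missing=[0.6], psi2_missing=[0.24], beta_missing=[0.5], s2=0.5)

mean_b, var_b = 0.4, 0.66          # hypothetical treatment B mean and variance term
xi = mean_a - mean_b
omega2 = var_a + var_b

theta, delta2 = 1.5, 1.24          # hypothetical full-model values from (iv)
d_kl = kl_main_terms(theta, delta2, xi, omega2)
print(xi, omega2, d_kl)
```

In this sketch the location term shrinks as the true value of the missing variable approaches its mean, which is one way to read the pattern in the examples where the discrepancies are smaller near the mean of the missing variable.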
Explanatory variables are dichotomous
Here we assume that all the explanatory variables are dichotomous and independent. We assume that the errors of models (ii) and (iii) are normally distributed with zero means and variances $\tau^{-1}_{(A)}$ and $\tau^{-1}_{(B)}$, respectively. To assess the influence of the missing variables in treatment A, we assume that $x_{f(A)i}$ is distributed as
$$\Pr(X_{f(A)i}=x_{f(A)i})=\theta_{(A)i}^{x_{f(A)i}}(1-\theta_{(A)i})^{1-x_{f(A)i}},\qquad x_{f(A)i}=0,1,\quad i=1,2,\ldots,k-1.$$
The density of a future $u_f$ is
$$f(u_f\mid x_{f(A)},\beta_{(A)},\tau_{(A)})\equiv N\!\left(\sum_{i=0}^{k-1}x_{f(A)i}\beta_{(A)i},\;\tau^{-1}_{(A)}\right).$$
If the r future variables $x^{(r)}_{f(A)}$ are missing in treatment A, then the density of a future $u_f$ is given by
$$f(u_f\mid x^*_{f(A)},\beta_{(A)},\tau^{-1}_{(A)})=\sum_{x_{f(A)k-r}=0}^{1}\cdots\sum_{x_{f(A)k-1}=0}^{1}N\!\left(\sum_{i=0}^{k-1}x_{f(A)i}\beta_{(A)i},\;\tau^{-1}_{(A)}\right)\prod_{i=k-r}^{k-1}\theta_{(A)i}^{x_{f(A)i}}(1-\theta_{(A)i})^{1-x_{f(A)i}}.$$
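The sum over the missing dichotomous variables is a finite mixture of normal densities, one component per 0/1 configuration. A minimal sketch of that enumeration follows; the coefficients, success probabilities and error variance are hypothetical, and the function name `density_missing_binary` is introduced here purely for illustration.

```python
import itertools
import math

def normal_pdf(u, mean, var):
    return math.exp(-(u - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def density_missing_binary(u, x_known, beta_known, beta_missing, theta_missing, tau_inv):
    """Density of u_f after summing out independent missing 0/1 covariates."""
    base = sum(x * b for x, b in zip(x_known, beta_known))
    total = 0.0
    for combo in itertools.product((0, 1), repeat=len(beta_missing)):
        weight = 1.0
        shift = 0.0
        for x, b, th in zip(combo, beta_missing, theta_missing):
            weight *= th if x == 1 else 1.0 - th       # Bernoulli probability of this value
            shift += x * b
        total += weight * normal_pdf(u, base + shift, tau_inv)
    return total

# Hypothetical setting: constant plus one observed binary covariate, two missing ones.
args = dict(x_known=[1.0, 1.0], beta_known=[0.2, -0.4],
            beta_missing=[0.8, 0.5], theta_missing=[0.3, 0.6], tau_inv=0.5)
value = density_missing_binary(0.1, **args)

# Crude Riemann sum as a sanity check that the mixture integrates to about 1.
grid = [-10.0 + 0.01 * i for i in range(2001)]
area = sum(density_missing_binary(u, **args) for u in grid) * 0.01
print(value, area)
```

The number of mixture components is 2 to the power r, so direct enumeration is only practical when few dichotomous variables are missing.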
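The predictive density of $u_f$ when $x^{(r)}_{f(A)}$ is missing is given by
$$f(u_f\mid x^*_{f(A)},\text{data})=\int f(u_f\mid x^*_{f(A)},\beta_{(A)},\tau^{-1}_{(A)})\,f(\beta_{(A)}\mid\text{data})\,d\beta_{(A)}, \tag{viii}$$
which is not mathematically tractable. For vague prior densities for $\beta_{(A)}$ and $\tau_{(A)}$, and using Taylor's expansion, the approximate predictive density of (viii) is
$$f(u_f\mid x^*_{f(A)},\text{data})=\sum_{x_{f(A)k-r}=0}^{1}\cdots\sum_{x_{f(A)k-1}=0}^{1}N\!\left(\sum_{i=0}^{k-1}x_{f(A)i}\hat\beta_{(A)i},\;s^2_{(A)}\right)\prod_{i=k-r}^{k-1}\theta_{(A)i}^{x_{f(A)i}}(1-\theta_{(A)i})^{1-x_{f(A)i}}\left(1+\sum_{i,j=0}^{k-1}\frac{Q_{ij}(\hat\beta_{(A)},s^{-2}_{(A)})\,\mathrm{cov}(\beta_{(A)i},\beta_{(A)j})}{2}+\frac{Q_{\tau^2_{(A)}}(\hat\beta_{(A)},s^{-2}_{(A)})\,\mathrm{var}(\tau_{(A)})}{2}\right).$$
Since there are no missing variables in $v_f$, the density of $v_f$ is the same as that obtained in Section 2. Then the predictive density of $a'w_f$ is given by
$$f(a'w_f\mid x^*_{f(A)},x_{f(B)},\text{data})=\sum_{x_{f(A)k-r}=0}^{1}\cdots\sum_{x_{f(A)k-1}=0}^{1}N\!\left(\sum_{i=0}^{k-1}\bigl(x_{f(A)i}\hat\beta_{(A)i}-x_{f(B)i}\hat\beta_{(B)i}\bigr),\;s^2_{(A)}+s^2_{(B)}\bigl(1+x_{f(B)}(X'_{(B)}X_{(B)})^{-1}x'_{f(B)}\bigr)\right)\prod_{i=k-r}^{k-1}\theta_{(A)i}^{x_{f(A)i}}(1-\theta_{(A)i})^{1-x_{f(A)i}}\left(1+\sum_{i,j=0}^{k-1}\frac{Q_{ij}(\hat\beta_{(A)},s^{-2}_{(A)})\,\mathrm{cov}(\beta_{(A)i},\beta_{(A)j})}{2}+\frac{Q_{\tau^2_{(A)}}(\hat\beta_{(A)},s^{-2}_{(A)})\,\mathrm{var}(\tau_{(A)})}{2}\right). \tag{ix}$$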
An analytical expression for DKL between the predictive densities (iv) and (ix) is very difficult to obtain, but a numerical solution can be obtained. In some situations some of the explanatory variables are dichotomous and some are continuous. Among the $k-1$ explanatory variables, without loss of generality we assume that the first $l$ are dichotomous and the remaining $k-l-1$ are continuous. We also assume that, out of the $l$ dichotomous future variables, the last $d$ are missing and, out of the $(k-l-1)$ continuous future variables, the last $g$ are missing. Then the predictive density of the future log-odds ratio $a'w_f$ when $d$ dichotomous and $g$ continuous variables are missing is given by
$$f(a'w_f\mid x^*_{f(A)},x_{f(B)},\text{data})=\sum_{x_{f(A)l-d+1}=0}^{1}\cdots\sum_{x_{f(A)l}=0}^{1}N\!\left(\sum_{i=0}^{k-g-1}x_{f(A)i}\hat\beta_{(A)i}+\sum_{i=k-g}^{k-1}\eta_i\hat\beta_{(A)i}-\sum_{i=0}^{k-1}x_{f(B)i}\hat\beta_{(B)i},\;\sum_{i=k-g}^{k-1}\hat\beta^2_{(A)i}\psi^2_i+s^2_{(A)}+s^2_{(B)}\bigl(1+x_{f(B)}(X'_{(B)}X_{(B)})^{-1}x'_{f(B)}\bigr)\right)\prod_{i=l-d+1}^{l}\theta_i^{x_{f(A)i}}(1-\theta_i)^{1-x_{f(A)i}}\left(1+\sum_{i,j=0}^{k-1}\frac{Q_{ij}(\hat\beta_{(A)},s^{-2}_{(A)})\,\mathrm{cov}(\beta_{(A)i},\beta_{(A)j})}{2}+\frac{Q_{\tau^2_{(A)}}(\hat\beta_{(A)},s^{-2}_{(A)})\,\mathrm{var}(\tau_{(A)})}{2}\right). \tag{x}$$
Again, an analytical expression for DKL between the predictive densities (iv) and (x) is very difficult to obtain, but its numerical solution can be computed. In a similar way we can derive the predictive density of the future log-odds ratio when some future variables are missing in treatment B.
Example 1 revisited: This example is based on the flu shot data of Example 1. From Figure 3 we observe, as in Examples 1 and 2, that the discrepancies are smaller around the mean of the missing variables. Moreover, Figures 1 and 3 show that the discrepancies for the missing variables are smaller than the discrepancies for the deleted variables.
Example 2 revisited: This example is based on the simulated data of Example 2, and here we reach the same conclusion as in Example 1 revisited (Figures 2 & 4).
Figure 1 Three dimensional scatter plots based on real data for DKL when x1 is deleted (panels: Group A, Group B).
Figure 2 Three dimensional scatter plots based on simulated data for DKL when x1 is deleted (panels: Group A, Group B).
Figure 3 Three dimensional scatter plots based on real data for DKL when xf1 is missing (panels: Group A, Group B).
Figure 4 Three dimensional scatter plots based on simulated data for DKL when xf1 is missing (panels: Group A, Group B).
Examples 1 and 2 revisited: In this example, we use the DKL values for the real data to draw box plots for each case (deleted and missing). From Figure 5, we observe that x2 is more influential than x1. Moreover, the discrepancies are much smaller in the missing case than in the deleted case. We obtain the same result in the simulation study, as illustrated in Figure 6.
We consider the logistic model as
Pr(y=1|x,β)=exp(xβ)/(1+exp(xβ))
The probability that a future response yf will be a success is given by
Pr(yf=1|xf,β)=exp(xfβ)/(1+exp(xfβ))
We assume that the conditional density of $x_f^{(r)}$ given $x^*_f$ is independent of β, where $x^*_f$ denotes the future explanatory variables without the variables $x_f^{(r)}$. Then the predictive probabilities that $y_f$ will be a success under the two models are given by
$$\Pr(y_f=1\mid x_f,\text{data})=\int\Pr(y_f=1\mid x_f,\beta)\,f(\beta\mid\text{data})\,d\beta$$
and
$$\Pr(y_f=1\mid x^*_f,\text{data})=\int\Pr(y_f=1\mid x^*_f,\beta)\,f(\beta\mid\text{data})\,d\beta,$$
respectively. Simple analytically tractable priors are not available here. Numerical integration techniques might be used for some specified priors to approximate $\Pr(y_f=1\mid x_f,\text{data})$ and $\Pr(y_f=1\mid x^*_f,\text{data})$.
Normal approximation for the posterior density
Let us suppose that the sample size is large. Lindley12 stated that the posterior density $f(\beta\mid\text{data})$ may then be well approximated by its asymptotic normal form
$$f(\beta\mid\text{data})\approx N_p(\hat\beta,\Sigma),$$
where $\hat\beta$ is the maximum likelihood estimate of β, $\Sigma=(-H)^{-1}$ and H is the Hessian of $\log L(\beta)$ evaluated at $\hat\beta$.
For the logistic model (xi), the Hessian $H=(h_{jl}(\hat\beta))$ evaluated at $\hat\beta$ is given by
$$h_{jl}(\hat\beta)=-\sum_{i=1}^{n}x_{ij}x_{il}\frac{\exp(x_i\hat\beta)}{(1+\exp(x_i\hat\beta))^2},\qquad j,l=0,1,\ldots,k,$$
where $x_{ij}$ is the jth component of $x_i$ with $x_{i0}=1$. For given $x_f$, $z=x_f\beta$ has approximately a posteriori a normal distribution with mean $b_{x_f}=x_f\hat\beta$ and variance $d^2_{x_f}=x_f\Sigma x_f'$, and with probability density function $\phi(z\mid b_{x_f},d^2_{x_f})$. Using this transformation we can approximate the predictive probability by
$$\Pr(y_f=1\mid x_f,\text{data})\approx\int\frac{\exp(z)}{1+\exp(z)}\,\phi(z\mid b_{x_f},d^2_{x_f})\,dz.$$
Analytical evaluation of (4.1) is very difficult. We can, however, evaluate it by numerical integration techniques, viz. Gauss-Hermite quadrature (Abramowitz and Stegun13), or by the normal approximation (Cox14) or Laplace's approximation (de Bruijn15).
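The large-sample recipe above can be sketched end to end for a model with a constant and a single covariate. The code fits the model by Newton-Raphson, forms Σ as the inverse of the negative Hessian, and then integrates the logistic function against the normal density of z. Two assumptions should be kept in mind: the eight data points are invented for illustration, and a composite Simpson rule is used in place of Gauss-Hermite quadrature simply to keep the sketch free of external libraries.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def score_and_information(b0, b1, xs, ys):
    """Gradient of log L and the entries of -H for an intercept-plus-slope logistic model."""
    g0 = g1 = 0.0
    i00 = i01 = i11 = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        w = p * (1.0 - p)               # equals exp(x*b) / (1 + exp(x*b))**2
        g0 += y - p
        g1 += (y - p) * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (g0, g1), (i00, i01, i11)

def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson MLE; returns beta_hat and Sigma = (-H)^{-1}."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        (g0, g1), (i00, i01, i11) = score_and_information(b0, b1, xs, ys)
        det = i00 * i11 - i01 * i01
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    _, (i00, i01, i11) = score_and_information(b0, b1, xs, ys)
    det = i00 * i11 - i01 * i01
    sigma = ((i11 / det, -i01 / det), (-i01 / det, i00 / det))
    return (b0, b1), sigma

def predictive_prob(x_f, beta_hat, sigma, n_intervals=400):
    """Integrate sigmoid(z) against N(b, d2) with a composite Simpson rule."""
    b = beta_hat[0] + beta_hat[1] * x_f
    d2 = sigma[0][0] + 2.0 * x_f * sigma[0][1] + x_f * x_f * sigma[1][1]
    d = math.sqrt(d2)
    lo = b - 8.0 * d
    h = 16.0 * d / n_intervals
    total = 0.0
    for i in range(n_intervals + 1):
        z = lo + i * h
        weight = 1 if i in (0, n_intervals) else (4 if i % 2 else 2)
        density = math.exp(-(z - b) ** 2 / (2.0 * d2)) / math.sqrt(2.0 * math.pi * d2)
        total += weight * sigmoid(z) * density
    return total * h / 3.0

# Invented data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
beta_hat, sigma = fit_logistic(xs, ys)
x_future = 6.0
p_plugin = sigmoid(beta_hat[0] + beta_hat[1] * x_future)
p_pred = predictive_prob(x_future, beta_hat, sigma)
print(beta_hat, p_plugin, p_pred)
```

Averaging over the posterior uncertainty in β pulls the predictive probability toward 0.5 relative to the plug-in value, and the pull is stronger when the sample is small and Σ is large.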
If the sample size is small, the posterior normality assumption may not be accurate. Therefore, we consider Flat prior approximation Tierney and Kadane16 as an alternative approach using the Laplace's method for integrals.
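Here we assume that the future variables $x_f$ are dependent and that the density of $x_f$ is p-dimensional multivariate normal, i.e.
$$f(x_f)\equiv N_p(\eta,\psi).$$
The conditional density of $x_f^{(r)}$ given $x^*_f$ is
$$f(x_f^{(r)}\mid x^*_f)\equiv N_r\bigl(\eta^*_{(r)},\psi^*_{(r)}\bigr).$$
The probability of $y_f$ being a success when $x_f^{(r)}$ is missing is given by
$$\Pr(y_f=1\mid x^*_f,\beta)=\int\frac{\exp(x_f\beta)}{1+\exp(x_f\beta)}\,f(x_f^{(r)}\mid x^*_f)\,dx_f^{(r)}\approx\Phi\!\left(\frac{\sum_{i=0}^{k-r}x_{fi}\beta_i+\sum_{i=k-r+1}^{k}\eta^*_i\beta_i}{\bigl(k^2+\sum_{i,j=k-r+1}^{k}\beta_i\beta_j\psi^*_{ij}\bigr)^{1/2}}\right)=g^*(\beta)\ \text{(say)}.$$
Then the predictive probability of $y_f$ being a success when $x_f^{(r)}$ is missing is given by
$$\Pr(y_f=1\mid x^*_f,\text{data})=\int g^*(\beta)\,f(\beta\mid\text{data})\,d\beta. \tag{xii}$$
The integral in (xii) can be evaluated in the same way as the integral in (xi), using Taylor's and Laplace's approximations.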
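If, instead, the future variables $x_{f1},\ldots,x_{fk}$ are independently and normally distributed with mean $\eta_i$ and variance $\psi^2_i$ ($i=1,2,\ldots,k$), then the conditional density of $x_f^{(r)}$ is
$$f(x_f^{(r)}\mid x^*_f)\equiv f(x_f^{(r)}).$$
Consequently, we get
$$\Pr(y_f=1\mid x^*_f,\beta)=\int\frac{\exp(x_f\beta)}{1+\exp(x_f\beta)}\,f(x_f^{(r)})\,dx_f^{(r)}\approx\Phi\!\left(\frac{\sum_{i=0}^{k-r}x_{fi}\beta_i+\sum_{i=k-r+1}^{k}\eta_i\beta_i}{\bigl(k^2+\sum_{i=k-r+1}^{k}\beta^2_i\psi^2_i\bigr)^{1/2}}\right)=g(\beta)\ \text{(say)}.$$
See Aitchison and Begg17 in this context. Again,
$$\Pr(y_f=1\mid x^*_f,\text{data})=\int g(\beta)\,f(\beta\mid\text{data})\,d\beta.$$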
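The quality of this kind of probit-type approximation can be checked numerically. The sketch below compares the closed form with a direct numerical integral of the logistic function against a normal density. It rests on an assumption that is not stated in the text: the constant written k in the denominator is taken to be the usual scaling constant 15π/(16√3), about 1.70, that matches the logistic curve to a normal distribution function. The mean and variance of the linear predictor are hypothetical.

```python
import math

# Assumed scaling constant of the logistic-to-normal match (not given in the text).
K = 15.0 * math.pi / (16.0 * math.sqrt(3.0))

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def approx_success_prob(mean, var):
    """Closed form: Phi(mean / sqrt(K^2 + var)) for z ~ N(mean, var)."""
    return std_normal_cdf(mean / math.sqrt(K * K + var))

def numeric_success_prob(mean, var, n_intervals=2000):
    """Composite Simpson integral of logistic(z) against the N(mean, var) density."""
    sd = math.sqrt(var)
    lo = mean - 8.0 * sd
    h = 16.0 * sd / n_intervals
    total = 0.0
    for i in range(n_intervals + 1):
        z = lo + i * h
        weight = 1 if i in (0, n_intervals) else (4 if i % 2 else 2)
        density = math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        total += weight * density / (1.0 + math.exp(-z))
    return total * h / 3.0

# Hypothetical mean and variance of the linear predictor with a variable missing.
p_closed = approx_success_prob(1.0, 2.0)
p_numeric = numeric_success_prob(1.0, 2.0)
print(p_closed, p_numeric)
```

Under this assumed constant the two values agree to about two decimal places, which suggests the closed form is adequate for ranking the influence of missing variables, though not for high-precision probabilities.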
Variables xf are dichotomous
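Here we assume that the variables $x_f$ are independent and can take only the two values 0 or 1. We also assume that $x_{fi}$ is distributed as
$$\Pr(X_{fi}=x_{fi})=\theta_i^{x_{fi}}(1-\theta_i)^{1-x_{fi}}.$$
If $x_f^{(r)}$ is missing, the probability of $y_f$ being a success is given by
$$\Pr(y_f=1\mid x^*_f,\beta)=\sum_{x_{fk-r+1}=0}^{1}\cdots\sum_{x_{fk}=0}^{1}\frac{\exp(x_f\beta)}{1+\exp(x_f\beta)}\prod_{i=k-r+1}^{k}\theta_i^{x_{fi}}(1-\theta_i)^{1-x_{fi}}=h(\beta)\ \text{(say)}.$$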
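For a fixed β the function h(β) is a weighted average of logistic success probabilities over all 0/1 configurations of the missing variables. A minimal sketch of that average is given below; the coefficient values and the probabilities θ are hypothetical, and the helper name `success_prob_missing_binary` is introduced here only for illustration.

```python
import itertools
import math

def success_prob_missing_binary(x_known, beta_known, beta_missing, theta_missing):
    """h(beta): success probability averaged over independent missing 0/1 covariates."""
    base = sum(x * b for x, b in zip(x_known, beta_known))
    total = 0.0
    for combo in itertools.product((0, 1), repeat=len(beta_missing)):
        weight = math.prod(th if x == 1 else 1.0 - th
                           for x, th in zip(combo, theta_missing))
        eta = base + sum(x * b for x, b in zip(combo, beta_missing))
        total += weight / (1.0 + math.exp(-eta))
    return total

# Hypothetical coefficients: constant, one observed covariate, two missing binary ones.
h_value = success_prob_missing_binary(
    x_known=[1.0, 1.0], beta_known=[-0.5, 0.4],
    beta_missing=[1.2, -0.7], theta_missing=[0.3, 0.6])
print(h_value)
```

The result always lies between the smallest and largest of the configuration-specific success probabilities, so the spread of those probabilities bounds how much a missing dichotomous variable can change the prediction.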
The predictive probability of $y_f$ being a success when $x_f^{(r)}$ is missing is given by
$$\Pr(y_f=1\mid x^*_f,\text{data})=\int h(\beta)\,f(\beta\mid\text{data})\,d\beta. \tag{xiii}$$
If the sample size is large, then under the normal approximation for the posterior density we can approximate (xiii) using Taylor's theorem, Laplace's method and the normal approximation.
Example: one variable case
Here we consider two different logistic models, each based on a single variable, either x1 or x2. To assess the influence of the missing variable xfi, i = 1, 2, we measure the discrepancies between the predictive probability ˆpi, based on the single variable xi when xfi is known, and the predictive probability ˆp0, based on xi alone when xfi is missing. The predictive probability ˆpi is determined using the quadrature approximation and the predictive probability ˆp0 is determined using the second-order Taylor approximation.
We assume that the marginal densities of the future variables xf1 and xf2 are normal with means 33.35, 78.24 and variances 65.39, 1827.0 respectively, where means and variances are the estimated sample means and sample variances from the observed data. We employ the absolute difference of probabilities and Kullback-Leibler divergence measure to assess the influence of the missing variable. The discrepancies are drawn in Figure 7. Here we see that the discrepancies due to missing xf1 in the predictive probability based on x1 are very large compared to the discrepancies due to missing xf2 in the predictive probability based on x2 . The discrepancies are less around the mean of the missing variable.
[Figure panels: xf1 is missing; xf2 is missing. Kullback-Leibler directed divergence DKL: xf1 is missing; xf2 is missing.]
Example: two-variable case
Now let ˆp12 denote the predictive probability based on the two variables x1 and x2 when both xf1 and xf2 are known, and let ˆpij, i=0,1, j=0,2, (i,j)≠(1,2), denote the predictive probability based on x1 and x2 when a future variable is missing, where "0" indicates a missing variable. Here again the predictive probability ˆp12 is determined using the quadrature approximation, and the predictive probabilities ˆp10, ˆp02 and ˆp00 are determined using the second-order Taylor approximation. We assume that the joint density of xf1 and xf2 is bivariate normal with correlation coefficient 0.33, which is the estimated sample correlation coefficient from the observed data. The absolute differences of the two predictive probabilities ˆp12 and ˆp02 when xf1 is missing, and of ˆp12 and ˆp10 when xf2 is missing, are shown in Figure 8. The Kullback-Leibler directed divergences DKL are shown in Figure 9. In both cases the discrepancies when xf1 is missing, for different given values of the other variable, are close together, since the correlation between xf1 and xf2 is small. The discrepancies due to missing xf1 are very large compared with those due to missing xf2, except near the mean of the missing variable. If both xf1 and xf2 are missing, the discrepancies are shown in Figure 10. These discrepancies are very similar to the discrepancies due to missing xf1 alone in the predictive probability based on x1 and x2, since the contribution of x2 is negligible.
In our present study we have observed that the discrepancies are smallest around the mean of the deleted variables, as well as around the mean of the missing future variables, in both the logistic model and the log-odds ratio; that the discrepancies are larger if the deleted or missing variables are more influential; and that the discrepancies in the deleted case are higher than in the missing case.
In the present paper we studied the important problem of the predictive influence of variables on the log-odds ratio in a Bayesian setup. The treatment difference
Pr(Yi=1|Zi=1,xi)−Pr(Yi=1|Zi=0,xi)
or the risk ratio
Pr(Yi=1|Zi=1,xi)/Pr(Yi=1|Zi=0,xi)
can also be studied along the same lines.
We have also considered the influence of missing future explanatory variables in a logistic model. The influence of missing future explanatory variables in probit and complementary log-log models can also be studied in a similar fashion.
©2017 Bhattacharjee, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.