Predictive influence of variables on the odds ratio and in the logistic model

doi:10.15406/bbij.2017.05.00125

We study the influence of explanatory variables in prediction by looking at the distribution of the log-odds ratio. We also consider the predictive influence of a subset of unobserved future variables on the distribution of log-odds ratio as well as in a logistic model, via the Bayesian predictive density of a future observation. This problem is considered for dichotomous, as well as continuous explanatory variables.

AMS subject classification: Primary 62J12, Secondary 62B10, 62F15

Keywords: predictive density/probability, log-odds ratio, logistic model, predictive influence, missing/unobserved variable, Kullback-Leibler divergence

The odds ratio (OR) is perhaps the most popular measure of treatment difference for binary outcomes and is extensively used in dealing with 2×2 tables in biomedical studies and clinical trials. The distribution of the log of the sample OR is often approximated by a normal distribution with the true log OR as its mean and with variance estimated by the sum of the reciprocals of the four cell frequencies in the 2×2 table (Breslow¹). Böhning et al.² provide a detailed book-length discussion of the OR. In logistic regression, ORs enable one to examine the effect of explanatory variables on the response.
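The normal approximation described above can be illustrated with a minimal Python sketch. The cell layout and the counts are hypothetical and serve only to show the arithmetic.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Sample log odds ratio and its approximate standard error for a 2x2 table.

    Assumed layout (not from the paper):
        treatment A: a successes, b failures
        treatment B: c successes, d failures
    """
    log_or = math.log((a * d) / (b * c))
    # Variance estimate: sum of the reciprocals of the four cell frequencies.
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    return log_or, se

# Hypothetical counts, for illustration only.
log_or, se = log_odds_ratio(20, 10, 10, 20)
interval = (log_or - 1.96 * se, log_or + 1.96 * se)  # approximate 95% interval
print(log_or, se, interval)
```

The approximation is unreliable when any cell count is small or zero, which is the sparse-data situation discussed by Breslow¹.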

Logistic link is perhaps the most popular way to model the success probabilities of a binary variable. Pregibon,³ Cook and Weisberg⁴ and Johnson⁵ have considered the problem of the influence of observations for logistic regression models. Several measures have been suggested to identify observations in the data set which are influential relative to the estimation of the vector of regression coefficients, the deviance, the determination of predictive probabilities and the classification of future observations.

Bhattacharjee & Dunsmore⁶ considered the effect on the predictive probability of a future observation of the omission of subsets of the explanatory variables. Mercier et al.⁷ used logistic regression to determine whether age and/or gender were a factor influencing severity of injuries suffered in head-on automobile collisions on rural highways. Zellner et al.⁸ considered the problem of variable selection in logistic regression to compare the performance of stepwise selection procedures with a bagging method.

In the present paper, our aim is to measure the predictive influence of a subset of explanatory variables in log-odds ratio of a logistic model using a Bayesian approach. We are also interested in studying the effect of missing future explanatory variables on Bayes prediction, on a logistic model as well as on the log-odds ratio.

In Section 2, we derive the predictive densities of a future log-odds ratio for both the full model and a subset-deleted model. In Section 3, we derive the predictive density of the log-odds ratio when a subset of future explanatory variables is missing. To derive the predictive densities we assume that the future explanatory variables $x^{f}$ are distributed as multivariate normal, both when the components of $x^{f}$ are independent and when they are dependent. In Section 4, we discuss the influence of future missing explanatory variables by considering the predictive probability of a future response in a logistic model. This is done by assuming that the future explanatory variables $x^{f}$ are multivariate normal for the continuous case. The dichotomous case is also considered. Since the predictive probabilities are not mathematically tractable for the logistic model, we use several approximations.

In Sections 2 and 3 we employ the Kullback-Leibler⁹ directed measure of divergence $D_{K L}$ to assess the influence of variables, and of missing future variables, on the log-odds ratio. The form of the Kullback-Leibler⁹ measure used here is given by

$D_{K L} = \int f (a' w^{f} | \cdot) \log (\frac{f (a' w^{f} | \cdot)}{f_{(r + s)} (a' w^{f} | \cdot)}) d (a' w^{f}) .$

To assess the influence of missing future variables or to measure the predictive probability in a logistic model we use the absolute difference of the two predictive probabilities.

Consider a phase III clinical trial with two competing treatments, say A and B, having binary responses. Suppose $n$ patients are randomly allocated, with $n_{A}$ and $n_{B}$ patients assigned to treatments A and B respectively. The patient responses are influenced by a $p \times 1$ covariate vector $x$, one component of which may be 1 (to cover the constant term). Let $(Y_{i}, Z_{i}, x_{i})$ be the data corresponding to the $i$th patient, where $Y_{i}$ is the indicator of response ($Y_{i} = 1$ or 0 for a success or failure), $Z_{i}$ is the indicator of the treatment assignment ($Z_{i} = 1$ or 0 according as treatment A or B is applied to the $i$th patient), and $x_{i}$ is the covariate vector. We assume a logit model for the responses:

$\Pr (Y_{i} = 1 | Z_{i}, x_{i}) = \frac{\exp (Δ Z_{i} + x_{i} β)}{1 + \exp (Δ Z_{i} + x_{i} β)}$ $i = 1, 2, ...., n .$ (i)

Then the odds for treatments A and B with covariate vector x_i are respectively

$O_{A} = \frac{\Pr (Y_{i} = 1 | Z_{i} = 1, x_{i})}{\Pr (Y_{i} = 0 | Z_{i} = 1, x_{i})} = \exp (Δ + x_{i} β)$ , $O_{B} = \frac{\Pr (Y_{i} = 1 | Z_{i} = 0, x_{i})}{\Pr (Y_{i} = 0 | Z_{i} = 0, x_{i})} = \exp (x_{i} β)$

and hence the log-odds ratio is

$\log OR = \log \frac{O_{A}}{O_{B}} = \log O_{A} - \log O_{B} = Δ$
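As a quick numerical check, the following Python sketch evaluates the odds under the logit model (i) and confirms that the log of the ratio $O_{A} / O_{B}$, that is, the difference of the two log-odds, equals $Δ$ whatever the value of $x_{i} β$. The numbers used for `delta` and `xb` are hypothetical.

```python
import math

def success_prob(delta, z, xb):
    """Pr(Y = 1 | z, x) under the logit model (i); xb stands for x_i * beta."""
    eta = delta * z + xb
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    """Odds of success for a success probability p."""
    return p / (1.0 - p)

delta, xb = 0.7, -0.4                      # hypothetical values, illustration only
o_a = odds(success_prob(delta, 1, xb))     # odds under treatment A
o_b = odds(success_prob(delta, 0, xb))     # odds under treatment B
log_or = math.log(o_a) - math.log(o_b)     # log of the odds ratio O_A / O_B
print(log_or)
```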

Let us partition

$x β = x_{A} β_{A} + x_{B} β_{B} + x_{A B} β_{A B}$

where $x_{A}$ denotes the variables used in treatment A only, $x_{B}$ those used in treatment B only, and $x_{A B}$ those used in both treatments A and B. Then the model can be partitioned for treatments A and B as:

$\log O_{A} = u = Δ + x_{A} β_{A} + x_{A B} β_{A B} = x_{(A)} β_{(A)}$ (ii)

$\log O_{B} = v = x_{B} β_{B} + x_{A B} β_{A B} = x_{(B)} β_{(B)}$ (iii)

The predictive density of future log-odds for A, $u^{f}$ , for non-informative prior (vague prior) with normal or any spherical symmetric errors is of Student form Jammalamadaka et al.¹⁰ and is given by

$f (u^{f} | x_{(A)}^{f}, \mathrm{data}) \equiv St (n - k, x_{(A)}^{f} {\hat{β}}_{(A)}, s_{(A)}^{2} (1 + x_{(A)}^{f'} {(x_{(A)}^{'} x_{(A)})}^{- 1} x_{(A)}^{f}))$

where ${\hat{β}}_{(A)}$ is the MLE of $β_{(A)}$ , $s_{(A)}^{2}$ is the MLE of $σ_{A}^{2}$ and k is the number of parameters in the model (ii). See Bhattacharjee et al.¹¹ in this context. If the sample size is large then this predictive density can be well approximated by its asymptotic normal form

$N (x_{(A)}^{f} {\hat{β}}_{(A)}, s_{(A)}^{2} (1 + x_{(A)}^{f'} {(x_{(A)}^{'} x_{(A)})}^{- 1} x_{(A)}^{f}) (n - k) / (n - k - 2)) .$

The predictive density of the future log-odds for B, $v^{f}$, is obtained similarly, with $q$ denoting the number of parameters in the model (iii).

Let us define $w^{f} = {(u^{f}, v^{f})}^{'}$ and $a = {(1, - 1)}^{'}$ . Then the predictive density of future log odds ratio $a^{'} w^{f}$ is given by

$f (a^{'} w^{f} | x_{(A)}^{f}, x_{(B)}^{f}, d a t a) \approx N (θ, δ^{2})$ (iv)

where

$θ = x_{(A)}^{f} {\hat{β}}_{(A)} - x_{(B)}^{f} {\hat{β}}_{(B)}$

and

$δ^{2} = s_{(A)}^{2} (1 + x_{(A)}^{f'} {(x_{(A)}^{'} x_{(A)})}^{- 1} x_{(A)}^{f}) (n - k) / (n - k - 2) + s_{(B)}^{2} ((1 + x_{(B)}^{f'} {(x_{(B)}^{'} x_{(B)})}^{- 1} x_{(B)}^{f}) (n - q) / (n - q - 2))$
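The quantities $θ$ and $δ^{2}$ can be computed along the following lines. This is only a sketch under stated assumptions: it supposes that log-odds values `u` are available as a response for each arm and fits them by least squares, the data are simulated, and all function and variable names are hypothetical. The variances of the two arms are added, as in the expression for $δ^{2}$.

```python
import numpy as np

def predictive_normal(X, u, x_f):
    """Normal approximation to the Student-form predictive density of a
    future log-odds value for one treatment arm.

    X   : (n, k) design matrix of the arm (hypothetical input)
    u   : (n,) log-odds values, assumed available for this sketch
    x_f : (k,) future covariate vector
    Returns (mean, variance).
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ u
    resid = u - X @ beta_hat
    s2 = resid @ resid / n                      # MLE of the error variance
    scale = s2 * (1.0 + x_f @ xtx_inv @ x_f)    # squared scale of the Student form
    return x_f @ beta_hat, scale * (n - k) / (n - k - 2)

def log_or_predictive(arm_a, arm_b):
    """theta and delta^2 of (iv) from two (X, u, x_f) triples."""
    mean_a, var_a = predictive_normal(*arm_a)
    mean_b, var_b = predictive_normal(*arm_b)
    return mean_a - mean_b, var_a + var_b

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
X_a = np.column_stack([np.ones(40), rng.normal(size=(40, 2))])
X_b = np.column_stack([np.ones(35), rng.normal(size=(35, 1))])
u_a = X_a @ np.array([0.5, 1.0, -0.3]) + rng.normal(scale=0.2, size=40)
u_b = X_b @ np.array([0.1, 0.8]) + rng.normal(scale=0.2, size=35)
theta, delta2 = log_or_predictive((X_a, u_a, np.array([1.0, 0.2, -0.1])),
                                  (X_b, u_b, np.array([1.0, 0.4])))
print(theta, delta2)
```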

Our interest is to measure the influence of explanatory variables in the predictive density (iv) for the following cases:

Case 1: Influence of $r$ explanatory variables $x_{A}^{r}$ of $x_{A}$ in treatment A.

Case 2: Influence of $r$ explanatory variables $x_{B}^{r}$ of $x_{B}$ in treatment B.

Case 3: Influence of $s$ explanatory variables $x_{A B}^{s}$ of $x_{A B}$ in treatment A.

Case 4: Influence of $s$ explanatory variables $x_{A B}^{s}$ of $x_{A B}$ in treatment B.

Case 5: Joint influence of $r$ explanatory variables $x_{A}^{r}$ of $x_{A}$ and s explanatory variables $x_{A B}^{s}$ of $x_{A B}$ in treatment A.

Case 6: Joint influence of r explanatory variables $x_{B}^{r}$ of $x_{B}$ and s explanatory variables $x_{A B}^{s}$ of $x_{A B}$ in treatment B.

To see the influence of explanatory variables in log-odds ratio, we construct a reduced log-odds model deleting a subset of explanatory variables. Then we derive the predictive density of future log-odds ratio for reduced model and compare it with the predictive density (iv) for full model. It is enough to consider Case 5 for illustration. We construct the reduced model by deleting variables $x_{A}^{r}$ of $x_{A}$ and $x_{A B}^{s}$ of $x_{A B}$ in (ii) as

$u = Δ + x_{A}^{*} β_{A}^{*} + x_{A B}^{*} β_{A B}^{*} = x_{(A)}^{*} β_{(A)}^{*}$

Then the predictive density of $u^{f}$ is given by

$f (u^{f} | x_{(A)}^{* f}, \mathrm{data}) = St (n - k + r + s, x_{(A)}^{* f} {\hat{β}}_{(A)}^{*}, s_{(A)}^{* 2} (1 + x_{(A)}^{* f'} {(x_{(A)}^{*'} x_{(A)}^{*})}^{- 1} x_{(A)}^{* f}))$

The normal approximation of the predictive density is

$N (x_{(A)}^{* f} {\hat{β}}_{(A)}^{*}, s_{(A)}^{* 2} (1 + x_{(A)}^{* f'} {(x_{(A)}^{*'} x_{(A)}^{*})}^{- 1} x_{(A)}^{* f}) (n - k + r + s) / (n - k + r + s - 2))$

Since no variable is missing in $υ = \log O_{B}$ , the predictive density of $υ^{f}$ is unaltered along with its normal approximation. Hence the predictive density of log-odds ratio $a^{'} w^{f}$ under Case 5 is given by

$f_{(r + s)} (a^{'} w^{f} | x_{(A)}^{* f}, x_{(B)}^{f}, d a t a) \approx N (θ^{*}, δ^{* 2})$ (v)

where

$θ^{*} = x_{(A)}^{* f} {\hat{β}}_{(A)}^{*} - x_{(B)}^{f} {\hat{β}}_{(B)}$

and

$\begin{array}{l} δ^{* 2} = s_{(A)}^{* 2} (1 + x_{(A)}^{* f'} {(x_{(A)}^{*'} x_{(A)}^{*})}^{- 1} x_{(A)}^{* f}) (n - k + r + s) / (n - k + r + s - 2) \\ + s_{(B)}^{2} (1 + x_{(B)}^{f'} {(x_{(B)}^{'} x_{(B)})}^{- 1} x_{(B)}^{f}) (n - q) / (n - q - 2) \end{array}$

To assess the influence of the deleted variables we employ the Kullback-Leibler⁹ directed measure of divergence $D_{K L}$ between the predictive densities of $a^{'} w^{f}$ for the full model (iv) and the reduced model (v). The form of the K-L measure used here is given by

$D_{K L} = \int f_{(r + s)} (a' w^{f} | \cdot) \log (\frac{f_{(r + s)} (a' w^{f} | \cdot)}{f (a' w^{f} | \cdot)}) d (a' w^{f})$

The discrepancy measure $D_{K L}$ between the predictive densities (iv) and (v) reduces to

$D_{K L} = \frac{{(θ - θ^{*})}^{2}}{2 δ^{2}} + \frac{1}{2} (\frac{δ^{* 2}}{δ^{2}} - \log (\frac{δ^{* 2}}{δ^{2}}) - 1)$

Here $L = \frac{{(θ - θ^{*})}^{2}}{2 δ^{2}}$ is due to difference of location parameters and $S = \frac{1}{2} (\frac{δ^{* 2}}{δ^{2}} - \log (\frac{δ^{* 2}}{δ^{2}}) - 1)$ due to difference of scale parameters of the two predictive densities (iv) and (v).
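The decomposition into a location part $L$ and a scale part $S$ is straightforward to compute once the two normal approximations are available. A minimal Python sketch follows; the function name and the numerical inputs are hypothetical.

```python
import math

def kl_location_scale(theta, delta2, theta_star, delta2_star):
    """Directed divergence of the reduced-model density N(theta*, delta*^2)
    from the full-model density N(theta, delta^2), returned as (L, S, L + S)."""
    L = (theta - theta_star) ** 2 / (2.0 * delta2)   # location component
    ratio = delta2_star / delta2
    S = 0.5 * (ratio - math.log(ratio) - 1.0)        # scale component
    return L, S, L + S

# Hypothetical values, for illustration only.
L, S, d_kl = kl_location_scale(theta=0.8, delta2=1.2, theta_star=0.5, delta2_star=1.5)
print(L, S, d_kl)
```

Both components are non-negative, and each vanishes exactly when the corresponding parameters of the two densities coincide.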

Example 1: Here we consider the flu shot data (Pregibon³). A local health clinic sent fliers to its clients to encourage everyone, but especially older persons at high risk of complications, to get a flu shot for protection against an expected flu epidemic. In a pilot follow-up study, 159 clients were randomly selected and asked whether they actually received a flu shot. A client who received a flu shot was coded $Y = 1$, and a client who did not was coded $Y = 0$. In addition, data were collected on their age $(x_{1})$ and their health awareness $(x_{2})$. Also included in the data was client gender $(x_{3})$, with males coded $x_{3} = 1$ and females coded $x_{3} = 0$. We divide the whole data set into two groups, A and B, on the basis of gender: group A corresponds to the males and group B to the females. We compute $D_{K L}$ to measure the influence of the deleted variable $x_{1}$ in groups A and B separately; the discrepancies are plotted in Figure 1.

A similar figure can be obtained by deleting $x_{2}$. Figure 1 shows that the discrepancy is smaller around the mean of the deleted variable.

Example 2: This is a simulation exercise. We draw a sample of size 159 from a bivariate normal distribution, using the means, variances and correlation coefficient of $x_{1}$ and $x_{2}$ from the flu shot data above. Using these $x_{1}$ and $x_{2}$ we generate the responses $Y$, and from the whole generated data set we compute $D_{K L}$. We repeat the whole process 1000 times and compute the means of the $D_{K L}$ values. The mean discrepancies are shown in Figure 2. We reach the same conclusion as in the data example.

Here the aim is to detect the predictive influence of a set of missing future explanatory variables on the log-odds ratio of the logistic model (i). Our interest is in the influence of missing future explanatory variables in the six cases listed in Section 2. In treatment A, let the $r$ future variables missing from $x_{A}^{f}$ and the $s$ future variables missing from $x_{A B}^{f}$ be denoted by $x_{(A)}^{(r + s) f}$. Similarly, in treatment B, let the $r$ future variables missing from $x_{B}^{f}$ and the $s$ future variables missing from $x_{A B}^{f}$ be denoted by $x_{(B)}^{(r + s) f}$. We assume that the errors of models (ii) and (iii) are normally distributed with zero means and variances $τ_{(A)}^{- 1}$ and $τ_{(B)}^{- 1}$, respectively. We also assume that the conditional density of $x_{(A)}^{(r + s) f}$ given $x_{(A)}^{* f}$ is independent of $β_{(A)}$ and $τ_{(A)}$, and that of $x_{(B)}^{(r + s) f}$ given $x_{(B)}^{* f}$ is independent of $β_{(B)}$ and $τ_{(B)}$, i.e.,

$f (x_{(.)}^{(r + s) f} | x_{(.)}^{* f}, β_{(.)}, τ_{(.)}) = f (x_{(.)}^{(r + s) f} | x_{(.)}^{* f})$

where $x_{(.)}^{* f}$ denotes the future explanatory variables $x_{(.)}^{f}$ without $x_{(.)}^{(r + s) f}$ .

Explanatory variables are continuous

We assume that the $x_{i}^{f}$'s are dependent and that the distribution of $x_{(A)}^{f}$ is $(k - 1)$-dimensional multivariate normal, i.e. $f (x_{(A)}^{f}) \equiv N_{k - 1} (η, ψ)$.

The conditional density of $x_{(A)}^{(r + s) f}$ given $x_{(A)}^{* f}$ is given by

$f (x_{(A)}^{(r + s) f} | x_{(A)}^{* f}) \equiv N_{r + s} (η_{(r + s)}^{*}, ψ_{(r + s)}^{*})$ ,

where

$η = (η^{*}, η_{r + s}), \quad x_{(A)}^{f} = (x_{(A)}^{* f}, x_{(A)}^{(r + s) f}), \quad ψ = \begin{pmatrix} ψ_{11} & ψ_{12} \\ ψ_{21} & ψ_{22} \end{pmatrix}, \quad η_{(r + s)}^{*} = η_{r + s} + ψ_{21} ψ_{11}^{- 1} (x_{(A)}^{* f} - η^{*})$

and $ψ_{(r + s)}^{*} = ψ_{22} - ψ_{21} ψ_{11}^{- 1} ψ_{12}$ .
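The conditional mean and covariance of the missing block can be computed as in the following Python sketch. It assumes that the observed components come first in $x_{(A)}^{f}$, as in the partition above; the function name and the numbers are hypothetical.

```python
import numpy as np

def conditional_normal(eta, psi, x_obs):
    """Mean and covariance of the missing block of a multivariate normal
    vector given its observed leading block x_obs."""
    eta = np.asarray(eta, dtype=float)
    psi = np.asarray(psi, dtype=float)
    x_obs = np.asarray(x_obs, dtype=float)
    m = x_obs.size                                   # number of observed components
    psi11, psi12 = psi[:m, :m], psi[:m, m:]
    psi21, psi22 = psi[m:, :m], psi[m:, m:]
    cond_mean = eta[m:] + psi21 @ np.linalg.solve(psi11, x_obs - eta[:m])
    cond_cov = psi22 - psi21 @ np.linalg.solve(psi11, psi12)
    return cond_mean, cond_cov

# Hypothetical bivariate example: unit variances, correlation 0.5.
cond_mean, cond_cov = conditional_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], [2.0])
print(cond_mean, cond_cov)
```

Using `np.linalg.solve` rather than an explicit inverse of $ψ_{11}$ is a numerical convenience only; the result is the same.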

As earlier it is enough to consider Case 5 to see the joint influence of r missing future explanatory variables $x_{A}^{r f}$ of $x_{A}^{f}$ and s missing future explanatory variables $x_{A B}^{s f}$ of $x_{A B}^{f}$ in treatment A. The density of $u^{f}$ when $x_{(A)}^{(r + s) f}$ is missing is given by

$f (u^{f} | x_{(A)}^{* f}, β_{(A)}, τ_{(A)}) = \int f (u^{f} | x_{(A)}^{f}, β_{(A)}, τ_{(A)}) f (x_{(A)}^{(r + s) f} | x_{(A)}^{* f}) d x_{(A)}^{(r + s) f} \equiv N (\sum_{i = 0}^{k - r - s - 1} x_{(A) i}^{f} β_{(A) i} + \sum_{i = k - r - s}^{k - 1} η_{i}^{*} β_{(A) i}, \sum_{i, j = k - r - s}^{k - 1} β_{(A) i} β_{(A) j} ψ_{i j}^{*} + τ_{(A)}^{- 1})$

where $η_{i}^{*}$ is the $i$th component of $η_{(r + s)}^{*}$ and $ψ_{i j}^{*}$ is the $(i, j)$th component of $ψ_{(r + s)}^{*}$ .

See Bhattacharjee et al¹¹ in this context. Using Taylor's expansion and improper prior density for both $β_{(A)}$ and $τ_{(A)}$ , the approximate predictive density of $u^{f}$ when $x_{(A)}^{(r + s) f}$ is missing is given by

$f_{(r + s)} (u^{f} | x_{(A)}^{* f}, d a t a) \equiv N (\sum_{i = 0}^{k - r - s - 1} x_{(A) i}^{f} {\hat{β}}_{(A) i} + \sum_{i = k - r - s}^{k - 1} η_{i}^{*} {\hat{β}}_{(A) i}, \sum_{i, j = k - r - s}^{k - 1} {\hat{β}}_{(A) i} {\hat{β}}_{(A) j} ψ_{i j}^{*} + s_{(A)}^{2} γ^{*}),$

evaluated at ${\hat{β}}_{(A)}$ and $s_{(A)}^{2}$ where

$γ^{*} = (1 + \frac{1}{2} \sum_{i, j = 0}^{k - 1} Q_{i j}^{*} (β_{(A)}, τ_{(A)}) Cov (β_{(A) i}, β_{(A) j}) + \frac{1}{2} Q_{τ_{(A)}}^{2} (β_{(A)}, τ_{(A)}) Var (τ_{(A)}))$

is the multiplicative factor for the second-order Taylor approximation. If the components of $x_{(A)}^{f}$ are independent, the corresponding approximate predictive density of $u^{f}$ is

$f_{(r + s)} (u^{f} | x_{(A)}^{* f}, \mathrm{data}) \equiv N (\sum_{i = 0}^{k - r - s - 1} x_{(A) i}^{f} {\hat{β}}_{(A) i} + \sum_{i = k - r - s}^{k - 1} η_{i} {\hat{β}}_{(A) i}, \sum_{i = k - r - s}^{k - 1} {\hat{β}}_{(A) i}^{2} ψ_{i}^{2} + s_{(A)}^{2} γ)$

evaluated at ${\hat{β}}_{(A)}$ and $s_{(A)}^{2}$, where $η_{i}$ and $ψ_{i}^{2}$ are the mean and variance of the $i$th missing variable and $γ = (1 + \frac{1}{2} \sum_{i, j = 0}^{k - 1} Q_{i j} (β_{(A)}, τ_{(A)}) Cov (β_{(A) i}, β_{(A) j}) + \frac{1}{2} Q_{τ_{(A)}}^{2} (β_{(A)}, τ_{(A)}) Var (τ_{(A)}))$. Since no future variable is missing in $v$, the approximate predictive density of $v^{f}$ is the same as that obtained in Section 2. Thus, when the components of $x_{(A)}^{f}$ are dependent, the approximate predictive density of the log-odds ratio $a^{'} w^{f}$ with $x_{(A)}^{(r + s) f}$ missing is given by

$f_{(r + s)} (a^{'} w^{f} | x_{(A)}^{* f}, x_{(B)}^{f}; d a t a) \equiv γ^{*} N (ξ, ω^{2})$ (vi)

where

$ξ = \sum_{i = 0}^{k - r - s - 1} x_{(A) i}^{f} {\hat{β}}_{(A) i} + \sum_{i = k - r - s}^{k - 1} η_{i}^{*} {\hat{β}}_{(A) i} - x_{(B)}^{f} {\hat{β}}_{(B)}$

and

$ω^{2} = (\sum_{i, j = k - r - s}^{k - 1} {\hat{β}}_{(A) i} {\hat{β}}_{(A) j} ψ_{i j}^{*} + s_{(A)}^{2}) + s_{(B)}^{2} (1 + x_{(B)}^{f'} {(x_{(B)}^{'} x_{(B)})}^{- 1} x_{(B)}^{f}) \frac{n - q}{n - q - 2}$

The Kullback-Leibler⁹ directed measure of divergence between the predictive density (iv) when no variable is missing and the predictive density (vi) when $x_{(A)}^{(r + s) f}$ is missing is given by

$\begin{array}{l} D_{K L} = \int f (a^{'} w^{f} | x_{(A)}^{f}, x_{(B)}^{f}, \mathrm{data}) \log (\frac{f (a^{'} w^{f} | x_{(A)}^{f}, x_{(B)}^{f}, \mathrm{data})}{f_{(r + s)} (a^{'} w^{f} | x_{(A)}^{* f}, x_{(B)}^{f}, \mathrm{data})}) d (a^{'} w^{f}) \\ = \frac{1}{2 ω^{2}} {(θ - ξ)}^{2} + \frac{1}{2} (\frac{δ^{2}}{ω^{2}} - \log (\frac{δ^{2}}{ω^{2}}) - 1) \\ - \frac{1}{2} \sum_{i, j = 0}^{k - 1} E (Q_{i j}^{*} (β_{(A)}, τ_{(A)}) Cov (β_{(A) i}, β_{(A) j})) - \frac{1}{2} E (Q_{τ_{(A)}}^{2} (β_{(A)}, τ_{(A)}) Var (τ_{(A)})) \end{array}$ (vii)

If the components of $x_{(A)}^{f}$ are independent, the predictive density of $a^{'} w^{f}$ when $(r + s)$ future variables are missing has the same form as (vi), and the corresponding Kullback-Leibler⁹ measure $D_{K L}$ has the same form as (vii), with $η_{i}^{*}$ replaced by $η_{i}$ in $ξ$, ${\hat{β}}_{(A) i} {\hat{β}}_{(A) j} ψ_{i j}^{*}$ replaced by ${\hat{β}}_{(A) i}^{2} ψ_{i}^{2}$ in $ω^{2}$ and $Q_{i j}^{*} (β_{(A)}, τ_{(A)})$ replaced by $Q_{i j} (β_{(A)}, τ_{(A)})$ in $γ^{*}$, where $η_{i}$ and $ψ_{i}^{2}$ are the mean and variance of the $i$th missing variable.
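The mean and variance of the approximate predictive density of $u^{f}$ with missing continuous variables can be sketched in Python as follows. The Taylor correction factor is passed in as `gamma` and defaults to 1, which is an assumption made purely to keep the sketch short; all names and numbers are hypothetical.

```python
import numpy as np

def missing_predictive(x_obs, beta_obs, beta_mis, cond_mean, cond_cov, s2, gamma=1.0):
    """Mean and variance of the approximate normal predictive density of u^f
    when some future covariates of treatment A are missing.

    x_obs, beta_obs : observed future covariates and their estimated coefficients
    beta_mis        : estimated coefficients of the missing covariates
    cond_mean/cov   : conditional mean and covariance of the missing covariates
    s2              : estimated error variance
    gamma           : Taylor correction factor (assumed 1 here)
    """
    x_obs = np.asarray(x_obs, dtype=float)
    beta_obs = np.asarray(beta_obs, dtype=float)
    beta_mis = np.asarray(beta_mis, dtype=float)
    cond_mean = np.asarray(cond_mean, dtype=float)
    cond_cov = np.asarray(cond_cov, dtype=float)
    mean = x_obs @ beta_obs + cond_mean @ beta_mis
    var = beta_mis @ cond_cov @ beta_mis + s2 * gamma
    return mean, var

# Hypothetical numbers, for illustration only.
mean_u, var_u = missing_predictive([1.0, 2.0], [0.5, 1.0], [2.0], [3.0], [[0.25]], 1.0)
print(mean_u, var_u)
```

For independent covariates, `cond_mean` is the vector of marginal means $η_{i}$ and `cond_cov` is the diagonal matrix of the variances $ψ_{i}^{2}$.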

Explanatory variables are dichotomous

Here we assume that all the explanatory variables are dichotomous and independent. We assume that the errors of models (ii) and (iii) are normally distributed with means zero and variances $τ_{(A)}^{- 1}$ and $τ_{(B)}^{- 1}$ respectively. To assess the influence of the missing variables in treatment A, we consider that $x_{(A) i}^{f}$ is distributed as

$\Pr (X_{(A) i}^{f} = x_{(A) i}^{f}) = θ_{(A) i}^{x_{(A) i}^{f}} {(1 - θ_{(A) i})}^{1 - x_{(A) i}^{f}}, x_{(A) i}^{f} = 0, 1, i = 1, 2, ..., k - 1$

The density of a future $u^{f}$ is

$f (u^{f} | x_{(A)}^{f}, β_{(A)}, τ_{(A)}) \equiv N (\sum_{i = 0}^{k - 1} x_{(A) i}^{f} β_{(A) i}, τ_{(A)}^{- 1}) .$

If the $r$ future variables $x_{(A)}^{(r) f}$ are missing in treatment A, then the density of a future $u^{f}$ is given by

$f (u^{f} | x_{(A)}^{* f}, β_{(A)}, τ_{(A)}) = \sum_{x_{(A) k - r}^{f} = 0}^{1} \cdots \sum_{x_{(A) k - 1}^{f} = 0}^{1} N (\sum_{i = 0}^{k - 1} x_{(A) i}^{f} β_{(A) i}, τ_{(A)}^{- 1}) \prod_{i = k - r}^{k - 1} θ_{(A) i}^{x_{(A) i}^{f}} {(1 - θ_{(A) i})}^{1 - x_{(A) i}^{f}} .$
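The multiple sum over the missing dichotomous variables is a finite mixture of $2^{r}$ normal densities. A minimal Python sketch, with hypothetical names and numbers, enumerates the configurations directly:

```python
import itertools
import math

def normal_pdf(u, mean, var):
    """Density of N(mean, var) at u."""
    return math.exp(-0.5 * (u - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def density_missing_binary(u, x_obs, beta_obs, beta_mis, theta_mis, var):
    """Density of u^f when the last r dichotomous covariates are missing.

    theta_mis[i] is Pr(missing covariate i = 1); var plays the role of 1 / tau.
    """
    base = sum(x * b for x, b in zip(x_obs, beta_obs))
    total = 0.0
    # Enumerate all 2^r configurations of the missing 0/1 covariates.
    for combo in itertools.product((0, 1), repeat=len(beta_mis)):
        weight = math.prod(t if x == 1 else 1.0 - t for x, t in zip(combo, theta_mis))
        mean = base + sum(x * b for x, b in zip(combo, beta_mis))
        total += weight * normal_pdf(u, mean, var)
    return total

# Hypothetical values, for illustration only.
value = density_missing_binary(0.5, [1.0], [0.5], [2.0], [0.3], 1.0)
print(value)
```

The number of terms doubles with each missing variable, so direct enumeration is practical only for small $r$.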

The predictive density of $u^{f}$ when $x_{(A)}^{(r) f}$ is missing is given by

$f (u^{f} | x_{(A)}^{* f}, \mathrm{data}) = \int f (u^{f} | x_{(A)}^{* f}, β_{(A)}, τ_{(A)}) f (β_{(A)} | \mathrm{data}) d β_{(A)}$ (viii)

which is not mathematically tractable. For vague prior densities for $β_{(A)}$ and $τ_{(A)}$ and using Taylor's expansion, the approximate predictive density of (viii) is

$\begin{array}{l} f (u^{f} | x_{(A)}^{* f}, \mathrm{data}) = \sum_{x_{(A) k - r}^{f} = 0}^{1} \cdots \sum_{x_{(A) k - 1}^{f} = 0}^{1} N (\sum_{i = 0}^{k - 1} x_{(A) i}^{f} {\hat{β}}_{(A) i}, s_{(A)}^{2}) \prod_{i = k - r}^{k - 1} θ_{(A) i}^{x_{(A) i}^{f}} {(1 - θ_{(A) i})}^{1 - x_{(A) i}^{f}} \\ \times (1 + \sum_{i, j = 0}^{k - 1} Q_{i j} ({\hat{β}}_{(A)}, s_{(A)}^{- 2}) \frac{Cov (β_{(A) i}, β_{(A) j})}{2} + Q_{τ_{(A)}}^{2} ({\hat{β}}_{(A)}, s_{(A)}^{- 2}) \frac{Var (τ_{(A)})}{2}) \end{array}$

Since there are no missing variables in $v^{f}$, the density of $v^{f}$ is the same as that obtained in Section 2. Then the predictive density of $a^{'} w^{f}$ is given by

$\begin{array}{l} f (a^{'} w^{f} | x_{(A)}^{* f}, x_{(B)}^{f}, \mathrm{data}) = \sum_{x_{(A) k - r}^{f} = 0}^{1} \cdots \sum_{x_{(A) k - 1}^{f} = 0}^{1} N (\sum_{i = 0}^{k - 1} (x_{(A) i}^{f} {\hat{β}}_{(A) i} - x_{(B) i}^{f} {\hat{β}}_{(B) i}), s_{(A)}^{2} + s_{(B)}^{2} (1 + x_{(B)}^{f'} {(x_{(B)}^{'} x_{(B)})}^{- 1} x_{(B)}^{f})) \\ \prod_{i = k - r}^{k - 1} θ_{(A) i}^{x_{(A) i}^{f}} {(1 - θ_{(A) i})}^{1 - x_{(A) i}^{f}} \\ \times (1 + \sum_{i, j = 0}^{k - 1} Q_{i j} ({\hat{β}}_{(A)}, s_{(A)}^{- 2}) \frac{Cov (β_{(A) i}, β_{(A) j})}{2} + Q_{τ_{(A)}}^{2} ({\hat{β}}_{(A)}, s_{(A)}^{- 2}) \frac{Var (τ_{(A)})}{2}) \end{array}$ (ix)

An analytical expression for $D_{K L}$ between the predictive densities (iv) and (ix) is very difficult to obtain, but it can be evaluated numerically. In some situations, some of the explanatory variables are dichotomous and some are continuous. Among the $k - 1$ explanatory variables, without loss of generality we assume that the first $l$ are dichotomous and the remaining $k - l - 1$ are continuous. We also assume that, of the $l$ dichotomous future variables, the last $d$ are missing and, of the $(k - l - 1)$ continuous future variables, the last $g$ are missing. Then the predictive density of the future log-odds ratio $a^{'} w^{f}$ when $d$ dichotomous and $g$ continuous variables are missing is given by

$\begin{array}{l} f (a^{'} w^{f} | x_{(A)}^{* f}, x_{(B)}^{* f}, \mathrm{data}) = (\sum_{x_{(A) l - d + 1}^{f} = 0}^{1} \cdots \sum_{x_{(A) l}^{f} = 0}^{1} N (\sum_{i = 0}^{k - g - 1} x_{(A) i}^{f} {\hat{β}}_{(A) i} + \sum_{i = k - g}^{k - 1} η_{i} {\hat{β}}_{(A) i} - \sum_{i = 0}^{k - 1} x_{(B) i}^{f} {\hat{β}}_{(B) i}, \\ \sum_{i = k - g}^{k - 1} {\hat{β}}_{(A) i}^{2} ψ_{i}^{2} + s_{(A)}^{2} + s_{(B)}^{2} (1 + x_{(B)}^{f'} {(x_{(B)}^{'} x_{(B)})}^{- 1} x_{(B)}^{f})) \\ \prod_{i = l - d + 1}^{l} θ_{i}^{x_{(A) i}^{f}} {(1 - θ_{i})}^{1 - x_{(A) i}^{f}}) \\ \times (1 + \sum_{i, j = 0}^{k - 1} Q_{i j} ({\hat{β}}_{(A)}, s_{(A)}^{- 2}) \frac{Cov (β_{(A) i}, β_{(A) j})}{2} + Q_{τ_{(A)}}^{2} ({\hat{β}}_{(A)}, s_{(A)}^{- 2}) \frac{Var (τ_{(A)})}{2}) \end{array}$ (x)

Again, an analytical expression for $D_{K L}$ between the predictive densities (iv) and (x) is very difficult to obtain, but it can be evaluated numerically. In a similar way we can derive the predictive density of the future log-odds ratio when some future variables are missing in treatment B.
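One simple way to obtain such a numerical value is to integrate $f \log (f / g)$ on a fine grid. The Python sketch below does this for a normal density against a two-component normal mixture; the grid limits, the Riemann-sum rule and all parameter values are assumptions chosen for illustration, not the procedure used for the figures.

```python
import numpy as np

def normal_pdf(z, mean, var):
    """Density of N(mean, var) evaluated on an array z."""
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def kl_numeric(f, g, lo, hi, n=20001):
    """Riemann-sum approximation of the integral of f * log(f / g) on [lo, hi].

    The interval must be wide enough to cover the mass of f but narrow enough
    that neither density underflows to zero on the grid.
    """
    z = np.linspace(lo, hi, n)
    fz, gz = f(z), g(z)
    dz = z[1] - z[0]
    return float(np.sum(fz * np.log(fz / gz)) * dz)

# Hypothetical example: full-model density against a mixture that arises
# when one dichotomous future variable (success probability 0.3) is missing.
full = lambda z: normal_pdf(z, 0.4, 1.5)
mixture = lambda z: 0.7 * normal_pdf(z, 0.0, 1.5) + 0.3 * normal_pdf(z, 1.2, 1.5)
d_kl = kl_numeric(full, mixture, -15.0, 15.0)
print(d_kl)
```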

Example 1 revisited: This example is based on the flu shot data of Example 1. From Figure 3 we observe, as in Examples 1 and 2, that the discrepancies are smaller around the mean of the missing variable. Moreover, Figures 1 and 3 show that the discrepancies for the missing variables are smaller than those for the deleted variables.

Example 2 revisited: This example is based on the simulated data of Example 2; here we reach the same conclusion as in Example 1 revisited (Figures 2 and 4).

Figure 1 Three-dimensional scatter plots based on real data for $D_{K L}$ when $x_{1}$ is deleted (panels: Group A, Group B).

Figure 2 Three-dimensional scatter plots based on simulated data for $D_{K L}$ when $x_{1}$ is deleted (panels: Group A, Group B).

Figure 3 Three-dimensional scatter plots based on real data for $D_{K L}$ when $x_{1}^{f}$ is missing (panels: Group A, Group B).

Figure 4 Three-dimensional scatter plots based on simulated data for $D_{K L}$ when $x_{1}^{f}$ is missing (panels: Group A, Group B).

Examples 1 and 2 revisited: In this example we use the $D_{K L}$ values for the real data to draw box plots for each case (deleted and missing). From Figure 5 we observe that $x_{2}$ is more influential than $x_{1}$. Moreover, the discrepancies are much smaller in the missing case than in the deleted case. The simulation study gives the same result, as illustrated in Figure 6.

Figure 5 Box plot for $D_{K L}$ based on real data (panels: Treatment A, Treatment B).

Figure 6 Box plot for $D_{K L}$ based on simulated data (panels: Treatment A, Treatment B).

We consider the logistic model as

$\Pr (y = 1 | x, β) = \exp (x β) / (1 + \exp (x β))$

The probability that a future response $y^{f}$ will be a success is given by

$\Pr (y^{f} = 1 | x^{f}, β) = \exp (x^{f} β) / (1 + \exp (x^{f} β))$

We assume that the conditional density of $x_{(r)}^{f}$ given $x^{* f}$ is independent of $β$, where $x^{* f}$ denotes the future explanatory variables without the variables $x_{(r)}^{f}$. Then the predictive probabilities that $y^{f}$ will be a success, when $x^{f}$ is fully observed and when $x_{(r)}^{f}$ is missing, are given by

$\Pr (y^{f} = 1 | x^{f}, d a t a) = \int \Pr (y^{f} = 1 | x^{f}, β) f (β | d a t a) d β$

and

$\Pr (y^{f} = 1 | x^{* f}, \mathrm{data}) = \int \Pr (y^{f} = 1 | x^{* f}, β) f (β | \mathrm{data}) d β$ respectively. Simple analytically tractable priors are not available here. Numerical integration techniques might be used for some specified priors to approximate $\Pr (y^{f} = 1 | x^{f}, \mathrm{data})$ and $\Pr (y^{f} = 1 | x^{* f}, \mathrm{data})$.

Normal approximation for the posterior density

Let us suppose that the sample size is large. Lindley¹² stated that the posterior density $f (β | d a t a)$ may then be well approximated by its asymptotic normal form as

$f (β | \mathrm{data}) \approx N_{p} (\hat{β}, Σ)$

where $\hat{β}$ is the maximum likelihood estimate of $β$, $Σ = {(- H)}^{- 1}$ and $H$ is the Hessian of $\log L (β)$ evaluated at $\hat{β}$.

For the logistic model (xi), the Hessian $H = (h_{j l} (\hat{β}))$ evaluated at $\hat{β}$ is given by

$h_{j l} (\hat{β}) = - \sum_{i = 1}^{n} \frac{x_{i j} x_{i l} \exp (x_{i} \hat{β})}{{(1 + \exp (x_{i} \hat{β}))}^{2}}, j, l = 0, 1, ..., k,$

where $x_{i j}$ is the $j$th component of $x_{i}$ with $x_{i 0} = 1$. For given $x^{f}$, $z = x^{f} β$ has, a posteriori, an approximately normal distribution with mean $b_{x^{f}} = x^{f} \hat{β}$ and variance $d_{x^{f}}^{2} = x^{f} Σ x^{f'}$, and with probability density function $ϕ (z | b_{x^{f}}, d_{x^{f}}^{2})$. Using this transformation we can approximate $\Pr (y^{f} = 1 | x^{f}, \mathrm{data})$ by

$\Pr (y^{f} = 1 | x^{f}, \mathrm{data}) \approx \int \frac{\exp (z)}{1 + \exp (z)} ϕ (z | b_{x^{f}}, d_{x^{f}}^{2}) d z .$

Analytical evaluation of this integral is very difficult. We can, however, evaluate it by numerical techniques, viz. Gauss-Hermite quadrature (Abramowitz and Stegun¹³), the normal approximation (Cox¹⁴) and Laplace's approximation (de Bruijn¹⁵).
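The chain from the Hessian to the predictive probability can be sketched in Python as follows. The sketch assumes that $\hat{β}$ has already been obtained by some fitting routine; the function names, the number of quadrature nodes and the toy inputs are hypothetical. It uses NumPy's `hermgauss` nodes and weights for the weight function $\exp (- t^{2})$, hence the change of variable $z = b + \sqrt{2} d t$.

```python
import numpy as np

def posterior_covariance(X, beta_hat):
    """Sigma = (-H)^{-1}, with H the Hessian of the logistic log-likelihood."""
    eta = X @ beta_hat
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)                      # equals exp(eta) / (1 + exp(eta))^2
    H = -(X * w[:, None]).T @ X
    return np.linalg.inv(-H)

def predictive_prob(x_f, beta_hat, sigma, n_nodes=40):
    """Gauss-Hermite approximation to the integral of
    logistic(z) * phi(z | b, d^2), with b = x_f beta_hat, d^2 = x_f Sigma x_f'."""
    b = x_f @ beta_hat
    d2 = x_f @ sigma @ x_f
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    z = b + np.sqrt(2.0 * d2) * nodes      # change of variable for weight exp(-t^2)
    return float(np.sum(weights / (1.0 + np.exp(-z))) / np.sqrt(np.pi))

# Toy inputs, for illustration only (first column is the constant term).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
beta_hat = np.array([-0.5, 0.4])           # assumed to come from a prior fit
sigma = posterior_covariance(X, beta_hat)
p_hat = predictive_prob(np.array([1.0, 1.5]), beta_hat, sigma)
print(p_hat)
```

Compared with plugging $\hat{β}$ into the logistic function, the integrated probability is pulled towards 0.5, more strongly so when $d_{x^{f}}^{2}$ is large.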

If the sample size is small, the posterior normality assumption may not be accurate. Therefore, we consider Flat prior approximation Tierney and Kadane¹⁶ as an alternative approach using the Laplace's method for integrals.

Effect of the variables $x^{f}$

Here we assume that the future variables $x^{f}$ are dependent and the density of $x^{f}$ is p-dimensional multivariate normal i.e.

$f (x^{f}) \equiv N_{p} (η, ψ)$

The conditional density of $x_{(r)}^{f}$ for given $x^{* f}$ is

$f (x_{(r)}^{f} | x^{* f}) \equiv N_{r} (η_{(r)}^{*}, ψ_{(r)}^{*})$

The probability of $y^{f}$ as a success when $x_{(r)}^{f}$ is missing is given by

$\Pr (y^{f} = 1 | x^{* f}, β) = \int \frac{\exp (x^{f} β)}{1 + \exp (x^{f} β)} f (x_{(r)}^{f} {| x}^{* f}) d x_{(r)}^{f}$

$\approx Φ ((\sum_{i = 0}^{k - r} x_{i}^{f} β_{i} + \sum_{i = k - r + 1}^{k} η_{i}^{*} β_{i}) / {(k^{2} + \sum_{i, j = k - r + 1}^{k} β_{i} β_{j} ψ_{i j}^{*})}^{1 / 2})$

$= g * (β)$ (Say)

Then the predictive probability of $y^{f}$ as a success when $x_{(r)}^{f}$ is missing is given by

$\Pr (y^{f} = 1 | x^{* f}, \mathrm{data}) = \int g^{*} (β) f (β | \mathrm{data}) d β .$ (xii)

The integral in (xii) can be evaluated in the same way as the integral in (xi), using Taylor's and Laplace's approximations.

If, instead, the future variables $x_{1}^{f}$, …, $x_{k}^{f}$ are independently and normally distributed with mean $η_{i}$ and variance $ψ_{i}^{2}$ (i = 1, 2, …, k), then the conditional density of $x_{(r)}^{f}$ is

$f (x_{(r)}^{f} | x^{* f}) \equiv f (x_{(r)}^{f})$ .

Consequently, we get

$\Pr (y^{f} = 1 | x^{* f}, β) = \int \frac{\exp (x^{f} β)}{1 + \exp (x^{f} β)} f (x_{(r)}^{f}) d x_{(r)}^{f}$

$\approx Φ ((\sum_{i = 0}^{k - r} x_{i}^{f} β_{i} + \sum_{i = k - r + 1}^{k} η_{i} β_{i}) / {(k^{2} + \sum_{i = k - r + 1}^{k} β_{i}^{2} ψ_{i}^{2})}^{1 / 2})$

$= g (β)$ (Say)

See Aitchison and Begg¹⁷ in this context. Again,

$\Pr (y^{f} = 1 | x^{* f}, \mathrm{data}) = \int g (β) f (β | \mathrm{data}) d β$
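The closed-form approximation $g (β)$ for independent normal covariates can be sketched in Python as follows. The text does not state the value of the constant written $k^{2}$ in the denominator; the sketch assumes it is the scaling constant that matches the logistic function to a normal distribution function and uses $8 / π$, which is one common choice and an assumption here. Function names and inputs are hypothetical.

```python
import math

K2 = 8.0 / math.pi  # assumed value of the constant k^2; not stated in the text

def normal_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def g_independent(x_obs, beta_obs, eta_mis, psi2_mis, beta_mis, k2=K2):
    """Approximate Pr(y^f = 1 | observed x, beta) when independent normal
    covariates with means eta_mis and variances psi2_mis are missing."""
    num = sum(x * b for x, b in zip(x_obs, beta_obs))
    num += sum(e * b for e, b in zip(eta_mis, beta_mis))
    den = math.sqrt(k2 + sum(b * b * v for b, v in zip(beta_mis, psi2_mis)))
    return normal_cdf(num / den)

# Hypothetical values, for illustration only.
p = g_independent([1.0, 0.5], [-1.0, 2.0], [0.3], [4.0], [0.5])
print(p)
```

With nothing missing, the function reduces to the normal-distribution-function approximation of the logistic probability itself, which gives a quick check of the assumed constant.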

Variables $x^{f}$ are dichotomous

Here we assume that the variables $x^{f}$ are independent and they can take only two values 0 or 1. We also assume that $x_{i}^{f}$ is distributed as

$\Pr (X_{i}^{f} = x_{i}^{f}) = θ_{i}^{x_{i}^{f}} {(1 - θ_{i})}^{1 - x_{i}^{f}}, \quad x_{i}^{f} = 0, 1 .$

If $x_{(r)}^{f}$ is missing, the probability of $y^{f}$ as a success is given by

$\Pr (y^{f} = 1 | x^{* f}, β) = \sum_{x_{k - r + 1}^{f} = 0}^{1} \cdots \sum_{x_{k}^{f} = 0}^{1} \frac{\exp (x^{f} β)}{1 + \exp (x^{f} β)} \prod_{i = k - r + 1}^{k} θ_{i}^{x_{i}^{f}} {(1 - θ_{i})}^{1 - x_{i}^{f}} = h (β)$ (Say).

The predictive probability of $y^{f}$ as a success when $x_{(r)}^{f}$ is missing is given by

$\Pr (y^{f} = 1 | x *^{f}, d a t a) = \int h (β) f (β | d a t a) d β .$ (xiii)
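For a fixed value of $β$, the function $h (β)$ is a finite weighted sum of logistic probabilities over the $2^{r}$ configurations of the missing variables. A minimal Python sketch with hypothetical names and numbers:

```python
import itertools
import math

def expit(t):
    """Logistic function exp(t) / (1 + exp(t))."""
    return 1.0 / (1.0 + math.exp(-t))

def h(x_obs, beta_obs, beta_mis, theta_mis):
    """Pr(y^f = 1 | observed x, beta) when the last r dichotomous covariates
    are missing; theta_mis[i] is Pr(missing covariate i = 1)."""
    base = sum(x * b for x, b in zip(x_obs, beta_obs))
    total = 0.0
    for combo in itertools.product((0, 1), repeat=len(beta_mis)):
        weight = math.prod(t if x == 1 else 1.0 - t for x, t in zip(combo, theta_mis))
        total += weight * expit(base + sum(x * b for x, b in zip(combo, beta_mis)))
    return total

# Hypothetical values, for illustration only.
p_missing = h([1.0, 1.0], [-0.5, 0.8], [1.2, -0.7], [0.4, 0.6])
print(p_missing)
```

The predictive probability (xiii) additionally averages $h (β)$ over the posterior of $β$; the sketch covers only the inner sum.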

If the sample size is large, assuming the normality assumption for the posterior density we can approximate (xiii) using Taylor's theorem, Laplace's method and normal approximation.

Example: one variable case

Here we consider two different logistic models, each based on a single variable, either $x_{1}$ or $x_{2}$. We want to measure the discrepancy between the predictive probability ${\hat{p}}_{i}$, based on the single variable $x_{i}$ when $x_{i}^{f}$ is known, and the predictive probability ${\hat{p}}_{0}$, based on $x_{i}$ alone when $x_{i}^{f}$ is missing, to assess the influence of the missing variable $x_{i}^{f}$, $i = 1, 2$. The predictive probability ${\hat{p}}_{i}$ is determined using the quadrature approximation and the predictive probability ${\hat{p}}_{0}$ is determined using the second-order Taylor approximation.

We assume that the marginal densities of the future variables $x_{1}^{f}$ and $x_{2}^{f}$ are normal with means 33.35, 78.24 and variances 65.39, 1827.0 respectively, where means and variances are the estimated sample means and sample variances from the observed data. We employ the absolute difference of probabilities and Kullback-Leibler divergence measure to assess the influence of the missing variable. The discrepancies are drawn in Figure 7. Here we see that the discrepancies due to missing $x_{1}^{f}$ in the predictive probability based on $x_{1}$ are very large compared to the discrepancies due to missing $x_{2}^{f}$ in the predictive probability based on $x_{2}$ . The discrepancies are less around the mean of the missing variable.

Figure 7 Absolute difference $| {\hat{p}}_{i} - {\hat{p}}_{0} |$, $i = 1, 2$, and Kullback-Leibler directed divergence $D_{K L}$ (panels: $x_{1}^{f}$ is missing; $x_{2}^{f}$ is missing).

Example: two-variable case

Now let ${\hat{p}}_{12}$ denote the predictive probability based on the two variables $x_{1}$ and $x_{2}$ when both $x_{1}^{f}$ and $x_{2}^{f}$ are known, and let ${\hat{p}}_{i j}$, $i = 0, 1$, $j = 0, 2$, $(i, j) \neq (1, 2)$, denote the predictive probability based on $x_{1}$ and $x_{2}$ when a future variable is missing; a subscript "0" indicates a missing variable. Here also the predictive probability ${\hat{p}}_{12}$ is determined using the quadrature approximation, and the predictive probabilities ${\hat{p}}_{10}$, ${\hat{p}}_{02}$ and ${\hat{p}}_{00}$ are determined using the second-order Taylor approximation. We assume that the joint density of $x_{1}^{f}$ and $x_{2}^{f}$ is bivariate normal with correlation coefficient 0.33, which is the estimated sample correlation coefficient from the observed data. The absolute differences of the two predictive probabilities ${\hat{p}}_{12}$ and ${\hat{p}}_{02}$ when $x_{1}^{f}$ is missing, and of ${\hat{p}}_{12}$ and ${\hat{p}}_{10}$ when $x_{2}^{f}$ is missing, are plotted in Figure 8. The corresponding Kullback-Leibler directed divergences $D_{K L}$ are plotted in Figure 9. In both cases the discrepancies when $x_{1}^{f}$ is missing are close together for different given values of the other variable, since the correlation between $x_{1}^{f}$ and $x_{2}^{f}$ is small. The discrepancies due to missing $x_{1}^{f}$ are very large compared with those due to missing $x_{2}^{f}$, except near the mean of the missing variable. The discrepancies when both $x_{1}^{f}$ and $x_{2}^{f}$ are missing are plotted in Figure 10. These discrepancies are very similar to those due to missing $x_{1}^{f}$ alone in the predictive probability based on $x_{1}$ and $x_{2}$, since the contribution of $x_{2}$ is negligible.

Figure 8 Absolute differences $| {\hat{p}}_{12} - {\hat{p}}_{02} |$ ($x_{1}^{f}$ is missing) and $| {\hat{p}}_{12} - {\hat{p}}_{10} |$ ($x_{2}^{f}$ is missing).

Figure 9 Kullback-Leibler directed divergence $D_{KL}$ (panels: $x_{1}^{f}$ is missing; $x_{2}^{f}$ is missing).

Figure 10 $x_{1}^{f}$ and $x_{2}^{f}$ are both missing (panels: absolute difference $| {\hat{p}}_{12} - {\hat{p}}_{00} |$; Kullback-Leibler directed divergence $D_{KL}$).

In the present study we have observed that, in both the logistic model and the log-odds ratio, the discrepancies are smallest around the mean of the deleted variables and around the mean of the missing future variables; the discrepancies are larger when the deleted or missing variables are more influential; and the discrepancies in the deleted case are higher than in the missing case.

In this paper we studied the predictive influence of variables on the log-odds ratio in a Bayesian setup. The treatment difference

$\Pr (Y_{i} = 1 | Z_{i} = 1, x_{i}) - \Pr (Y_{i} = 1 | Z_{i} = 0, x_{i})$

or the risk ratio

$\Pr (Y_{i} = 1 | Z_{i} = 1, x_{i}) / \Pr (Y_{i} = 1 | Z_{i} = 0, x_{i})$

can also be studied along the same lines.
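As a small illustration of how these measures relate, the sketch below evaluates the treatment difference, the risk ratio and the log-odds ratio at a given covariate value. It assumes a simple logistic model, $\text{logit} \Pr (Y_{i} = 1 | Z_{i}, x_{i}) = \alpha + \gamma Z_{i} + \beta x_{i}$, with hypothetical coefficient values; neither the model form nor the numbers are taken from the paper.

```python
import math


def expit(eta):
    """Logistic function 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def treatment_measures(x, alpha=-1.0, gamma=0.8, beta=0.5):
    """Compare treated (Z = 1) and control (Z = 0) at covariate value x.

    alpha, gamma and beta are hypothetical coefficients of the assumed model
    logit Pr(Y = 1 | Z, x) = alpha + gamma * Z + beta * x.
    """
    p1 = expit(alpha + gamma + beta * x)  # Pr(Y = 1 | Z = 1, x)
    p0 = expit(alpha + beta * x)          # Pr(Y = 1 | Z = 0, x)
    return {
        "risk_difference": p1 - p0,
        "risk_ratio": p1 / p0,
        "log_odds_ratio": math.log(p1 / (1.0 - p1)) - math.log(p0 / (1.0 - p0)),
    }


for x in (-1.0, 0.0, 2.0):
    print(x, treatment_measures(x))
```

Under this assumed model the log-odds ratio equals `gamma` for every `x`, whereas the treatment difference and the risk ratio change with `x`, so an omitted or missing covariate would affect them differently.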

We have also considered the influence of missing future explanatory variables in a logistic model. The influence of missing future explanatory variables in probit and complementary log-log models can be studied in a similar fashion.
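A rough sketch of how the same averaging step carries over to other links is given below. It is a plug-in illustration only, not a Bayesian predictive computation. The names `mean_eta` and `sd_eta` are hypothetical: they stand for the mean and standard deviation of the linear predictor induced by a normally distributed missing variable.

```python
import math

import numpy as np

# Inverse link functions mapping a linear predictor to a success probability.
INVERSE_LINKS = {
    "logit": lambda eta: 1.0 / (1.0 + np.exp(-eta)),
    "probit": lambda eta: 0.5 * (1.0 + np.vectorize(math.erf)(eta / math.sqrt(2.0))),
    "cloglog": lambda eta: 1.0 - np.exp(-np.exp(eta)),
}


def p_missing(link, mean_eta, sd_eta, n_nodes=40):
    """Average the success probability over eta ~ N(mean_eta, sd_eta**2).

    Uses Gauss-Hermite quadrature with weight function exp(-t**2 / 2).
    """
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    eta = mean_eta + sd_eta * nodes
    return float(np.sum(weights * INVERSE_LINKS[link](eta)) / math.sqrt(2.0 * math.pi))


for name in INVERSE_LINKS:
    print(name, p_missing(name, mean_eta=0.3, sd_eta=1.0))
```

For the probit link the average has the closed form $\Phi (m / \sqrt{1 + s^{2}})$ for mean $m$ and standard deviation $s$, which gives a convenient check on the quadrature.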

None.

Breslow N. Odds ratio estimators when the data are sparse. Biometrika. 1981;68:73–84.
Böhning D, Kuhnert R, Rattanasiri S. Meta-analysis of binary data using profile likelihood. 1st ed. Chapman and Hall/CRC Interdisciplinary Statistics. 2008.
Pregibon D. Logistic regression diagnostics. Annals of Statistics. 1981;9:705–724.
Cook RD, Weisberg S. Residuals and Influence in Regression. USA: New York: Chapman and Hall. 1982.
Johnson W. Influence measures for logistic regression: Another point of view. Biometrika. 1985;72(1):59–65.
Bhattacharjee SK, Dunsmore IR. The influence of variables in a logistic model. Biometrika. 1991;78(4):851–856.
Mercier C, Shelley MC, Rimkus J, et al. Age and Gender as Predictors of Injury Severity in Head-on Highway Vehicular Collisions. The 76th Annual Meeting, Transportation Research Board, Washington, USA. 1997.
Zellner D, Keller F, Zellner GE. Variable selection in logistic regression models. Communications in Statistics. 2004;33(3):787–805.
Kullback S, Leibler RA. On information and sufficiency. Ann Math Statist. 1951;22:79–86.
Jammalamadaka SR, Tiwari RC, Chib S. Bayes prediction in the linear model with spherically symmetric errors. Economics Letters. 1987;24:39–44.
Bhattacharjee SK, Shamiri A, Sabiruzzaman Md, et al. Predictive Influence of Unavailable Values of Future Explanatory Variables in a Linear Model. Communications in Statistics – Theory and Methods. 2011;40:4458–4466.
Lindley DV. The use of prior probability distributions in statistical inference and decisions. Proc. 4th Berkeley Symp. 1961;1:453–468.
Abramowitz M, Stegun IA. Handbook of Mathematical Functions. USA: National Bureau of Standards. 1966.
Cox DR. Binary regression. UK: London: Chapman and Hall. 1970.
De Bruijn NC. Asymptotic Methods in Analysis. Amsterdam: North-Holland. 1961.
Tierney L, Kadane JB. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association. 1986;81(393):82–86.
Aitchison J, Begg CB. Statistical diagnosis when basic cases are not classified with certainty. Biometrika. 1976;63:1–12.
Logistic Regression Example with Grouped Data. Regression FluShots, University of North Florida.
Bhattacharjee SK, Dunsmore IR. The predictive influence of variables in a normal regression model. J Inform Optimiz Sci. 1995;16(2):327–334.


eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Predictive influence of variables on the odds ratio and in the logistic model

S K Bhattacharjee,¹ Atanu Biswas,² Ganesh Dutta,³ S Rao Jammalamadaka,⁴ M Masoom Ali⁵
