Bayesian adjustment for misclassification in mortality data

doi:10.15406/bbij.2014.01.00010

eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Editorial Volume 1 Issue 2

Mohamad Amin Pourhoseingholi

Gastroenterology and Liver diseases Research Center, Shahid Beheshti University of Medical Sciences, Iran

Correspondence: Mohamad Amin Pourhoseingholi, Gastroenterology and Liver diseases Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran, Tel 982122432515, Fax 982122432517

Received: September 28, 2014 | Published: November 12, 2014

Citation: Pourhoseingholi MA. Bayesian adjustment for misclassification in mortality data. Biom Biostat Int J. 2014;1(2):44-45. DOI: 10.15406/bbij.2014.01.00010

Editorial

In medical studies, one difficulty in drawing inferences from categorical data is misclassification, that is, disagreement between the observed and the true value of a variable. Sick individuals may be diagnosed as healthy, or the causes of disease or death may be misjudged. Misclassification affects both estimation and hypothesis testing, often producing biased estimates, and can therefore lead one to underestimate health risks.¹ The effect of misclassification was first noted by Bross,² and the statistical literature recommends two approaches: first, using a small validation sample;³ and second, Bayesian analysis, in which subjective prior information on at least a subset of the parameters is used to re-estimate the misclassified statistic.⁴⁻⁶

The underlying difficulty is that, without information beyond the observed (possibly misclassified) data, it is not possible to account for the effect of misclassification, and the drawback of the first approach, re-sampling, is that it requires an infallible classifier, which may not exist or may be expensive.⁷ On the other hand, the Bayesian literature on this topic is steadily growing. Most importantly, more complex models can now be handled thanks to the development of computational techniques such as Monte Carlo methods.

Among medical indexes, mortality is a familiar measure in assessing the burden of disease. This, however, requires reliable death registration systems that report death statistics accurately every year. Moreover, the analysis of death statistics that are subject to misclassification is a major problem in epidemiological analysis.¹ Although the World Health Organization (WHO) has encouraged member states to introduce death registration systems involving medical certification of the cause of death, misclassification and underestimation of mortality still occur in official statistics, mostly in developing countries.

The Bayesian approach has received much attention for handling misclassification in mortality data. Whittemore and Gong incorporated supplemental data on both the true and the fallible disease status and used this approach to estimate cervical cancer mortality rates with Poisson regression,⁴ and Sposto et al.⁵ developed this likelihood to assess the effect of diagnostic misclassification on the dose–response for cancer and non-cancer mortality. Stamey et al.¹ provided a Bayesian approach that extends the models of Whittemore & Gong⁴ and Sposto et al.⁵ Their technique does not assume that the misclassification parameters are known; instead, prior information on these parameters is used in place of validation data.² They applied this approach to data on the number of cancer and non-cancer deaths among residents of Hiroshima and Nagasaki, Japan. We derived an extension of the models proposed by Stamey et al.¹ to correct and account for misclassification in cancer mortality data.⁸

Suppose there are two groups of death classification, $y_1 = [y_{11}, y_{21}, \ldots, y_{r1}]'$ and $y_2 = [y_{12}, y_{22}, \ldots, y_{r2}]'$, where $r$ is the number of covariate patterns, $y_1$ holds the deaths recorded with the exact cause, and $y_2$ is the misclassified group, into which some deaths that belong to the first group were incorrectly labeled. Assume $y_{i1} \sim \mathrm{Poisson}(P_i \mu_{i1})$ and $y_{i2} \sim \mathrm{Poisson}(P_i \mu_{i2})$, where $P_i$ is the population at risk (person-time) and $\mu_{ij}$ is the observed death rate for covariate pattern $i$ in group $j$. (In the approach of Stamey et al.,¹ labels can be wrong in both directions, but in our approach only one group is assumed to be misclassified, because of the nature of the real data.) Let $\theta$ be the probability that an observation from group 1 is incorrectly labeled as belonging to group 2.
If the true (unknown) death rate for covariate pattern $i$ in group $j$ is denoted $\lambda_{ij}$, the true and observed rates are related by $\mu_{i1} = \lambda_{i1}(1 - \theta)$ and $\mu_{i2} = \lambda_{i2} + \lambda_{i1}\theta$.
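The mapping from true to observed rates can be checked numerically. The Python sketch below is illustrative only: it assumes that $P_i$ is the population at risk (person-time) for covariate pattern $i$, and the function names `observed_rates` and `simulate_counts` are hypothetical. It simulates recorded counts by moving each true group-1 death into group 2 with probability $\theta$, which is the one-way misclassification described above.

```python
import numpy as np

def observed_rates(lam1, lam2, theta):
    """Map true death rates to observed rates under one-way misclassification.

    A fraction theta of true group-1 deaths is recorded in group 2, so
    mu1 = lam1 * (1 - theta) and mu2 = lam2 + lam1 * theta.
    """
    lam1 = np.asarray(lam1, dtype=float)
    lam2 = np.asarray(lam2, dtype=float)
    return lam1 * (1.0 - theta), lam2 + lam1 * theta

def simulate_counts(lam1, lam2, theta, pop, rng):
    """Simulate recorded counts (y1, y2); pop plays the role of P_i (assumed person-time)."""
    lam1 = np.asarray(lam1, dtype=float)
    lam2 = np.asarray(lam2, dtype=float)
    pop = np.asarray(pop, dtype=float)
    true1 = rng.poisson(pop * lam1)     # true deaths from the group-1 cause
    true2 = rng.poisson(pop * lam2)     # true deaths from the group-2 cause
    moved = rng.binomial(true1, theta)  # group-1 deaths mislabeled as group 2
    return true1 - moved, true2 + moved # counts as they appear in the registry
```

With a large population at risk, the simulated counts settle near $P_i\mu_{i1}$ and $P_i\mu_{i2}$, a quick way to confirm that binomial thinning of Poisson counts reproduces the two rate equations.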

The joint distribution of the observed mortality data under this form of misclassification is proportional to $\prod_{i=1}^{r} [\lambda_{i1}(1 - \theta)]^{y_{i1}} [\lambda_{i2} + \lambda_{i1}\theta]^{y_{i2}} \exp\{-P_i \lambda_{i1}(1 - \theta) - P_i(\lambda_{i2} + \lambda_{i1}\theta)\}$.
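For numerical work it is convenient to evaluate the logarithm of this expression. The sketch below assumes NumPy arrays indexed by covariate pattern and treats `pop` as $P_i$; the name `log_likelihood` is hypothetical. Terms that do not involve $(\lambda_{i1}, \lambda_{i2}, \theta)$ are dropped, as in the proportionality statement.

```python
import numpy as np

def log_likelihood(lam1, lam2, theta, y1, y2, pop):
    """Log-likelihood of the recorded counts under one-way misclassification.

    Additive terms that do not involve (lam1, lam2, theta), namely
    y * log(P) and log(y!), are dropped.
    """
    lam1, lam2 = np.asarray(lam1, float), np.asarray(lam2, float)
    y1, y2, pop = np.asarray(y1, float), np.asarray(y2, float), np.asarray(pop, float)
    mu1 = lam1 * (1.0 - theta)  # observed rate in the correctly labeled group
    mu2 = lam2 + lam1 * theta   # observed rate in the group receiving mislabeled deaths
    return float(np.sum(y1 * np.log(mu1) + y2 * np.log(mu2) - pop * (mu1 + mu2)))
```

Because each covariate pattern identifies only $\mu_{i1}$ and $\mu_{i2}$, different triples $(\lambda_{i1}, \lambda_{i2}, \theta)$ that give the same observed rates have exactly the same likelihood. For example, $(0.005, 0.001, 0.2)$ and $(0.004/0.9,\ 0.002 - 0.0004/0.9,\ 0.1)$ both give $\mu_{i1} = 0.004$ and $\mu_{i2} = 0.002$. The data alone therefore cannot pin down $\theta$, which is why prior information on the misclassification parameter is needed.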

To perform Bayesian inference, a beta prior distribution can be assumed for the misclassification parameter, i.e., $\theta \sim \mathrm{Beta}(a, b)$.
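The choice of $a$ and $b$ is left open here. One common convention, shown only as an assumption, is to fix the prior mean of $\theta$ together with a prior "sample size" $a + b$, reading $a + b$ roughly as the number of validated deaths the prior information is worth. The helpers `beta_params_from_mean` and `beta_mean_sd` are hypothetical.

```python
import math

def beta_params_from_mean(prior_mean, prior_size):
    """Hypothetical helper: Beta(a, b) with mean prior_mean and a + b = prior_size.

    A larger prior_size gives a tighter prior on theta.
    """
    if not 0.0 < prior_mean < 1.0:
        raise ValueError("prior_mean must lie strictly between 0 and 1")
    if prior_size <= 0:
        raise ValueError("prior_size must be positive")
    return prior_mean * prior_size, (1.0 - prior_mean) * prior_size

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    n = a + b
    return a / n, math.sqrt(a * b / (n * n * (n + 1.0)))
```

For instance, a prior guess that about 20% of group-1 deaths are mislabeled, held with the weight of 50 validated deaths, gives Beta(10, 40), whose standard deviation is $\sqrt{0.2 \times 0.8 / 51} \approx 0.056$.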

Because $\theta$ is an unknown parameter, we employed a latent variable approach following Paulino et al.,⁹,¹⁰ Liu et al.¹¹ and Stamey et al.¹ to simplify the full conditional distributions and estimate the posterior distribution with a Gibbs sampling algorithm. Define $U_i \mid \beta_1, \beta_2, \theta, y_1, y_2 \sim \mathrm{Binomial}(y_{i2}, p_i)$ as the number of deaths recorded in the misclassified group that actually belong to the first group, where $\beta_1$ and $\beta_2$ are the regression parameters that determine $\lambda_{i1}$ and $\lambda_{i2}$, and $p_i = \frac{\lambda_{i1}\theta}{\lambda_{i1}\theta + \lambda_{i2}}$. Given the latent counts, $\theta$ enters the complete-data likelihood only through $\theta^{\sum_i U_i}(1 - \theta)^{\sum_i y_{i1}}$, so the full conditional posterior of $\theta$ is $\theta \mid \beta_1, \beta_2, U_1, \ldots, U_r, y_1, y_2 \sim \mathrm{Beta}(\sum_i U_i + a, \sum_i y_{i1} + b)$.
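The data-augmentation scheme can be sketched as a Gibbs sampler in Python. This is not the authors' implementation: to keep every full conditional in closed form, it replaces the regression on $\beta_1, \beta_2$ with an assumed independent Gamma(shape c, rate d) prior on each true rate $\lambda_{ij}$, and the function name and defaults are hypothetical. Each sweep draws $U_i \sim \mathrm{Binomial}(y_{i2}, p_i)$, then $\theta \sim \mathrm{Beta}(\sum_i U_i + a, \sum_i y_{i1} + b)$ (given the latent counts, $\theta$ enters the complete-data likelihood only through $\theta^{\sum_i U_i}(1-\theta)^{\sum_i y_{i1}}$), and finally the rates from their gamma full conditionals given the completed counts $y_{i1} + U_i$ and $y_{i2} - U_i$.

```python
import numpy as np

def gibbs_misclassification(y1, y2, pop, a, b, c=0.5, d=1e-6,
                            n_iter=5000, burn_in=1000, seed=0):
    """Data-augmentation Gibbs sampler for one-way misclassified Poisson counts.

    ASSUMPTION: each true rate lam1[i], lam2[i] has an independent
    Gamma(shape=c, rate=d) prior instead of a regression on beta_1, beta_2.
    pop plays the role of P_i (assumed person-time at risk).
    """
    rng = np.random.default_rng(seed)
    y1 = np.asarray(y1, dtype=np.int64)
    y2 = np.asarray(y2, dtype=np.int64)
    pop = np.asarray(pop, dtype=float)
    theta = a / (a + b)          # start at the prior mean of theta
    lam1 = (y1 + 1.0) / pop      # crude starting values for the true rates
    lam2 = (y2 + 1.0) / pop
    keep = n_iter - burn_in
    out = {"theta": np.empty(keep),
           "lam1": np.empty((keep, y1.size)),
           "lam2": np.empty((keep, y1.size))}
    for t in range(n_iter):
        # U_i | rest ~ Binomial(y_i2, p_i): group-2 deaths that truly belong to group 1
        p = lam1 * theta / (lam1 * theta + lam2)
        u = rng.binomial(y2, p)
        # theta | U, y ~ Beta(sum_i U_i + a, sum_i y_i1 + b)
        theta = rng.beta(u.sum() + a, y1.sum() + b)
        # conjugate gamma updates given the completed counts;
        # NumPy's gamma is parameterized by (shape, scale), so scale = 1 / rate
        lam1 = rng.gamma(c + y1 + u, 1.0 / (d + pop))
        lam2 = rng.gamma(c + y2 - u, 1.0 / (d + pop))
        if t >= burn_in:
            k = t - burn_in
            out["theta"][k] = theta
            out["lam1"][k] = lam1
            out["lam2"][k] = lam2
    return out
```

For example, `gibbs_misclassification([400, 320, 480], [200, 280, 270], [1e5] * 3, a=20, b=80)` returns posterior draws whose `lam1` column means estimate the corrected group-1 death rates. Because the likelihood is flat along combinations that give the same observed rates, the posterior of $\theta$ is largely driven by its prior, and the chain can be strongly autocorrelated; trace plots and longer runs are advisable in practice.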

This approach has been used to correct misclassification in cancer mortality data.¹²,¹³ In the absence of double sampling or validation data, the Bayesian approach is a good alternative for adjusting mortality data for the effects of misclassification, particularly death statistics from developing countries, which are subject to misclassification or under-reporting. The Bayesian technique is flexible and straightforward to implement computationally.