eISSN: 2378-315X BBIJ

Biometrics & Biostatistics International Journal
Volume 1 Issue 2 - 2014
Bayesian Adjustment for Misclassification in Mortality Data
Mohamad Amin Pourhoseingholi*
Gastroenterology and Liver diseases Research Center, Shahid Beheshti University of Medical Sciences, Iran
Received: September 28, 2014 | Published: November 11, 2014
*Corresponding author: Mohamad Amin Pourhoseingholi, Gastroenterology and Liver diseases Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran, Tel: 982122432515; Fax: 982122432517; Email: @
Citation: Pourhoseingholi MA (2014) Bayesian Adjustment for Misclassification in Mortality Data. Biom Biostat Int J 1(2): 00010. DOI: 10.15406/bbij.2014.01.00010


In medical studies, one difficulty in drawing inference from categorical data is misclassification, that is, disagreement between the observed and the true value of a variable. Sick individuals may be diagnosed as healthy, or the causes of disease or death may be misjudged. Misclassification affects both estimation and hypothesis testing. It often leads to biased estimates and can therefore cause one to underestimate health risks [1]. The effect of misclassification was first noted by Bross [2]. The statistical literature recommends two approaches: the first uses a small validation sample [3], and the second is Bayesian analysis, in which subjective prior information on at least some subset of the parameters is used to re-estimate the misclassified statistics [4-6].
The first approach has two difficulties. Without additional information beyond the observed data, it is not possible to take the effect of misclassification into account, and re-sampling requires an infallible classifier, which may not exist or may be expensive [7]. The Bayesian literature on this topic, on the other hand, is growing steadily. Most importantly, computational techniques such as Monte Carlo methods now make it possible to handle more complex models.
Among medical indices, mortality is a familiar measure for assessing the burden of disease. This, however, requires reliable death registration systems that report death statistics annually and accurately. Moreover, analysing death statistics that are subject to misclassification is a major problem in epidemiology [1]. The World Health Organization (WHO) has encouraged member states to introduce death registration systems involving medical certification of the cause of death. Even so, misclassification and under-reporting of deaths still occur in official statistics, mostly in developing countries.
The Bayesian approach has received much attention for handling misclassification in mortality data. Whittemore and Gong [4] incorporated supplemental data on both true and fallible diagnoses and used this approach to estimate cervical cancer mortality rates in Poisson regression. Sposto et al. [5] developed this likelihood to assess the effect of diagnostic misclassification on the non-cancer and cancer mortality dose–response. Stamey et al. [1] provided a Bayesian approach that extends the models of Whittemore & Gong [4] and Sposto et al. [5]. Their technique does not assume that the misclassification parameters are known. Instead, prior information on the misclassification parameters is used in place of validation data [2]. They applied this Bayesian approach to data on the number of deaths due to cancer and non-cancer causes among residents of Hiroshima and Nagasaki, Japan. We derived an extension of the models proposed by Stamey et al. [1] to correct and account for misclassification in cancer mortality data [8].

Suppose deaths are classified into two groups, $y_1 = [y_{11}, y_{21}, \ldots, y_{r1}]'$ and $y_2 = [y_{12}, y_{22}, \ldots, y_{r2}]'$. Here r is the number of covariate patterns, $y_1$ holds deaths recorded under the exact cause of death, and $y_2$ is the misclassified group, into which some deaths from the first group were incorrectly labeled. For covariate pattern i, $y_{i1} \sim \text{Poisson}(P_i \mu_{i1})$ and $y_{i2} \sim \text{Poisson}(P_i \mu_{i2})$, where $P_i$ is the population at risk and $\mu_{ij}$ is the observed mortality rate of group j. (In the approach of Stamey et al. [1], deaths may be incorrectly labeled in both directions. In our approach, only one group is assumed to be misclassified, because of the nature of the real data.)
Let θ be the probability that a death from group 1 is incorrectly labeled as belonging to group 2. Denote the actual (unknown) death rates of the two groups by $\lambda_{i1}$ and $\lambda_{i2}$. The actual and observed rates are then related by $\mu_{i1} = \lambda_{i1}(1-\theta)$ and $\mu_{i2} = \lambda_{i2} + \lambda_{i1}\theta$.
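The following is a minimal sketch of the data-generating mechanism behind these relations. It is not code from the original analysis. For one covariate pattern, it draws true deaths from Poisson distributions with means $P_i\lambda_{i1}$ and $P_i\lambda_{i2}$, then relabels each group-1 death as group 2 with probability θ. The function names, the stdlib-only Poisson sampler and the parameter values are illustrative assumptions. Binomial thinning of a Poisson count yields Poisson counts, so the simulated means should approach $P_i\mu_{i1}$ and $P_i\mu_{i2}$.

```python
import math
import random


def observed_rates(lam1, lam2, theta):
    """Observed rates implied by one-way misclassification:
    mu_i1 = lam_i1 * (1 - theta),  mu_i2 = lam_i2 + lam_i1 * theta."""
    return lam1 * (1.0 - theta), lam2 + lam1 * theta


def poisson(rng, mean):
    """Poisson draw by Knuth's multiplication method (suitable for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_pattern(rng, P, lam1, lam2, theta):
    """One covariate pattern: draw true deaths in each group, then relabel each
    group-1 death as group 2 with probability theta. Returns (y_i1, y_i2)."""
    true1 = poisson(rng, P * lam1)
    true2 = poisson(rng, P * lam2)
    moved = sum(rng.random() < theta for _ in range(true1))  # the latent U_i
    return true1 - moved, true2 + moved


if __name__ == "__main__":
    rng = random.Random(1)
    P, lam1, lam2, theta = 1000.0, 0.02, 0.01, 0.25  # hypothetical values
    mu1, mu2 = observed_rates(lam1, lam2, theta)
    n = 20000
    sims = [simulate_pattern(rng, P, lam1, lam2, theta) for _ in range(n)]
    print(f"P*mu1 = {P * mu1:.2f}, simulated mean y1 = {sum(s[0] for s in sims) / n:.2f}")
    print(f"P*mu2 = {P * mu2:.2f}, simulated mean y2 = {sum(s[1] for s in sims) / n:.2f}")
```

With the hypothetical values in the demo, both expected counts equal 15 deaths. Also note that $\mu_{i1} + \mu_{i2} = \lambda_{i1} + \lambda_{i2}$. Misclassification moves deaths between the two groups without changing their total.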
Under this misclassification model, the joint distribution of the observable mortality data is proportional to
$$\prod_{i=1}^{r} [\lambda_{i1}(1-\theta)]^{y_{i1}} [\lambda_{i2} + \lambda_{i1}\theta]^{y_{i2}} \exp\{-P_i[\lambda_{i1}(1-\theta)] - P_i[\lambda_{i2} + \lambda_{i1}\theta]\}.$$
For Bayesian inference, one can assume a beta prior distribution for the misclassification parameter, i.e. $\theta \sim \text{Beta}(a, b)$. θ enters the likelihood through the sum $\lambda_{i2} + \lambda_{i1}\theta$, so its full conditional distribution is not of standard form. We therefore employed a latent variable approach, following Paulino et al. [9,10], Liu et al. [11] and Stamey et al. [1]. This simplifies the full conditional distributions and allows the posterior distribution to be estimated with a Gibbs sampling algorithm.
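The joint distribution above can be evaluated numerically. The sketch below takes the rates λ directly as given, which is a hypothetical simplification rather than a regression fit. It computes the log-likelihood up to an additive constant. It then adds the log of a Beta(a, b) prior to give an unnormalised log posterior for θ. The function names are illustrative.

```python
import math


def log_likelihood(theta, lam1, lam2, y1, y2, P):
    """Log of the joint distribution of the observed counts (up to a constant):
    sum_i y_i1*log(lam_i1*(1-theta)) + y_i2*log(lam_i2 + lam_i1*theta)
          - P_i*lam_i1*(1-theta) - P_i*(lam_i2 + lam_i1*theta)."""
    total = 0.0
    for l1, l2, c1, c2, p in zip(lam1, lam2, y1, y2, P):
        mu1 = l1 * (1.0 - theta)
        mu2 = l2 + l1 * theta
        total += c1 * math.log(mu1) + c2 * math.log(mu2) - p * mu1 - p * mu2
    return total


def log_posterior_theta(theta, lam1, lam2, y1, y2, P, a, b):
    """Unnormalised log posterior of theta for fixed rates, with theta ~ Beta(a, b)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    log_prior = (a - 1.0) * math.log(theta) + (b - 1.0) * math.log(1.0 - theta)
    return log_prior + log_likelihood(theta, lam1, lam2, y1, y2, P)


if __name__ == "__main__":
    # Hypothetical rates and counts for r = 2 covariate patterns.
    lam1, lam2 = [0.0004, 0.0006], [0.0003, 0.0002]
    y1, y2 = [30, 50], [35, 32]
    P = [100000.0, 100000.0]
    grid = [k / 100 for k in range(1, 100)]
    best = max(grid, key=lambda t: log_posterior_theta(t, lam1, lam2, y1, y2, P, 2.0, 8.0))
    print(f"theta maximising the log posterior on the grid: {best:.2f}")
```

The two exponential terms add up to $-P_i(\lambda_{i1} + \lambda_{i2})$, which does not involve θ. θ is therefore informed only through the power terms and the prior. This is one reason an informative beta prior is helpful when no validation data are available.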
We define the latent variable $U_i \mid \beta_1, \beta_2, \theta, y_1, y_2 \sim \text{Binomial}(y_{i2}, p_i)$ as the number of deaths from the first group that were incorrectly labeled as being in the misclassified group. Here $\beta_1$ and $\beta_2$ are the regression coefficients of the Poisson models for the true rates $\lambda_{i1}$ and $\lambda_{i2}$, and
$$p_i = \frac{\lambda_{i1}\theta}{\lambda_{i1}\theta + \lambda_{i2}}.$$
Given $U_i$, the true number of group-1 deaths is $y_{i1} + U_i$, of which $U_i$ were mislabeled. The full conditional posterior of θ is therefore
$$\theta \mid \beta_1, \beta_2, U, y_1, y_2 \sim \text{Beta}\Big(\sum_i U_i + a,\; \sum_i y_{i1} + b\Big).$$
This approach has been used to correct misclassification in gastric and esophageal cancer mortality data [12,13]. When double sampling or validation data are unavailable, the Bayesian approach is a good alternative for correcting the effects of misclassification in mortality data. This applies particularly to death statistics from developing countries, which are often subject to misclassification or under-reporting. The Bayesian technique is flexible and straightforward to implement computationally.
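The latent-variable Gibbs scheme can be sketched in a few lines of Python. This is a simplified illustration, not the author's implementation. Instead of a Poisson regression on coefficients $\beta_1$ and $\beta_2$, each true rate $\lambda_{ij}$ is given an independent Gamma(c, d) prior (shape c, rate d), a hypothetical choice that makes every full conditional conjugate. Writing the true group-1 count as $y_{i1} + U_i$, the sampler cycles through four steps:

- $U_i \sim \text{Binomial}(y_{i2}, p_i)$, with $p_i = \lambda_{i1}\theta/(\lambda_{i1}\theta + \lambda_{i2})$
- $\theta \sim \text{Beta}(a + \sum_i U_i, b + \sum_i y_{i1})$
- $\lambda_{i1} \sim \text{Gamma}(c + y_{i1} + U_i, d + P_i)$
- $\lambda_{i2} \sim \text{Gamma}(c + y_{i2} - U_i, d + P_i)$

The function and argument names are illustrative.

```python
import random


def binomial(rng, n, p):
    """Binomial(n, p) draw as a sum of n Bernoulli trials (fine for modest n)."""
    return sum(rng.random() < p for _ in range(n))


def gibbs(y1, y2, P, a, b, c=1.0, d=1e-3, n_iter=5000, burn_in=1000, seed=0):
    """Data-augmentation Gibbs sampler for one-way misclassification.

    y1, y2 : observed death counts per covariate pattern
    P      : population at risk per covariate pattern
    a, b   : Beta(a, b) prior on theta
    c, d   : hypothetical Gamma(shape=c, rate=d) prior on every true rate
    Returns post-burn-in draws of theta and of the corrected rates lam_i1.
    """
    rng = random.Random(seed)
    r = len(y1)
    theta = a / (a + b)                                  # start at the prior mean
    lam1 = [(y1[i] + 0.5) / P[i] for i in range(r)]      # crude starting rates
    lam2 = [(y2[i] + 0.5) / P[i] for i in range(r)]
    theta_draws, lam1_draws = [], []
    for it in range(n_iter):
        # Latent U_i: group-1 deaths hidden among the y_i2 deaths recorded in group 2.
        U = [
            binomial(rng, y2[i], lam1[i] * theta / (lam1[i] * theta + lam2[i]))
            for i in range(r)
        ]
        # theta | U, y ~ Beta(a + sum U_i, b + sum y_i1).
        theta = rng.betavariate(a + sum(U), b + sum(y1))
        # Conjugate Gamma updates; random.gammavariate takes (shape, scale = 1/rate).
        for i in range(r):
            lam1[i] = rng.gammavariate(c + y1[i] + U[i], 1.0 / (d + P[i]))
            lam2[i] = rng.gammavariate(c + y2[i] - U[i], 1.0 / (d + P[i]))
        if it >= burn_in:
            theta_draws.append(theta)
            lam1_draws.append(list(lam1))
    return theta_draws, lam1_draws


if __name__ == "__main__":
    # Invented counts for r = 3 covariate patterns (not data from the article).
    y1, y2 = [40, 55, 70], [30, 35, 45]
    P = [100000.0, 120000.0, 150000.0]
    thetas, lams = gibbs(y1, y2, P, a=20.0, b=80.0)     # prior mean of theta = 0.2
    print(f"posterior mean of theta: {sum(thetas) / len(thetas):.3f}")
    for i in range(len(y1)):
        rate = sum(row[i] for row in lams) / len(lams) * 1e5
        print(f"pattern {i + 1}: corrected group-1 rate per 100,000 = {rate:.1f}")
```

The rates in this sketch are unstructured, so θ is informed largely by its beta prior. With the regression structure of the original model, or a sharper prior, the posteriors for θ and for the corrected rates $\lambda_{i1}$ may be tighter.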


  1. Stamey JD, Young DM, Seaman Jr JW (2008) A Bayesian approach to adjust for diagnostic misclassification between two mortality causes in Poisson regression. Stat Med 27(13): 2440-2452.
  2. Bross I (1954) Misclassification in 2×2 tables. Biometrics 10(4): 478-486.
  3. Lyles RH (2002) A note on estimating crude odds ratios in case–control studies with differentially misclassified exposure. Biometrics 58(4): 1034-1036.
  4. Whittemore AS, Gong G (1991) Poisson regression with misclassified counts: application to cervical cancer. J R Stat Soc Ser C Appl Stat 40(1): 81-93.
  5. Sposto R, Preston DL, Shimizu Y, Mabuchi K (1992) The effect of diagnostic misclassification on non-cancer and cancer mortality dose–response in A-bomb survivors. Biometrics 48(2): 605-617.
  6. McInturff P, Johnson W, Cowling D, Gardner I (2004) Modeling risk when binary outcomes are subject to error. Stat Med 23(7): 1095-1109.
  7. Swartz T, Haitovsky Y, Vexler A, Yang T (2004) Bayesian identifiability and misclassification in multinomial data. Can J Stat 32(3): 1-18.
  8. Pourhoseingholi MA, Faghihzadeh S, Hajizadeh E, Abadi A, Zali MR (2009) Bayesian estimation of colorectal cancer mortality in the presence of misclassification in Iran. Asian Pac J Cancer Prev 10(4): 691-694.
  9. Paulino C, Soares P, Neuhaus J (2003) Binomial regression with misclassification. Biometrics 59(3): 670-675.
  10. Paulino CD, Silva G, Achcar J (2005) Bayesian analysis of correlated misclassified binary data. Comput Stat Data Anal 49(4): 1120-1131.
  11. Liu Y, Johnson WO, Gold EB, Lasley BL (2004) Bayesian analysis of risk factors for anovulation. Stat Med 23(12): 1901-1919.
  12. Pourhoseingholi MA, Faghihzadeh S, Hajizadeh E, Abadi A (2010) Bayesian Analysis of Gastric Cancer mortality in Iranian Population. Gastroenterol Hepatol Bed Bench 3: 15-18.
  13. Pourhoseingholi MA, Abadi A, Faghihzadeh S, et al. (2010) Bayesian analysis of esophageal cancer mortality in the presence of misclassification. Ital J Public Health 8: 342-347.
© 2019 MedCrave Publishing, All rights reserved. No part of this content may be reproduced or transmitted in any form or by any means as per the standard guidelines of fair use.
Open Access by MedCrave Publishing is licensed under a Creative Commons Attribution 4.0 International License.
Based on a work at https://medcraveonline.com