Advantages and advancements of multiple imputation

doi:10.15406/bbij.2015.02.00033

eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Mini Review Volume 2 Issue 3


Irene B Helenowski


Department of Preventive Medicine, Northwestern University, USA

Correspondence: Irene B Helenowski, Department of Preventive Medicine, Northwestern University, 680 N. Lake Shore Drive, Suite 1400, Chicago, IL 60611, USA, Tel (312) 503-3597, Fax (312) 908-9588

Received: April 14, 2015 | Published: April 16, 2015

Citation: Helenowski IB. Advantages and advancements of multiple imputation. Biom Biostat Int J. 2015;2(3):93-94. DOI: 10.15406/bbij.2015.02.00033


Abstract

Multiple imputation remains an underused approach for handling missing data, despite recent advances and its potential in clinical, environmental, and health policy research. This review discusses several benefits of the technique, along with approaches that make it applicable to different types of data. These examples may show investigators how the method could help in their future research.

Introduction

Multiple imputation expands the range of possible analyses involving complex models that would otherwise fail to converge on data left unbalanced by missingness. One example involves linear mixed effects models with repeated measures (Lindstrom and Bates, 1989; Milliken and Johnson, 1992). In such models, parameter estimates are commonly obtained by restricted maximum likelihood (REML) estimation. When covariates exhibit different patterns and amounts of missingness, however, the REML computation cannot estimate the many parameters included in the model, such as the within-subject variation. Imputation remedies this situation: parameter estimates are obtained from each imputed, balanced data set and then averaged for each parameter.^1–3
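The averaging step can be made concrete with Rubin's combining rules, the standard way of pooling results across imputed data sets. The review itself only mentions averaging the point estimates; the variance part below is the usual companion rule and is added here for illustration. The following Python sketch assumes each of the m analyses returns a point estimate and its squared standard error for one parameter; the function name `pool_rubin` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    estimates: point estimates of the parameter, one per imputed data set
    variances: squared standard errors of those estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate: the simple average
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b  # total variance reflects the missing-data uncertainty
    return q_bar, total


# Hypothetical slope estimates and squared SEs from five imputed data sets
est = [0.42, 0.47, 0.39, 0.45, 0.44]
var = [0.010, 0.011, 0.009, 0.010, 0.012]
q, t = pool_rubin(est, var)
print(f"pooled estimate {q:.3f}, SE {sqrt(t):.3f}")
```

The between-imputation term is what distinguishes this from analyzing a single filled-in data set: the more the estimates disagree across imputations, the wider the pooled standard error.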

Opening new avenues of analysis without collecting further data is also cost-effective. Investigators can decide how to pursue their objectives based on the significance of associations in their imputed data, which helps in studies where each data point is difficult and expensive to obtain.⁴ Analyses of the imputed data can also help identify which variables provide the most insight into the questions posed by the study at hand.

Multiple imputation has also been developed with the missing data mechanism in mind, a facet that other approaches may ignore. These mechanisms include missing completely at random (MCAR), missing at random (MAR), and non-ignorable missingness. Under MCAR, missingness is independent of both the observed and the missing data; under MAR, missingness depends only on the observed data.^1,5 Multiple imputation techniques have also been developed for the non-ignorable mechanism, where missingness also depends on the missing values themselves.^6–8 Demirtas⁶ describes one approach to imputing non-ignorable missing data: a pattern mixture model that incorporates indicator variables for the dropout groups.
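To make the three mechanisms concrete, the Python sketch below simulates hypothetical data in which a fully observed x predicts y (true slope 0.8), deletes y under each mechanism, and summarizes the complete cases. The data-generating values and the logistic missingness models are illustrative assumptions, not taken from the cited work. In this simulation, MCAR leaves the complete cases looking like a random subsample; MAR shifts the observed mean of y but preserves the relation between y and x, which is what makes imputing y from x valid; MNAR distorts even that relation, which is why non-ignorable missingness calls for models such as the pattern mixture approach.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=20000, seed=1):
    """Hypothetical data: x is always observed, y may go missing; true slope 0.8."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 0.8 * x + rng.gauss(0.0, 0.6)
        data.append((x, y))
    return data


def is_missing(x, y, mechanism, rng):
    """Draw a missingness indicator for y under the named mechanism."""
    if mechanism == "MCAR":
        p = 0.3                       # independent of both x and y
    elif mechanism == "MAR":
        p = logistic(-1.0 + 2.0 * x)  # depends only on the observed x
    elif mechanism == "MNAR":
        p = logistic(-1.0 + 2.0 * y)  # depends on the value of y itself
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return rng.random() < p


def complete_case_summary(data, mechanism, seed=2):
    """Mean of y and OLS slope of y on x, using only rows where y is observed."""
    rng = random.Random(seed)
    obs = [(x, y) for x, y in data if not is_missing(x, y, mechanism, rng)]
    mx = sum(x for x, _ in obs) / len(obs)
    my = sum(y for _, y in obs) / len(obs)
    sxy = sum((x - mx) * (y - my) for x, y in obs)
    sxx = sum((x - mx) ** 2 for x, _ in obs)
    return my, sxy / sxx


data = simulate()
for mech in ("MCAR", "MAR", "MNAR"):
    mean_y, slope = complete_case_summary(data, mech)
    print(f"{mech}: observed mean of y {mean_y:+.2f}, slope of y on x {slope:.2f}")
```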

Methods have also been developed for handling missing data that are not normally distributed. Helenowski & Demirtas,⁹ Helenowski et al.¹⁰ and Helenowski & Demirtas¹¹ discuss imputing non-normally distributed continuous data, binary data, and mixed non-normally distributed continuous and binary data, respectively, which allows the assumptions associated with joint modeling to be relaxed. Here joint modeling uses the normal distribution for imputing continuous data, the multinomial distribution for imputing categorical data, and the general location model for imputing mixed data with continuous and categorical variables. Helenowski, Demirtas, and McGee¹² extend the concepts of Helenowski et al.¹⁰ and Helenowski & Demirtas¹¹ to imputing mixed data with variables that are not normally distributed and categorical variables with more than two levels. These approaches involve transforming the data into normally distributed values, applying multiple imputation via joint modeling under the assumptions of the normal model, and back-transforming the imputed values onto the scale of the original data.
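The transform, impute, back-transform sequence can be sketched for a single positive, right-skewed variable. This is a minimal illustration under stated assumptions, not the method of the cited papers: a log transform stands in as a simple normalizing transformation (the cited works' own transformations may differ), the variable is imputed alone with no covariates under an assumed ignorable (MCAR/MAR) mechanism, and each imputation draws the normal-model mean and variance from their posterior under a noninformative prior before drawing the missing values, so the m completed data sets reflect parameter uncertainty. The function `impute_lognormal` and the sample values are hypothetical.

```python
import math
import random
from statistics import mean, variance


def impute_lognormal(values, m=5, seed=0):
    """Return m completed copies of `values`, where None marks a missing entry.

    Assumes positive, right-skewed data that are roughly normal on the log
    scale and missing at random with no covariates (simplifying assumptions).
    """
    rng = random.Random(seed)
    z_obs = [math.log(v) for v in values if v is not None]  # 1. transform to ~normal
    n = len(z_obs)
    if n < 2:
        raise ValueError("need at least two observed values")
    z_bar, s2 = mean(z_obs), variance(z_obs)
    completed = []
    for _ in range(m):
        # 2. Draw sigma^2 ~ (n-1) s^2 / chi2_{n-1}, then mu ~ N(z_bar, sigma^2 / n)
        sigma2 = (n - 1) * s2 / rng.gammavariate((n - 1) / 2, 2)  # chi-square draw
        mu = rng.gauss(z_bar, math.sqrt(sigma2 / n))
        # 3. Impute on the normal scale, then back-transform with exp
        completed.append([
            v if v is not None else math.exp(rng.gauss(mu, math.sqrt(sigma2)))
            for v in values
        ])
    return completed


# Hypothetical skewed measurements; None marks missing entries
sample = [2.1, 3.4, None, 1.8, 12.5, None, 4.0, 2.7, 6.3, None]
for data_set in impute_lognormal(sample, m=3, seed=42):
    print([round(v, 2) for v in data_set])
```

Back-transforming with `exp` guarantees that imputed values stay positive and keep the right skew of the original scale, which a normal model applied directly to the raw values would not.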

Given these examples and the new multiple imputation approaches reviewed here, this article aims to persuade investigators to consider the technique in their own work, where its benefits could help them advance their research objectives.