Solved – Missing data – Regression imputation

Tags: data-imputation, mice, missing data, r, regression

I want to impute the missing values using a naive imputation method, "regression imputation". The first step builds a model from the observed data; predictions for the incomplete cases are then calculated under the fitted model and serve as replacements for the missing data.

Suppose that we model Ozone as a linear regression function of Solar.R:

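library(mice)
# Fit the regression on the observed data
fit <- lm(Ozone ~ Solar.R, data = airquality)
# Predict for the incomplete cases; ic() returns the rows with missing values
pred <- predict(fit, newdata = ic(airquality))
# Or, alternatively, using the mice package
imp <- mice(airquality[, 1:2], method = "norm.predict", m = 1, maxit = 3, seed = 1)
# Fifth observation before and after imputation
airquality[5, 1:2]
complete(imp)[5, ]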

I do not understand how the fifth observation is imputed under the fitted model, since both Ozone and Solar.R are missing for it.

Best Answer

Your linear regression can't predict for a case whose predictor is missing, so with that approach the value is not imputed.
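To see this concretely, here is a minimal sketch using only base R and the built-in airquality data. The names row5 and pred5 are illustrative, not from the question.

```r
# airquality ships with base R (package datasets)
fit <- lm(Ozone ~ Solar.R, data = airquality)

# Row 5 has NA in both Ozone and Solar.R
row5 <- airquality[5, c("Ozone", "Solar.R")]

# With the predictor missing, predict() has nothing to work with
# and returns NA for this case
pred5 <- predict(fit, newdata = row5)
is.na(pred5)
```

The plain lm() route therefore leaves row 5 unimputed, which is why the result from mice needs a different explanation.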

Although it does involve regressions, Multivariate Imputation by Chained Equations (MICE) is a bit different from your linear regression approach. In a nutshell, the missing values are first tentatively filled in, which makes every variable usable as a predictor, and then each variable is re-imputed iteratively from the others. That is how the fifth observation gets a value even though both Ozone and Solar.R are missing. To understand what the algorithm does, I would suggest looking at the pseudocode in Azur, M. J.; Stuart, E. A.; Frangakis, C. & Leaf, P. J. (2011). Multiple Imputation by Chained Equations: What is it and how does it work? International Journal of Methods in Psychiatric Research, 20, 40-49.
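The fill-then-iterate idea can be sketched by hand in base R. This is a simplified illustration, not the implementation inside mice: the mean-based starting fill, the three iterations, and the names dat, miss and iter are all assumptions made for the example, and the package may initialise and draw values differently.

```r
dat  <- airquality[, c("Ozone", "Solar.R")]
miss <- is.na(dat)   # remember which cells were originally missing

# Step 1: tentative fill, here with the observed column means (an assumption)
for (v in names(dat)) {
  dat[miss[, v], v] <- mean(dat[[v]], na.rm = TRUE)
}

# Step 2: cycle through the variables, re-imputing each one from the other
for (iter in 1:3) {
  for (v in names(dat)) {
    other <- setdiff(names(dat), v)
    # Fit only on rows where v was originally observed
    fit <- lm(reformulate(other, response = v), data = dat[!miss[, v], ])
    # Overwrite only the originally missing cells of v with predictions
    dat[miss[, v], v] <- predict(fit, newdata = dat[miss[, v], , drop = FALSE])
  }
}

dat[5, ]   # row 5 now has values for both Ozone and Solar.R
```

Because Solar.R in row 5 holds a tentative value when Ozone is predicted, and vice versa, both cells can be filled even though neither was observed.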