Iterations in Multiple Imputation

data-imputationmicemultiple-imputationr

Despite reading this other StatsExchange post, I am still struggling to understand what iterations do in multiple imputation, i.e. the parameter "maxit" in the mice() function.

My understanding of iterations was that in the case of multiple predictor variables, since predictor variables would be used to impute other predictor variables, we would need multiple iterations to account for the different possible ways this could occur. Hence, we would get different predictor variable imputations for different iterations.

But this doesn't seem to be the correct interpretation, as even in a simulation with one predictor/response variable (code below), each iteration gives me different values…
Below, I am applying multiple imputation to predict bmi (has missingness) using ONLY age (NO missingness). I expected the mean across imputations to be constant (i.e. each plotted line is horizontal) for each iteration, as there is only one predictor variable and hence only one source of uncertainty. What am I missing here? Thanks.

require(mice)
require(lattice)

imp <- mice(nhanes, m = 3, print=F, seed = 123)
pred <- imp$pred

pred[ ,"hyp"] <- 0
pred[,"chl"] <- 0
pred
imp <- mice(nhanes, pred=pred, print=F)
plot(imp)

plot(imp) output

Best Answer

The idea of multiple imputation is to create multiple imputed datasets, for which the missing values are replaced by imputed values that differ across the multiple imputed datasets. The variation in the imputed values reflects the uncertainty about the missing value under the (implicit) model that is being use to create the imputations. In this respect, you want to get different values (I'm not quite sure whether you got imputations mixed up with iterations).

Iterations refers to the number of iterations the algorithm, which is used to fit the imputation model/draw the imputed pseudo-random numbers to replace the missing values, is allowed to use at most (maxit = maximum number of iterations). Assuming you set a high enough value for maxit, you should get a valid imputation under your assumed imputation model. If you then hold the random number seed fixed before each call to the imputation procedure, you should get the same imputations back when calling it repeatedly.

Related Question