Solved – Multiple imputation: how to get one dataset out of m = 50 imputations

multiple-imputation r

So I am new to R and new to MI as well. Reading through "Flexible Imputation of Missing Data" and slowly becoming acquainted.

I was going through a sample run of my data and worked through most of the necessary steps outlined in the book and in the paper "mice: Multivariate Imputation by Chained Equations in R", but I can't come up with code that takes all 50 datasets and produces a single output dataset with the imputed and original values. Is this even possible with MICE, or with MI in general? I tried Amelia and Zelig too, and I get confused at this step. I understand that one should pool across all the datasets, but going through all 50 in this sample run would be too exhausting.

Since I am still new, I apologize if I am missing something, and I would appreciate an example or code that lets me pool one dataset to model with through an index in another software package.

Best Answer

It seems that you want to stack the imputed datasets. As noted by those who have commented previously, this is not the best way to analyse the data: point estimates tend to be accurate, but the variability introduced by the imputation process is no longer accounted for, so standard errors will be too small. Nevertheless, the data can be stacked with the complete function in the mice package. Once stacked, the data can be exported easily to other software programs.

# Load the mice package
library(mice)

# Impute the data using the default options (m = 5 imputations; pass m = 50 for 50)
imp <- mice(df)

# Check convergence
plot(imp)

# Stack the imputed data into one long dataset. Two variables are added:
# .imp (imputation number) and .id (row id). With include = TRUE the raw
# (unimputed) data are included as well, as .imp = 0.
com_long <- complete(imp, "long", include = TRUE)

# Obtain the first imputed dataset (action = 1 is the default)
com1 <- complete(imp, 1)

# Obtain the second imputed dataset
com2 <- complete(imp, 2)
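
The export step mentioned above could look roughly like the following. This is only a sketch: the nhanes data that ships with mice stands in for your own df, m = 5 is used instead of 50 to keep it quick, and the output file name is made up for illustration.

```r
library(mice)

# nhanes is an example dataset bundled with mice; it stands in for `df`
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Stack the imputed datasets only (include = FALSE leaves out the raw data)
stacked <- complete(imp, action = "long", include = FALSE)

# Write the stacked data to a CSV file that other software can read.
# The .imp column is the index of the imputation each row belongs to.
out_file <- file.path(tempdir(), "stacked_imputations.csv")  # hypothetical file name
write.csv(stacked, out_file, row.names = FALSE)
```

In the other program, the .imp column can then be used to split the file back into the separate imputed datasets.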

It is also possible to export the mids object (imp) directly to SPSS (if that is your other software) using the mids2spss function in mice.
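
If you can stay in R, pooling does not require going through the datasets by hand. A minimal sketch, again assuming the bundled nhanes data in place of your own and an arbitrary linear model chosen only for illustration:

```r
library(mice)

# nhanes stands in for your own data; m = 5 keeps the example fast
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Fit the same model to every imputed dataset in one call
fit <- with(imp, lm(bmi ~ age + chl))

# Combine the per-dataset estimates using Rubin's rules
pooled <- pool(fit)
summary(pooled)
```

The pooled standard errors include the between-imputation variability that is lost when the stacked data are analysed as a single dataset.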