Model Diagnostics in Multiple Imputation

Tags: diagnostic, mice, multiple-imputation, residuals

When using multiple imputation, what is the best way to run model diagnostics? In a related post (Multiple Imputation and Regression Model Diagnostics), one option in the accepted answer was to look at diagnostics for the individual models fit to the $m$ imputed data sets.

But in section 6.6 of Flexible Imputation of Missing Data (Stef van Buuren, https://stefvanbuuren.name/fimd/sec-diagnostics.html):

"Conventional model evaluation concentrates on the fit between the data and the model. In imputation it is often more informative to focus on distributional discrepancy, the difference between the observed and imputed data"

and

"Figure 6.10 is the worm plot calculated from imputed data after predictive mean matching. The fit between the observed data and the imputation model is bad. The blue points are far from the horizontal axis, especially for the youngest children. The shapes indicate that the model variance is much larger than the data variance. In contrast to this, the red and blue worms are generally close, indicating that the distributions of the imputed and observed body weights are similar. Thus, despite the fact that the model does not fit the data, the distributions of the observed and imputed data are similar. This distributional similarity is more relevant for the final inferences than model fit per se."

So should the focus then be on the distribution of the imputed and observed values instead of looking at the individual model diagnostics?

Best Answer

The reason there are two different recommendations is that there are two different questions that need to be answered.

  1. Does the imputation model accurately replace missing data with appropriate values?

  2. Is the analysis model a good fit to the data? Are the assumptions of the analysis model satisfied?

For the final results to be valid, both answers should be yes.

The first question is addressed by van Buuren's recommendations and other imputation resources. We want to be sure that the imputation models are not generating impossible or unreasonable values, and we want to see whether the imputed values are similar to the observed values. If the two differ, we should do additional work to check: a) whether the differences result from a poor imputation model or from the data being MAR or MNAR (missing at random or missing not at random), and b) whether the differences materially change the results of the analysis model. A sketch of this kind of distributional check follows below.
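As a rough illustration of the first check, here is a minimal Python sketch using scikit-learn's `IterativeImputer` as a stand-in for mice. The simulated data, variable names, and choice of summary statistics are all illustrative assumptions, not part of the original answer; in practice you would compare densities or strip plots of observed vs. imputed values, as mice's own plotting functions do.

```python
# Minimal sketch of distributional diagnostics (question 1), assuming
# scikit-learn's IterativeImputer as a stand-in for mice. All names
# (n, m, X_obs, etc.) are illustrative.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)

# Simulate data with values missing at random in the second column.
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(scale=0.8, size=n)
X = np.column_stack([x1, x2])
missing = rng.random(n) < 0.3
X_obs = X.copy()
X_obs[missing, 1] = np.nan

# Draw m imputations; sample_posterior=True draws from the predictive
# distribution, which is what proper multiple imputation requires.
m = 5
imputed_values = []
for i in range(m):
    imp = IterativeImputer(sample_posterior=True, random_state=i)
    X_completed = imp.fit_transform(X_obs)
    imputed_values.append(X_completed[missing, 1])

# Compare summary statistics of observed vs. imputed values; large gaps
# flag a distributional discrepancy worth investigating (a poor
# imputation model, or MAR/MNAR structure in the data).
observed = X_obs[~missing, 1]
print(f"observed : mean={observed.mean():.3f}, sd={observed.std():.3f}")
for i, vals in enumerate(imputed_values):
    print(f"imputed {i}: mean={vals.mean():.3f}, sd={vals.std():.3f}")
```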

The second question is addressed by the linked answer. Because we're using multiple imputation, many traditional diagnostics don't apply directly: there is no single set of "the" residuals from the analysis model, for example. Instead, we can look at the residuals from the $i$th fitted model and check how well it fits the $i$th imputed dataset. By checking the analysis model's assumptions on some (or all) of the imputed datasets, we get a sense of whether those assumptions are reasonable. A sketch of this per-dataset check follows below.
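A hedged sketch of this second check, using statsmodels for the analysis model: `completed` is a hypothetical list standing in for the $m$ imputed datasets produced upstream, and the Shapiro-Wilk test is just one convenient residual check among many (residual-vs-fitted plots, Q-Q plots, and heteroskedasticity tests would work the same way, run once per imputed dataset).

```python
# Minimal sketch of per-dataset residual diagnostics (question 2),
# assuming statsmodels for the analysis model. `completed` is a
# hypothetical stand-in for the m imputed datasets.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)

# Stand-in for m completed (imputed) datasets of (y, x) pairs.
m = 5
completed = []
for i in range(m):
    x = rng.normal(size=200)
    y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)
    completed.append((y, x))

# Fit the analysis model to each completed dataset and run the usual
# residual checks on each fit separately.
for i, (y, x) in enumerate(completed):
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    # Shapiro-Wilk as a quick normality check on the i-th residuals;
    # in practice one would also plot residuals vs. fitted values.
    w, p = stats.shapiro(fit.resid)
    print(f"dataset {i}: R^2={fit.rsquared:.3f}, Shapiro-Wilk p={p:.3f}")
```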

So you should do both kinds of model diagnostics.
