I am conducting multiple imputation by chained equations in R using the MICE package, followed by a logistic regression on the imputed dataset.
I need to compute a 95% confidence interval about the predictions for use in creating a plot—that is, the grey shading in the image below.
I followed the approach described in the answer to this question…
How are the standard errors computed for the fitted values from a logistic regression?
…which uses the following lines of code to yield the std.er of prediction for any specific value of the predictor:
o <- glm(y ~ x, data = dat)
C <- c(1, 1.5)
std.er <- sqrt(t(C) %*% vcov(o) %*% C)
But of course I need to adapt this code to the fact that I am using a model resulting from multiple imputation. In that context, I am not sure which variance-covariance matrix (corresponding to “vcov(o)” in the above example) I should be using in my equation to produce the "std.er".
Based on the documentation for MICE I see three candidate matrices:
-
ubar – The average of the variance-covariance matrix of the complete data estimates.
-
b – The between imputation variance-covariance matrix.
-
t – The total variance-covariance matrix.
http://www.inside-r.org/packages/cran/mice/docs/is.mipo
Based on trying all three, the b matrix seems patently wrong, but both the t and the ubar matrices seem plausible. Can anybody confirm which one is appropriate?
Thank you.
Best Answer
The
t
matrix is the one to use in the way you describe. Eqs. 4 through 7 in the Dong & Peng paper that Joe_74 references correspond to the elements of the same names in themipo
object (documentation here), and sot
is the accurate variance-covariance matrix for the pooled regression coefficientsqbar
you're actually using.ubar
andb
only matter here in that they are/were used to computet
.Presumably you'll be using more than one predictor, so here's a MWE for that, which should be easy to modify.