Solved – Multiple Imputation Pooling mean and standard deviation

mice, multiple-imputation, pooling, r

I'm doing a multiple imputation of a dataset using R's mice package.

library(mice)
imp <- mice(nhanes, m=5, print = FALSE, seed = 55152)

I figured out that to pool the regression coefficients, you really only need to take the mean of each coefficient across the 5 imputed datasets.

But now I need to pool means, confidence intervals, and standard deviations using Rubin's rules.

How do I do that?

/Kind regards

Best Answer

You can use the pool function that comes with mice. In its help page you will find the following example:

imp <- mice(nhanes, m=5, print = FALSE, seed = 55152)
fit <- with(data=imp,exp=lm(bmi~hyp+chl))
summary(pool(fit))

which gives

                    est         se          t       df     Pr(>|t|)       lo 95       hi 95 nmis       fmi     lambda
(Intercept) 19.63676903 4.33084987  4.5341606 15.73596 0.0003524824 10.44324131 28.83029675   NA 0.2394702 0.14858446
hyp         -0.43069297 2.07375135 -0.2076879 18.41666 0.8377520514 -4.78042837  3.91904242    8 0.1563331 0.06943169
chl          0.03803107 0.02241891  1.6963831 14.99264 0.1104714393 -0.00975576  0.08581789   10 0.2619634 0.16966640

Unfortunately, at the moment I can't see a smarter built-in way to pool a plain mean, so let's do the pooling according to Rubin's rules by hand. If we assume $x_1, \ldots, x_n$ to be iid samples from $N(\mu, \sigma^2)$, then $\bar{x} \sim N(\mu, \sigma^2/n)$. We therefore need $\sigma^2/n$, estimated by the sample variance divided by $n$, as the within-imputation variance of each mean:

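n <- nrow(nhanes)   # sample size
m <- 5              # number of imputed datasets
Q <- numeric(m)     # per-imputation means of bmi
U <- numeric(m)     # per-imputation variances of those means

for (i in seq_len(m)) {
  bmi_i <- complete(imp, i)$bmi   # bmi column of the i-th completed dataset
  Q[i] <- mean(bmi_i)
  U[i] <- var(bmi_i) / n          # estimate of sigma^2 / n
}
B <- var(Q)                       # between-imputation variance
mean_of_means <- mean(Q)          # pooled mean
total_variance_of_means <- mean(U) + (1 + 1/m) * B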
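
The question also asks for a confidence interval. As a rough sketch, the steps above can be wrapped in a small base-R helper that adds a standard error, degrees of freedom, and an interval. The function name `pool_scalar_sketch` is made up for illustration and is not part of mice. The degrees of freedom use the classic Rubin (1987) formula, $\nu = (m - 1)(1 + 1/r)^2$ with $r = (1 + 1/m)B/\bar{U}$, which is an assumption on my part and may differ from the small-sample adjusted value that `pool` reports. The numbers in the example call are invented, not results from nhanes.

```r
# Hypothetical helper (not part of mice): pool m scalar estimates by Rubin's rules.
# Q: per-imputation estimates; U: per-imputation variances (squared standard errors).
pool_scalar_sketch <- function(Q, U, level = 0.95) {
  stopifnot(length(Q) == length(U), length(Q) > 1)
  m     <- length(Q)
  q_bar <- mean(Q)                     # pooled point estimate
  u_bar <- mean(U)                     # within-imputation variance
  b     <- var(Q)                      # between-imputation variance
  total <- u_bar + (1 + 1/m) * b       # total variance
  se    <- sqrt(total)

  # Classic Rubin (1987) degrees of freedom (assumption: no small-sample
  # adjustment); infinite when the imputations agree exactly (b == 0).
  r  <- (1 + 1/m) * b / u_bar
  df <- if (b > 0) (m - 1) * (1 + 1/r)^2 else Inf

  half <- qt(1 - (1 - level) / 2, df) * se   # half-width of the interval
  c(est = q_bar, se = se, df = df,
    lower = q_bar - half, upper = q_bar + half)
}

# Example call with invented numbers (illustrative only)
Q_demo <- c(26.1, 26.9, 26.4, 27.2, 26.6)
U_demo <- c(0.70, 0.75, 0.68, 0.80, 0.72)
print(pool_scalar_sketch(Q_demo, U_demo))
```

With the `Q` and `U` vectors computed in the loop above, `pool_scalar_sketch(Q, U)` should return the same pooled mean as `mean_of_means`, and its `se` should equal `sqrt(total_variance_of_means)`.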