Solved – Do I need to adjust the degrees of freedom returned by pool.compare() in MICE

Tags: degrees-of-freedom, linear-model, mice, multiple-imputation, r

I am analyzing a multiply imputed dataset produced by the mice package in R. To assess the overall significance of my linear model, I am using pool.compare() to compare my "full" model to an intercept-only "restricted" model. However, the degrees of freedom (residual) returned by pool.compare() seem highly inflated (I have set m = 50 imputations). I'm aware that 50 imputations is high, but it's needed for my dataset. I've given an example below of the same issue using the nhanes2 dataset from the mice package. I have two questions:

1) Why are the degrees of freedom returned by pool.compare() so high?

2) Is it appropriate to use the adjustment to the degrees of freedom suggested by Barnard and Rubin (1999) and described in section 2.3.6 of Stef van Buuren's Flexible Imputation of Missing Data textbook?

The R code below reproduces the issue using the nhanes2 dataset. This dataset has 25 observations, and the example fits a linear model with one categorical predictor (age, a factor with three levels) and one continuous predictor (chl).

# load package and data  
library("mice")  
data(nhanes2)  

# impute missing values, m = 50
imp <- mice(nhanes2, m = 50, seed = 1, print = FALSE)

# produce the models to compare, a full model and
# an intercept only restricted model  
fit.imputed.full <- with(imp, lm(bmi ~ age + chl))
fit.imputed.res <- with(imp, lm(bmi ~ 1))  

# compare models using pool.compare()
pooled.comparison <- pool.compare(fit.imputed.full, fit.imputed.res)

# given that the original dataset had 25 observations, and we have a 
# linear model with three predictors (age is a factor with three levels)
# I'd expect the degrees of freedom (residual) for the comparison to be at  
# most 24. The df for the numerator comes as expected:

pooled.comparison$df1
[1] 3

# the df for the denominator comes out much larger than the
# maximum of 24:

pooled.comparison$df2
[1] 1374.457

# by way of comparison, the same analysis conducted on a single
# hypothetically complete dataset gives the expected degrees of freedom

nhanes2CCA <- complete(imp, 1)
fit.CCA.full <- lm(bmi ~ age + chl, data = nhanes2CCA)
fit.CCA.res <- lm(bmi ~ 1, data = nhanes2CCA)
anova(fit.CCA.full, fit.CCA.res)

Model 1: bmi ~ age + chl
Model 2: bmi ~ 1
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
1     21 293.60                              
2     24 477.23 -3   -183.62 4.3778 0.01525 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In the end, it seems strange that an analysis conducted on a hypothetically complete dataset returns 24 degrees of freedom, while an analysis conducted on 50 multiply imputed datasets returns over 1000 degrees of freedom. Why is there this large difference in degrees of freedom?

My second question relates to the correction proposed by Barnard and Rubin (1999). Is it appropriate to use that correction here? Because this is a multi-parameter test, doing so requires, I take it, an estimate of lambda averaged across the parameters being estimated.

The figures I've used in this example are:
v_old = 1374.457
v_com = 25-1 = 24
average lambda = 0.329
v_obs = 14.91
v (adjusted degrees of freedom) = 14.75

Applying this correction in this instance returns a corrected degrees of freedom of 14.75, which is more than the df that would be returned by analyzing only complete cases (12) and less than the df that would be returned by analyzing a hypothetically complete dataset (24), which seems reasonable.
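For concreteness, here is that arithmetic written out in R (a sketch of my own hand calculation, not library code; averaging lambda across the parameters is my own reading of how to extend the scalar formula):

# Barnard & Rubin (1999) adjustment, using the figures from this example
v_old  <- 1374.457   # denominator df returned by pool.compare()
v_com  <- 25 - 1     # complete-data df: n - 1 = 24
lambda <- 0.329      # fraction of missing information, averaged across parameters

v_obs <- (v_com + 1) / (v_com + 3) * v_com * (1 - lambda)
v_adj <- 1 / (1 / v_old + 1 / v_obs)

round(v_obs, 2)  # 14.91
round(v_adj, 2)  # 14.75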

Thank you all for your assistance.

Matt.

Best Answer

You posed two questions, so I will simply comment on them in order:

Question 1: High degrees of freedom:

Such high degrees of freedom are normal with pool.compare(). The function implements the procedure of Meng & Rubin (1992), in which the denominator degrees of freedom for the test statistic $D_m$ are derived under the assumption that the complete-data degrees of freedom are infinite (see also Rubin, 1987).

Thus, the procedure only guarantees that the estimated degrees of freedom are smaller than those of the hypothetical complete data (i.e., smaller than infinity), which often results in very large denominator degrees of freedom in MI. This can be inappropriate, especially in smaller samples.
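For reference, here is my own gloss of the relevant expression (it appears in Li, Raghunathan, and Rubin, 1991, and the same degrees of freedom are used here): with $k$ parameters, $m$ imputations, $t = k(m - 1)$, and $r_m$ the average relative increase in variance due to nonresponse, the denominator degrees of freedom are

$$\nu = 4 + (t - 4)\left[1 + \frac{1 - 2/t}{r_m}\right]^2 \quad \text{for } t > 4,$$

which grows without bound as $r_m \to 0$. With $m = 50$ and $k = 3$, $t = 147$, so the df easily exceed 1000, consistent with the df2 of about 1374 in your example.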

Question 2: Correction formula of Barnard & Rubin:

The correction formula in Barnard & Rubin (1999) addresses the aforementioned problem, but it was derived for tests of scalar estimands (e.g., a single regression coefficient), not for multiparameter tests such as the one carried out by pool.compare().

Therefore, this correction formula is not the way to go here. Luckily, there is also a correction formula available for multiparameter tests. That formula was proposed by Reiter (2007) and was originally developed for the procedure by Li, Raghunathan, and Rubin (1991).

These two procedures (that of Meng & Rubin, 1992, and that of Li, Raghunathan, and Rubin, 1991) are asymptotically identical in many cases, and the expression for the degrees of freedom is the same in $D_1$ and $D_3$. Therefore, I would suggest applying Reiter's correction formula to the results of pool.compare(). The formula is not much more difficult to apply than that of Barnard & Rubin, and it is also implemented in a couple of R packages.
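For illustration, a sketch of one such implementation (this assumes a recent version of mice, 3.0 or later, in which pool.compare() was superseded by D1(); D1() wraps mitml::testModels(), and supplying the complete-data degrees of freedom via dfcom invokes Reiter's (2007) small-sample correction):

# Sketch, assuming mice >= 3.0
library(mice)
imp <- mice(nhanes2, m = 50, seed = 1, print = FALSE)
fit.full <- with(imp, lm(bmi ~ age + chl))
fit.res  <- with(imp, lm(bmi ~ 1))

# complete-data residual df: 25 observations - 4 estimated coefficients = 21
D1(fit.full, fit.res, dfcom = 21)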

You can find some very readable applications of Reiter's correction formula in the article by van Ginkel and Kronenberg (2014), who apply the procedure of Li et al. (1991) with Reiter's corrections to the ANOVA (recall that Meng & Rubin, 1992, and Li et al., 1991, can be regarded as interchangeable in this case).

Edit:

However, you may well observe no big difference: the outcome of your hypothesis test will likely remain the same.
