Solved – How to combine confidence intervals for a variance component of a mixed-effects model when using multiple imputation

confidence interval, data-imputation, mixed model, modeling

The logic of multiple imputation (MI) is to impute the missing values not once but several (typically M=5) times, resulting in M completed datasets. Each of the M completed datasets is then analyzed with complete-data methods, after which the M estimates and their standard errors are combined using Rubin's formulas to obtain the "overall" estimate and its standard error.
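For reference, the combining step can be sketched in a few lines of Python. This is only a minimal illustration of Rubin's rules, assuming an approximately normal estimator; the function name `pool_rubin` and the numbers are made up for the example and do not come from any package or dataset.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool M point estimates and their standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                     # pooled point estimate
    u_bar = mean(se ** 2 for se in std_errors)  # within-imputation variance
    b = variance(estimates)                     # between-imputation variance
    t = u_bar + (1 + 1 / m) * b                 # total variance
    # Rubin's degrees of freedom for the t reference distribution
    if b > 0:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    else:
        df = float("inf")
    return q_bar, sqrt(t), df


# Hypothetical fixed-effect estimates from M = 5 completed datasets
estimates = [1.02, 0.95, 1.10, 0.99, 1.04]
std_errors = [0.20, 0.21, 0.19, 0.20, 0.22]
pooled, pooled_se, df = pool_rubin(estimates, std_errors)
print(pooled, pooled_se, df)
```

The pooled standard error is never smaller than the root of the average squared standard error, because the between-imputation term adds the extra uncertainty due to the missing data.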

Great so far, but I'm not sure how to apply this recipe when variance components of a mixed-effects model are concerned. The sampling distribution of a variance component is asymmetrical, so the corresponding confidence interval can't be given in the typical "estimate ± 1.96*se(estimate)" form. For this reason the R packages lme4 and nlme don't even provide the standard errors of the variance components, only confidence intervals.

We can therefore perform MI on a dataset and then get M confidence intervals per variance component after fitting the same mixed-effects model on the M completed datasets. The question is how to combine these M intervals into one "overall" confidence interval.
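One candidate approach, sketched here purely as an illustration and not as the method used in any particular paper, is to work on the log scale. If each interval is roughly symmetric after a log transform (an assumption that should be checked for the intervals at hand), a standard error can be backed out from its width, Rubin's rules applied to the log-scale quantities, and the result back-transformed. The function name `pool_variance_cis` and the intervals are hypothetical, and a normal quantile is used in place of Rubin's t reference distribution for brevity.

```python
from math import exp, log, sqrt
from statistics import NormalDist, mean, variance

Z = NormalDist().inv_cdf(0.975)  # about 1.96


def pool_variance_cis(intervals):
    """Combine M 95% CIs for a variance component on the log scale.

    Assumes each interval is roughly symmetric after a log transform,
    so that a standard error can be backed out from its width.
    """
    m = len(intervals)
    # Midpoint and implied standard error of each interval on the log scale
    log_est = [(log(lo) + log(hi)) / 2 for lo, hi in intervals]
    log_se = [(log(hi) - log(lo)) / (2 * Z) for lo, hi in intervals]
    # Rubin's rules applied to the log-scale quantities
    q_bar = mean(log_est)
    total_var = mean(se ** 2 for se in log_se) + (1 + 1 / m) * variance(log_est)
    half_width = Z * sqrt(total_var)  # normal quantile instead of Rubin's t
    # Back-transforming gives an asymmetric interval on the variance scale
    return exp(q_bar), exp(q_bar - half_width), exp(q_bar + half_width)


# Hypothetical 95% intervals for one variance component, M = 5
intervals = [(0.40, 2.10), (0.35, 1.90), (0.45, 2.40), (0.38, 2.00), (0.42, 2.20)]
est, lower, upper = pool_variance_cis(intervals)
print(est, lower, upper)
```

The point estimate here is the geometric midpoint of each interval, which need not coincide with the estimate reported by the fitting software; if the software's own log-scale estimates are available, they would be the more natural input.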

I guess this should be possible: the authors of an article (Yucel & Demirtas (2010), "Impact of non-normal random effects on inference by MI") seem to have done it, but they don't explain exactly how.

Any tips would be much appreciated!

Cheers, Rok

Best Answer

This is a great question! I'm not sure this is a full answer, but I'll drop these few lines in case they help.

It seems that Yucel and Demirtas (2010) refer to an older paper published in JCGS, "Computational strategies for multivariate linear mixed-effects models with missing values", which uses a hybrid EM/Fisher scoring approach to produce likelihood-based estimates of the variance components (VCs). It has been implemented in the R package mlmmm. I don't know, however, whether it produces CIs.

Otherwise, I would definitely check the WinBUGS program, which is widely used for multilevel models, including those with missing data. I seem to remember that it only works if your missing values (MVs) are in the response variable, not in the covariates, because we generally have to specify the full conditional distributions (if MVs are present in the independent variables, we must give a prior to the missing Xs, and these will be treated as parameters to be estimated by WinBUGS...). The same seems to apply to R, judging by the following thread on r-sig-mixed: "missing data in lme, lmer, PROC MIXED". Also, it may be worth looking at the MLwiN software.
