Why does bootstrapping the residuals from a mixed effects model yield anti-conservative confidence intervals?

bootstrap, confidence interval, mixed model, monte carlo, simulation

I typically deal with data in which multiple individuals are each measured multiple times in each of two or more conditions. I have recently been experimenting with mixed effects modelling to evaluate evidence for differences between conditions, modelling individual as a random effect. To visualize the uncertainty in the predictions from such models, I have been using bootstrapping: on each iteration, both individuals and observations-within-conditions-within-individuals are sampled with replacement, a new mixed effects model is fitted, and predictions are obtained from it. This works fine for data with Gaussian error, but when the data are binomial, the bootstrap can take a very long time because each iteration must fit a relatively computationally intensive binomial mixed effects model.
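A minimal sketch of the two-stage resampling described above, using only the Python standard library. To keep it self-contained, refitting the mixed model on each iteration is replaced by a simple stand-in statistic (the mean over individuals of each individual's condition-mean difference). The data layout, the condition labels `"A"`/`"B"`, and the function names `two_stage_resample`, `condition_difference` and `percentile_ci` are all hypothetical; in real use the stand-in would be swapped for a mixed model fitted to each resampled dataset.

```python
import random
from statistics import mean

def two_stage_resample(data, rng):
    """Resample individuals with replacement, then resample observations
    within each condition within each sampled individual.

    `data` maps individual id -> {condition: [observations]}.
    Duplicated individuals are kept as separate entries (distinct clusters).
    """
    ids = list(data)
    sampled = [rng.choice(ids) for _ in ids]
    return [{cond: [rng.choice(obs) for _ in obs]
             for cond, obs in data[ind].items()}
            for ind in sampled]

def condition_difference(sample, a="A", b="B"):
    # Stand-in for refitting the mixed model (hypothetical simplification):
    # average over individuals of each individual's B-minus-A mean difference.
    return mean(mean(ind[b]) - mean(ind[a]) for ind in sample)

def percentile_ci(data, stat, n_boot=2000, alpha=0.05, seed=1):
    """Efron percentile interval from the two-stage bootstrap."""
    rng = random.Random(seed)
    reps = sorted(stat(two_stage_resample(data, rng)) for _ in range(n_boot))
    return reps[int(alpha / 2 * n_boot)], reps[int((1 - alpha / 2) * n_boot) - 1]

# Usage with made-up data: 8 individuals, 5 observations per condition.
rng = random.Random(0)
data = {f"s{i}": {"A": [rng.gauss(0, 1) for _ in range(5)],
                  "B": [rng.gauss(0.5, 1) for _ in range(5)]}
        for i in range(8)}
print(percentile_ci(data, condition_difference))
```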

One idea I had was to take the residuals from the original model and use them in place of the raw data in the bootstrap, which would let me fit a Gaussian mixed effects model on each iteration. Adding the original predictions from the binomial model of the raw data to the bootstrapped predictions from the residuals then yields a 95% CI for the original predictions.
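The residual-based scheme described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the poster's actual code: per-condition means stand in for both the binomial mixed model's predictions and the Gaussian mixed model refitted on each iteration, residuals are raw response-scale residuals, and they are resampled with the same two-stage scheme used for the raw data. The function name `residual_bootstrap_ci` is hypothetical.

```python
import random
from statistics import mean

def residual_bootstrap_ci(data, n_boot=2000, alpha=0.05, seed=1):
    """Residual-based bootstrap sketch. `data` maps individual id ->
    {condition: [observations]}. Per-condition means are a hypothetical
    stand-in for both the original model and the per-iteration refit."""
    conds = sorted({c for ind in data.values() for c in ind})
    # "Original predictions": one fitted value per condition.
    pred = {c: mean(y for ind in data.values() for y in ind[c]) for c in conds}
    # Residuals keep the individual/condition structure of the raw data.
    resid = {i: {c: [y - pred[c] for y in obs] for c, obs in ind.items()}
             for i, ind in data.items()}
    rng = random.Random(seed)
    ids = list(resid)
    reps = {c: [] for c in conds}
    for _ in range(n_boot):
        # Stage 1: resample individuals with replacement.
        sample = [resid[rng.choice(ids)] for _ in ids]
        for c in conds:
            # Stage 2: resample residuals within each sampled individual,
            # "refit" (take the mean), then add back the original prediction.
            boot = [rng.choice(ind[c]) for ind in sample for _ in ind[c]]
            reps[c].append(pred[c] + mean(boot))
    ci = {}
    for c in conds:
        r = sorted(reps[c])
        ci[c] = (r[int(alpha / 2 * n_boot)], r[int((1 - alpha / 2) * n_boot) - 1])
    return pred, ci

# Usage with made-up 0/1 data: 6 individuals, 4 trials per condition.
rng = random.Random(0)
data = {i: {"A": [float(rng.random() < 0.3) for _ in range(4)],
            "B": [float(rng.random() < 0.6) for _ in range(4)]}
        for i in range(6)}
print(residual_bootstrap_ci(data))
```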

However, I recently coded a simple evaluation of this approach, simulating no difference between two conditions and computing the proportion of times a 95% confidence interval failed to include zero. I found that the residuals-based bootstrap procedure above yields rather strongly anti-conservative intervals (they exclude zero more than 5% of the time). I then coded a similar evaluation of the approach applied to data that are Gaussian to begin with, and it produced similarly (though less extremely) anti-conservative CIs. Any idea why this might be?
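The kind of Monte Carlo check described above can be sketched like this: simulate many datasets with no true difference between conditions, build a 95% interval for each, and record how often it excludes zero; a well-calibrated procedure should do so about 5% of the time. The data-generating model (random individual intercepts plus Gaussian noise), the example interval procedure (a percentile bootstrap over individuals' mean differences) and all function names are assumptions for illustration. Any procedure with the signature `ci_fn(data, rng) -> (lo, hi)` could be plugged in instead.

```python
import random

def simulate_null(rng, n_ind=10, n_obs=5, sd_ind=1.0, sd_obs=1.0):
    """One dataset with NO true difference between conditions A and B.
    Each individual has a random intercept shared by both conditions."""
    data = {}
    for i in range(n_ind):
        u = rng.gauss(0, sd_ind)
        data[i] = {c: [u + rng.gauss(0, sd_obs) for _ in range(n_obs)]
                   for c in ("A", "B")}
    return data

def cluster_percentile_ci(data, rng, n_boot=500, alpha=0.05):
    # Example procedure to evaluate (an assumption): percentile bootstrap of
    # the mean per-individual B-minus-A difference, resampling individuals.
    diffs = [sum(d["B"]) / len(d["B"]) - sum(d["A"]) / len(d["A"])
             for d in data.values()]
    k = len(diffs)
    reps = sorted(sum(rng.choices(diffs, k=k)) / k for _ in range(n_boot))
    return reps[int(alpha / 2 * n_boot)], reps[int((1 - alpha / 2) * n_boot) - 1]

def false_positive_rate(ci_fn, n_sim=200, seed=0):
    """Proportion of null datasets whose interval excludes zero.
    Compare with the nominal 0.05."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_sim):
        lo, hi = ci_fn(simulate_null(rng), rng)
        if lo > 0 or hi < 0:
            misses += 1
    return misses / n_sim

print(false_positive_rate(cluster_percentile_ci))
```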

Best Answer

Remember that all bootstrap confidence intervals attain the stated confidence level only asymptotically. There are also a slew of possible methods for constructing bootstrap confidence intervals: Efron's percentile method, Hall's percentile method, double bootstrap, bootstrap t, tilted bootstrap, BC, BCa, and maybe a few more. You haven't told us which method you use. Schenker's 1985 paper in JASA showed that for certain chi-square distributions the BC bootstrap confidence interval undercovered the advertised percentage. In small-sample problems this undercoverage can be severe. LaBudde and I have two papers showing that in small samples even BCa can have very poor coverage when estimating the variance of a lognormal distribution, and a similar problem exists for testing equality of two variances. These are simple problems; I expect the same thing can happen with residuals from mixed models.

In our new book "An Introduction to Bootstrap Methods with Applications to R", published by Wiley in 2011, we cover this topic in Section 3.7 and provide references. The surprise is that the percentile method sometimes does better than the higher-order accurate BCa method when the sample size is small.
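To illustrate the small-sample point, here is a sketch that computes Efron's percentile interval (the alpha/2 and 1 - alpha/2 quantiles of the bootstrap replicates) and Hall's percentile interval (those quantiles reflected about the original estimate: 2*theta_hat - upper, 2*theta_hat - lower), then estimates by simulation how often each covers the true variance of a standard lognormal, (e - 1)e, when n = 10. It is only loosely modelled on the lognormal-variance example mentioned above: the sample size, the numbers of simulations and replicates, and the crude order-statistic quantile rule are arbitrary choices, BC and BCa are omitted for brevity, and it does not reproduce any published results.

```python
import math
import random

def var(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def quantile(sorted_xs, p):
    # Simple order-statistic quantile; adequate for a sketch.
    idx = min(len(sorted_xs) - 1, max(0, int(p * len(sorted_xs))))
    return sorted_xs[idx]

def efron_percentile(theta_hat, reps, alpha=0.05):
    r = sorted(reps)
    return quantile(r, alpha / 2), quantile(r, 1 - alpha / 2)

def hall_percentile(theta_hat, reps, alpha=0.05):
    # Reflect the bootstrap quantiles around the original estimate.
    r = sorted(reps)
    return (2 * theta_hat - quantile(r, 1 - alpha / 2),
            2 * theta_hat - quantile(r, alpha / 2))

def coverage(interval_fn, n=10, n_sim=200, n_boot=500, seed=0):
    """Estimated coverage of a nominal 95% interval for the variance of a
    lognormal with mu=0, sigma=1 (true variance (e - 1) * e)."""
    true_var = (math.e - 1) * math.e
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        x = [rng.lognormvariate(0, 1) for _ in range(n)]
        reps = [var(rng.choices(x, k=n)) for _ in range(n_boot)]
        lo, hi = interval_fn(var(x), reps)
        hits += lo <= true_var <= hi
    return hits / n_sim

for name, fn in [("Efron percentile", efron_percentile),
                 ("Hall percentile", hall_percentile)]:
    print(name, coverage(fn))
```

Comparing each printed coverage with the nominal 0.95 indicates how far a method falls short at this sample size; increasing `n` shows how the gap behaves as the sample grows.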