Bootstrap – Proper Bootstrapping Techniques for Clustered Data

bootstrap, mixed model, random-effects-model

I have a question regarding the proper bootstrapping technique to use with data where strong clustering is present.

I've been tasked with evaluating a multivariate mixed-effects predictive model on insurance claims data. The plan is to score the current baseline model on more recent claims data to determine how well it predicts which episodes of care contain the highest frequency of sessions (upper 95th percentile). Sensitivity, specificity, and positive predictive value (PPV) will be used to assess the model's effectiveness.
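For concreteness, here is a minimal sketch of how the three metrics could be computed from episode-level flags. The function name and the 0/1 encoding (1 = episode falls in the high-frequency group) are assumptions made for illustration, not part of the existing model code.

```python
def classification_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, ppv) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # An undefined ratio (empty denominator) is reported as NaN.
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    ppv = ratio(tp, tp + fp)
    return sensitivity, specificity, ppv


# Toy labels, purely illustrative.
actual = [1, 1, 0, 0, 0, 1]
predicted = [1, 0, 0, 1, 0, 1]
sens, spec, ppv = classification_metrics(actual, predicted)
```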

Bootstrapping seems the right way to go for building confidence intervals for the sensitivity, specificity, and PPV percentages. Unfortunately, a naive bootstrap is not appropriate, given that the claims data are 1) correlated by care provider and 2) grouped into episodes of care, with more frequent visits during the earlier months of an episode (so some autocorrelation is present). Would a variation on the moving blocks bootstrap technique be appropriate here?

Or perhaps a three-step bootstrap procedure would work: 1) sample with replacement from the distinct providers in the data, then 2) sample with replacement from distinct episodes of care by selected providers, then 3) sample with replacement from distinct claims within each selected episode.
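To make the three-step idea concrete, it could be sketched roughly as follows. The nested-dict layout (`provider -> episode -> list of claims`) and the function name are hypothetical, chosen only for illustration. Note that step 3 redraws claims within an episode, so the original ordering of claims inside an episode is not kept.

```python
import random


def three_stage_resample(data, rng):
    """Draw one bootstrap replicate, resampling with replacement at all
    three levels.

    data: {provider_id: {episode_id: [claim, ...]}}  (assumed layout)
    Returns a list of (provider_id, episode_id, resampled_claims) tuples.
    """
    providers = list(data)
    replicate = []
    # Step 1: resample providers with replacement.
    for prov in rng.choices(providers, k=len(providers)):
        episodes = list(data[prov])
        # Step 2: resample that provider's episodes with replacement.
        for epi in rng.choices(episodes, k=len(episodes)):
            claims = data[prov][epi]
            # Step 3: resample claims within the episode with replacement.
            replicate.append((prov, epi, rng.choices(claims, k=len(claims))))
    return replicate


# Toy data, purely illustrative.
toy = {
    "P1": {"E1": ["c1", "c2", "c3"], "E2": ["c4"]},
    "P2": {"E3": ["c5", "c6"]},
}
one_replicate = three_stage_resample(toy, random.Random(42))
```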

Many thanks for any suggestions!

Best Answer

The second approach you suggest seems reasonable, but when bootstrapping hierarchical data it turns out to be better to sample with replacement only at the highest level and without replacement at the remaining sublevels. This was shown in simulations by Ren et al. (2010): http://www.tandfonline.com/doi/abs/10.1080/02664760903046102
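As a rough sketch of that scheme in Python (standard library only): resample providers with replacement, keep every episode of a drawn provider as is (drawing a full-size subsample without replacement returns the same set), and take percentile intervals over the replicates. The data layout (a dict mapping a provider ID to a list of `(actual, predicted)` episode-level flags), the function names, and the choice of percentile intervals are assumptions for illustration; none of this is taken from the cited paper.

```python
import random


def metrics(pairs):
    """Return (sensitivity, specificity, ppv) from (actual, predicted) flags."""
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    nan = float("nan")
    return (
        tp / (tp + fn) if tp + fn else nan,
        tn / (tn + fp) if tn + fp else nan,
        tp / (tp + fp) if tp + fp else nan,
    )


def cluster_bootstrap_ci(by_provider, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CIs for (sensitivity, specificity, ppv).

    Only providers (the top level) are resampled with replacement;
    each drawn provider contributes all of its episodes unchanged.
    """
    rng = random.Random(seed)
    providers = list(by_provider)
    replicates = []
    for _ in range(n_boot):
        drawn = rng.choices(providers, k=len(providers))
        pairs = [pair for prov in drawn for pair in by_provider[prov]]
        replicates.append(metrics(pairs))

    intervals = []
    for j in range(3):
        # Drop replicates where the metric is undefined (NaN != NaN).
        vals = sorted(r[j] for r in replicates if r[j] == r[j])
        if not vals:
            intervals.append((float("nan"), float("nan")))
            continue
        lo = vals[int((alpha / 2) * (len(vals) - 1))]
        hi = vals[int((1 - alpha / 2) * (len(vals) - 1))]
        intervals.append((lo, hi))
    return intervals


# Toy data, purely illustrative: provider -> [(actual, predicted), ...]
toy = {
    "A": [(1, 1), (0, 0), (0, 1)],
    "B": [(1, 0), (0, 0)],
    "C": [(1, 1), (0, 0), (0, 0), (1, 1)],
}
sens_ci, spec_ci, ppv_ci = cluster_bootstrap_ci(toy, n_boot=500)
```

With few providers the intervals will be unstable, since the number of providers is the effective sample size at the level being resampled.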

Field & Welsh (2007) theoretically investigated different approaches for 2-level data sets and found that sampling with replacement at both levels was not a brilliant idea: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2007.00593.x/full

The autocorrelation that you mention is potentially a serious problem. On the other hand, sampling without replacement from episodes of care would preserve the autocorrelation structure, so it may not be such a big problem here.
