P-Value Pooling – How to Pool Bootstrapped P-values Across Multiply Imputed Data Sets

bootstrap, confidence-interval, multiple-imputation, p-value, variance

My problem is that I would like to bootstrap the p-value for an estimate of $\theta$ from multiply imputed (MI) data, but it is unclear to me how to combine the resulting p-values across the MI data sets.

For MI data sets, the standard approach to obtaining the total variance of an estimate uses Rubin's rules (see here for a review of pooling MI data sets). The square root of the total variance serves as the standard error estimate of $\theta$. However, for some estimators the total variance has no known closed form, or the sampling distribution is not normal. The statistic $\theta/\mathrm{se}(\theta)$ may then not be t-distributed, not even asymptotically.
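For concreteness, with $m$ imputations, per-imputation estimates $\hat\theta_j$ and within-imputation variances $U_j$, Rubin's rules combine

$$\bar\theta = \frac{1}{m}\sum_{j=1}^{m}\hat\theta_j,\qquad \bar U = \frac{1}{m}\sum_{j=1}^{m}U_j,\qquad B = \frac{1}{m-1}\sum_{j=1}^{m}\bigl(\hat\theta_j-\bar\theta\bigr)^2,\qquad T = \bar U + \Bigl(1+\frac{1}{m}\Bigr)B,$$

so that $\sqrt{T}$ is the pooled standard error.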

Therefore, in the complete-data case, one alternative is to bootstrap the statistic to obtain a variance estimate, a p-value and a confidence interval, even if the sampling distribution is non-normal and its closed form unknown. In the MI case there are then two options:

  • Pool the bootstrapped variance across MI data sets
  • Pool the p-value or confidence bounds across MI data sets

The first option would again use Rubin's rules. However, I believe this is problematic if $\theta$ has a non-normal sampling distribution. In that situation (or, more generally, in all situations) the bootstrapped p-value can be used directly. In the MI case, however, this leads to multiple p-values or confidence intervals, which need to be pooled across the MI data sets.

So my question is: how should I pool multiple bootstrapped p-values (or confidence intervals) across multiply imputed data sets?

I would welcome any suggestions on how to proceed, thank you.

Best Answer

I think both options result in the correct answer. In general, I would prefer method 1, as it preserves the entire distribution.

For method 1, bootstrap the parameter $k$ times within each of the $m$ MI data sets. Then simply mix the $m$ bootstrapped distributions to obtain your final density, which now consists of $k \times m$ draws and includes the between-imputation variation. Treat that as a conventional bootstrap sample to obtain confidence intervals. Use the Bayesian bootstrap for small samples. I know of no simulation work investigating this procedure; it is actually an open problem to be investigated.
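As a rough illustration, here is a minimal sketch of that mixing step, assuming each imputed data set is a NumPy array with observations in rows and that `statistic()` computes the estimate of interest from one data set; the function and argument names are mine, not part of any established implementation.

```python
import numpy as np

def pooled_bootstrap(imputed_sets, statistic, k=2000, alpha=0.05, seed=None):
    """Mix k bootstrap draws from each of the m imputed data sets (method 1).

    imputed_sets : list of m arrays, observations in rows
    statistic    : function mapping one data set to a scalar estimate
    Returns the k*m pooled draws, a percentile CI and a rough p-value.
    """
    rng = np.random.default_rng(seed)
    draws = []
    for data in imputed_sets:                    # loop over the m imputations
        n = data.shape[0]
        for _ in range(k):                       # k bootstrap replicates each
            idx = rng.integers(0, n, size=n)     # resample rows with replacement
            draws.append(statistic(data[idx]))
    draws = np.asarray(draws)                    # k * m mixed draws
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    # One rough two-sided p-value for H0: theta = 0, by inverting the
    # percentile interval (share of draws on the far side of 0, doubled).
    p = 2 * min((draws <= 0).mean(), (draws >= 0).mean())
    return draws, (lo, hi), p
```

The percentile interval here is only the simplest choice; any conventional bootstrap interval (e.g. BCa) could be applied to the pooled draws instead.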

For method 2, use the Licht-Rubin procedure. See How to get pooled p-values on tests done in multiple imputed datasets?
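For completeness, here is a sketch of the Licht-Rubin procedure as it is usually described: transform each one-sided p-value to a z-score, pool the z-scores with Rubin's rules taking the within-imputation variance as 1, and transform back. The function name is mine.

```python
import numpy as np
from scipy.stats import norm

def licht_rubin_pooled_p(p_values):
    """Pool one-sided p-values from m imputed data sets (Licht-Rubin)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    z = norm.ppf(p)                      # probit transform of each p-value
    z_bar = z.mean()                     # pooled z estimate
    B = z.var(ddof=1)                    # between-imputation variance
    T = 1 + (1 + 1 / m) * B              # total variance (within variance = 1)
    return norm.cdf(z_bar / np.sqrt(T))  # pooled one-sided p-value
```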
