Solved – What are the (best) methods for multiple comparisons correction with the bootstrap for multiple GLM models

bootstrap, multiple regression, multiple-comparisons, references, regression coefficients

See the related (but old) question: Correcting p values for multiple tests where tests are correlated (genetics).

Multiple-comparison methods based on the bootstrap have the advantage of taking the dependence structure of the p-values into account. Regression models are a little more difficult to bootstrap, though, and several methods have been proposed. I've come across at least three ways of adjusting p-values in such problems.

As far as I can see, there is no single "best" solution. The question is: what methods are available, and what are the advantages/disadvantages of each?

I propose the following definitions:

$\vec{\theta}_0$ is the vector of hypothesized parameter values, assumed under the complete null hypothesis.

$\hat{\vec{\theta}}$ is the estimator of the parameters, computed on the original sample.

$\hat{\vec{\theta}}^*$ is the estimator of the parameters, computed on one of the bootstrap samples.

$T$ and $T^*$ are pivot statistics computed on the original data and on a bootstrap sample, respectively. All pivot statistics are assumed to be Wald statistics, i.e. $T = \frac{\hat{\theta}}{\operatorname{SE}(\hat{\theta})}$.

$m$ is the number of terms of interest across all regressions.

Best Answer

Method 1: Naive bootstrap

  1. Compute $\hat{\vec{\theta}}^*$ on each bootstrap sample. This way we learn the (hopefully) natural variability of the test statistics.
  2. The adjusted p-value is one minus the proportion of bootstrap samples in which $\hat{\theta}^* > \theta_0$ (or $\hat{\theta}^* < \theta_0$, or $\left|\frac{\hat{\theta}^*}{\theta_0}\right| > 1$; the appropriate form depends on the nature of the parameter $\theta$, and it should evaluate to true when the bootstrap result is "more significant" than the reference).

This method violates both guidelines stated in Hall, P. and Wilson, S.R., "Two Guidelines for Bootstrap Hypothesis Testing" (1992), so it lacks power (in our case it should be too conservative).
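For concreteness, here is a minimal sketch of the naive scheme in Python for a single coefficient, assuming a Gaussian GLM fitted with statsmodels; the function name, the one-sided comparison, and the toy data are illustrative choices, not part of the original recipe.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def naive_bootstrap_pvalue(y, X, j, theta0=0.0, n_boot=2000):
    """One-sided naive bootstrap p-value for coefficient j:
    1 minus the proportion of resamples with theta*_j > theta0.
    (Hypothetical helper; ignores both Hall & Wilson guidelines.)"""
    n = len(y)
    more_significant = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample cases with replacement
        fit = sm.GLM(y[idx], X[idx], family=sm.families.Gaussian()).fit()
        if fit.params[j] > theta0:         # bootstrap estimate "more significant"
            more_significant += 1
    return 1.0 - more_significant / n_boot

# Toy usage: one predictor with a positive effect
n = 200
x = rng.normal(size=n)
X = sm.add_constant(x)                     # columns: intercept, slope
y = 1.0 + 0.5 * x + rng.normal(size=n)
print(naive_bootstrap_pvalue(y, X, j=1))   # p-value for the slope
```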

Method 2: Free step-down resampling (MaxT) using Wald statistics

The name and description come from the book Applied Statistical Genetics with R (2009) by Andrea Foulkes.

  1. Prepare for the bootstrap: if possible, compute the residuals from the regressions and replace the original dependent variables with them. Keep the independent variables unchanged.
  2. Generate bootstrap samples from this new dataset.
  3. Gather the pivot statistics. On each sample compute the Wald statistics $\hat{T}^* = \frac{\hat{\vec{\theta}}^*}{\operatorname{SE}(\hat{\vec{\theta}}^*)}$. Since all of them were computed under the complete null (the dependent variable is in fact a regression residual), we can treat them as a set of potentially correlated, zero-centered random variables.
  4. The adjusted p-value for the $j$-th regression coefficient is the proportion of cases where the observed $\hat{T}$ is equally or less significant than the $j/m$ quantile of the set of $\hat{\vec{T}}^*$ values.

The problem is that this method breaks down when one cannot simply compute the regression residuals, e.g. when there are many independent regression models and some of them share the same dependent variable. A sketch of the basic idea follows.
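As an illustration, here is a minimal single-step sketch of the residual-based maxT idea in Python for one Gaussian GLM, again assuming statsmodels; the full procedure in Foulkes' book is step-down, and the function name is hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

def maxt_adjusted_pvalues(y, X, n_boot=2000):
    """Single-step maxT adjustment on a null dataset built from residuals.
    (Hypothetical sketch of the resampling idea, not Foulkes' exact code.)"""
    fit = sm.GLM(y, X, family=sm.families.Gaussian()).fit()
    t_obs = np.abs(fit.params / fit.bse)    # observed Wald statistics
    resid = y - fit.fittedvalues            # dependent variable under the complete null
    n = len(y)
    max_t_star = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)    # resample cases of the null dataset
        f = sm.GLM(resid[idx], X[idx], family=sm.families.Gaussian()).fit()
        max_t_star[b] = np.max(np.abs(f.params / f.bse))
    # Adjusted p-value: how often the null maximum beats the observed statistic
    return np.array([(max_t_star >= t).mean() for t in t_obs])
```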

Method 3: Null unrestricted bootstrap

This method is very similar to free step-down resampling, with the difference that instead of computing residuals, one centers the $T$ statistic:

  1. Generate bootstrap samples from the dataset.
  2. Gather the pivot statistics. On each sample compute the Wald statistics $\hat{T}^* = \frac{\hat{\vec{\theta}}^* - \hat{\vec{\theta}}}{\operatorname{SE}(\hat{\vec{\theta}}^*)}$. Since the expected value of $\hat{\vec{\theta}}^*$ equals $\hat{\vec{\theta}}$, we can treat them as a set of potentially correlated, zero-centered random variables.
  3. The adjusted p-value for the $j$-th regression coefficient is the proportion of cases where the observed $\hat{T}$ is equally or less significant than the $j/m$ quantile of the set of $\hat{\vec{T}}^*$ values.
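The centering step is the only change from the previous sketch: resample the raw data, but subtract the original estimate before standardizing, so the bootstrap statistics are zero-centered even though the data are not. A minimal single-step version under the same assumptions (Gaussian GLM, statsmodels, hypothetical names):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

def centered_maxt_pvalues(y, X, n_boot=2000):
    """Null-unrestricted bootstrap: center each bootstrap Wald statistic
    at the original estimate (single-step variant; illustrative sketch)."""
    fit = sm.GLM(y, X, family=sm.families.Gaussian()).fit()
    t_obs = np.abs(fit.params / fit.bse)    # observed Wald statistics
    n = len(y)
    max_t_star = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)    # resample the raw cases
        f = sm.GLM(y[idx], X[idx], family=sm.families.Gaussian()).fit()
        t_star = np.abs((f.params - fit.params) / f.bse)  # centered Wald statistics
        max_t_star[b] = t_star.max()
    return np.array([(max_t_star >= t).mean() for t in t_obs])
```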