Bootstrap Statistics – How to Calculate Bootstrap-Based Confidence Intervals


While studying bootstrap-based confidence intervals, I once read the following statement:

If the bootstrap distribution is skewed to the right, the bootstrap-based confidence interval incorporates a correction to move the endpoints even farther to the right; this may seem counterintuitive, but it is the correct action.

I am trying to understand the logic underlying the above statement.

Best Answer

The question is related to the fundamental construction of confidence intervals, and when it comes to bootstrapping, the answer depends on which bootstrapping method is used.

Consider the following setup: $\hat{\theta}$ is an estimator of a real-valued parameter $\theta$ with (an estimated) standard deviation $\text{se}$. A standard 95% confidence interval based on a normal $N(\theta, \text{se}^2)$ approximation is $$\hat{\theta} \pm 1.96 \text{se}.$$ This confidence interval is derived as the set of $\theta$'s that fulfill $$z_{1} \leq \hat{\theta} - \theta \leq z_2,$$ where $z_1 = -1.96\text{se}$ is the 2.5% quantile and $z_2 = 1.96\text{se}$ is the 97.5% quantile of the $N(0, \text{se}^2)$-distribution. The interesting observation is that rearranging the inequalities gives the confidence interval expressed as $$\{\theta \mid \hat{\theta} - z_2 \leq \theta \leq \hat{\theta} - z_1 \} = [\hat{\theta} - z_2, \hat{\theta} - z_1].$$ That is, the lower 2.5% quantile determines the right end point, and the upper 97.5% quantile determines the left end point.
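As a numeric sanity check on that rearrangement, here is a small sketch in Python with NumPy (the exponential sample is an illustrative assumption, not part of the answer); it verifies that $[\hat{\theta} - z_2, \hat{\theta} - z_1]$ coincides with $\hat{\theta} \pm 1.96\,\text{se}$ as long as the quantiles are symmetric:

```python
import numpy as np

# Illustrative data: this sample and its size are assumptions for the demo
rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=200)

theta_hat = data.mean()                      # estimator of theta
se = data.std(ddof=1) / np.sqrt(len(data))   # estimated standard error

# Symmetric quantiles of the N(0, se^2) approximation
z1, z2 = -1.96 * se, 1.96 * se

ci_standard = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
ci_rearranged = (theta_hat - z2, theta_hat - z1)

# Identical only because z1 = -z2 under the symmetric normal approximation
assert np.allclose(ci_standard, ci_rearranged)
```

The two forms only coincide under symmetry; once the quantiles are estimated from a skewed distribution, the rearranged form below is the one that matters.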

If the sampling distribution of $\hat{\theta}$ is right-skewed compared to the normal approximation, what is then the appropriate action? If right-skewed means that the 97.5% quantile of the sampling distribution satisfies $z_2 > 1.96\text{se}$, the appropriate action is to move the left end point further to the left, at least if we stick to the standard construction above. A standard use of the bootstrap is to estimate the sampling quantiles $z_1$ and $z_2$ and then use them instead of $\pm 1.96 \text{se}$ in the construction above.
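A minimal sketch of that usage (Python with NumPy; the exponential sample and the number of resamples are assumptions): estimate $z_1$ and $z_2$ by the bootstrap quantiles of $\hat{\theta}^* - \hat{\theta}$ and plug them into $[\hat{\theta} - z_2, \hat{\theta} - z_1]$, often called the basic or reverse-percentile interval:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=200)  # assumed right-skewed sample
n = len(data)
theta_hat = data.mean()

# Bootstrap replicates of the estimator
B = 5000
boot = np.array([rng.choice(data, size=n, replace=True).mean()
                 for _ in range(B)])

# Estimate the quantiles z1, z2 of theta_hat - theta by those of theta* - theta_hat
z1, z2 = np.quantile(boot - theta_hat, [0.025, 0.975])

# Basic ("reverse percentile") interval: [theta_hat - z2, theta_hat - z1].
# A right skew inflates z2, which pushes the LEFT end point further left.
ci_basic = (theta_hat - z2, theta_hat - z1)
```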

However, another standard construction used in bootstrapping is the percentile interval, which in the terminology above is $$[\hat{\theta} + z_1, \hat{\theta} + z_2].$$ It is simply the interval from the 2.5% quantile to the 97.5% quantile of the sampling distribution of $\hat{\theta}$. A right-skewed sampling distribution of $\hat{\theta}$ implies a right-skewed confidence interval. For the reasons mentioned above, this appears to me to be counterintuitive behavior of percentile intervals. But they have other virtues; for instance, they are invariant under monotone parameter transformations.
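The percentile interval is then just the pair of bootstrap quantiles themselves, and the transformation invariance is easy to check. A sketch under the same assumed setup; `method='lower'` makes `np.quantile` return exact order statistics, so the monotone-transformation identity holds exactly rather than up to interpolation error:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.exponential(scale=2.0, size=200)  # assumed sample
boot = np.array([rng.choice(data, size=len(data), replace=True).mean()
                 for _ in range(5000)])

# Percentile interval: the 2.5% and 97.5% bootstrap quantiles directly
lo, hi = np.quantile(boot, [0.025, 0.975], method='lower')

# Invariance under a monotone transformation, e.g. log: the percentile
# interval for log(theta) is the log of the percentile interval for theta,
# because a monotone map preserves order statistics.
lo_log, hi_log = np.quantile(np.log(boot), [0.025, 0.975], method='lower')
assert np.isclose(lo_log, np.log(lo)) and np.isclose(hi_log, np.log(hi))
```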

The BCa (bias-corrected and accelerated) bootstrap intervals, as introduced by Efron (see e.g. the paper Bootstrap Confidence Intervals), improve upon the properties of percentile intervals. I can only guess (and google) the source of the quote the OP posted, but maybe BCa is the appropriate context. Citing DiCiccio and Efron from the paper mentioned, page 193:

The following argument motivates the BCa definition (2.3), as well as the parameters $a$ and $z_0$. Suppose that there exists a monotone increasing transformation $\phi = m(\theta)$ such that $\hat{\phi} = m(\hat{\theta})$ is normally distributed for every choice of $\theta$, but possibly with a bias and a nonconstant variance, $$\hat{\phi} \sim N(\phi - z_0 \sigma_{\phi}, \sigma_{\phi}^2), \quad \sigma_{\phi} = 1 + a \phi.$$ Then (2.3) gives exactly accurate and correct confidence limits for $\theta$ having observed $\hat{\theta}$.

where (2.3) is the definition of the BCa intervals. The quote posted by the OP may refer to the fact that BCa can shift confidence intervals with a right-skewed sampling distribution further to the right. It is difficult to tell whether this is the "correct action" in a general sense, but according to DiCiccio and Efron it is correct in the setup above in the sense of producing confidence intervals with the correct coverage. The existence of the monotone transformation $m$ is a little tricky, though.
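For completeness, here is a self-contained sketch of the BCa construction following the standard recipe (bias correction $z_0$ from the bootstrap distribution, acceleration $a$ from jackknife influence values). The helper name `bca_interval` and all defaults are my assumptions, not any particular library's API:

```python
import numpy as np
from statistics import NormalDist

def bca_interval(data, stat, B=5000, alpha=0.05, seed=0):
    """BCa bootstrap interval sketch; `stat` maps a 1-D array to a scalar."""
    rng = np.random.default_rng(seed)
    n = len(data)
    theta_hat = stat(data)
    boot = np.array([stat(rng.choice(data, size=n, replace=True))
                     for _ in range(B)])

    nd = NormalDist()
    # Bias correction z0: median bias of the bootstrap distribution
    # (inv_cdf fails if the fraction is exactly 0 or 1; ignored in this sketch)
    z0 = nd.inv_cdf((boot < theta_hat).mean())

    # Acceleration a from jackknife influence values
    jack = np.array([stat(np.delete(data, i)) for i in range(n)])
    d = jack.mean() - jack
    a = (d ** 3).sum() / (6 * (d ** 2).sum() ** 1.5)

    # Skew-adjusted percentile levels; with z0 = a = 0 these reduce to
    # alpha/2 and 1 - alpha/2, i.e. the plain percentile interval
    z_lo, z_hi = nd.inv_cdf(alpha / 2), nd.inv_cdf(1 - alpha / 2)
    a1 = nd.cdf(z0 + (z0 + z_lo) / (1 - a * (z0 + z_lo)))
    a2 = nd.cdf(z0 + (z0 + z_hi) / (1 - a * (z0 + z_hi)))
    return np.quantile(boot, [a1, a2])
```

With a right-skewed bootstrap distribution, $z_0$ and $a$ shift both percentile levels upward, which is exactly the "move the endpoints further to the right" behavior the quote describes.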
