Confidence Intervals – Comparison of Confidence Intervals: Bootstrap vs Exact Resampling Techniques

bootstrap, confidence interval, monte carlo, resampling, self-study

Consider data $X_1,\ldots,X_n$ generated from a probability distribution $F$ with density $f$.

I'm interested in constructing confidence intervals for a parameter, say $\theta(F)$. Using Monte Carlo simulations, I want to compare the expected lengths and coverage probabilities of $X\%$ confidence intervals based on

  • exact resampling (i.e., one can generate multiple independent samples of size $n$ from $F$), and

  • bootstrap.


This explains the fundamental difference between bootstrapping and the Monte Carlo procedure. While I understand what it means to use Monte Carlo simulations to generate confidence intervals with the bootstrap resampling method, I'm not sure how to proceed with this comparison.

Theoretically, what I know is that if $X_1^{*},X_2^{*},\ldots,X_n^{*}$ is a single resample, then the bootstrap estimate of $G_n(t)=P_F(\sqrt{n}[\hat{\theta}-\theta]\le t)$ is given by $\hat G_n(t)=P_{F_n}(\sqrt{n}[\hat\theta^*-\hat\theta]\le t)$, where $F_n$ is the empirical distribution from which the resample is drawn. Using Monte Carlo, if we have generated $B$ resamples, the bootstrap estimate is $\tilde G_n(t)=\frac{1}{B}\sum_{i=1}^{B} \hat G_{n,i}(t)$.

Also, I know that the pivotal quantity $\sqrt{n}[\hat{\theta}-\theta]$ can be a little different for some parameters; I'm just using it as a placeholder.

The confidence interval for $\hat\theta$ from one resample $X_1^*,\ldots,X_n^*$ would be $$\left(\hat\theta^*\pm\frac{q_{\alpha/2}}{\sqrt{n}}\right)$$
where $q_{\alpha/2}$ is the $\alpha/2$ quantile of $F_n$. But where does the Monte Carlo estimate prove useful?

From a single bootstrapped sample I get a single estimate of the mean. With Monte Carlo, I get $B$ estimates of the mean, from which I need to create a confidence interval, but how? Moreover, a single confidence interval is not enough to estimate the expected length and coverage probability. Do I need to compute $N$ different confidence intervals? I don't want the complete procedure written out; I just want the gaps in my understanding cleared up.

Best Answer

The link in your question describes an exact bootstrap distribution, in which every possible resample is enumerated. For large samples this can be infeasible because of the number of computations required; the alternative is to draw random resamples from the sample.
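To make the distinction concrete, here is a minimal sketch that contrasts full enumeration with random resampling. It assumes the statistic is the sample mean and uses a tiny sample ($n=5$, so $5^5=3125$ ordered resamples); the function names `exact_bootstrap_means` and `random_bootstrap_means` are hypothetical, chosen only for illustration.

```python
import itertools
import numpy as np

def exact_bootstrap_means(sample):
    """Enumerate all n**n ordered resamples and return the mean of each."""
    n = len(sample)
    return np.array([np.mean(r) for r in itertools.product(sample, repeat=n)])

def random_bootstrap_means(sample, n_resamples, rng):
    """Approximate the bootstrap distribution with random resamples."""
    sample = np.asarray(sample)
    n = sample.size
    # Each row of idx selects one resample (with replacement) of size n.
    idx = rng.integers(0, n, size=(n_resamples, n))
    return sample[idx].mean(axis=1)

rng = np.random.default_rng(0)
sample = rng.normal(size=5)          # assumed toy data, not from the question

exact = exact_bootstrap_means(sample)             # 5**5 = 3125 values
approx = random_bootstrap_means(sample, 2000, rng)

# Quantiles of the two distributions should be close for this small example.
print(np.quantile(exact, [0.025, 0.975]))
print(np.quantile(approx, [0.025, 0.975]))
```

The enumeration grows as $n^n$, which is why random resampling is the practical choice for anything but very small samples.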

So your bootstrap distribution (from which a confidence interval can be approximated) is created from one single observed sample: that one sample is resampled many times to compute the bootstrap distribution of a statistic that describes the distribution.
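As a rough sketch of this single-sample step, the following builds a percentile bootstrap interval from one observed sample. The percentile method is only one of several ways to turn a bootstrap distribution into an interval and is an assumption here, as are the function name `percentile_bootstrap_ci`, the default of 2000 resamples, and the toy normal data.

```python
import numpy as np

def percentile_bootstrap_ci(sample, stat=np.mean, n_resamples=2000,
                            alpha=0.05, rng=None):
    """Percentile bootstrap interval for `stat`, from one observed sample."""
    rng = np.random.default_rng() if rng is None else rng
    sample = np.asarray(sample)
    n = sample.size
    # Draw all resamples at once: each row is one resample of size n.
    idx = rng.integers(0, n, size=(n_resamples, n))
    boot_stats = np.apply_along_axis(stat, 1, sample[idx])
    lo, hi = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

rng = np.random.default_rng(1)
observed = rng.normal(loc=10.0, scale=2.0, size=50)   # assumed toy data
lo, hi = percentile_bootstrap_ci(observed, rng=rng)
print(lo, hi)
```

All the resamples here come from the same observed sample, so the output is one interval, not a statement about coverage.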

Your homework asks you to add another level of repetition: repeat the bootstrap procedure described above (which is built on a single sample) over multiple simulated observed samples, and record the properties of the resulting confidence intervals.

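So you create $N$ simulations of an observed sample. For each observed sample you resample $M$ times to create estimates of a bootstrap distribution and estimates of a confidence interval. You will end up with $N$ confidence intervals, one for each simulated sample. The proportion of these intervals that contain the true $\theta(F)$ estimates the coverage probability, and their average length estimates the expected length.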
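The two levels of repetition can be sketched as follows. This is an illustration under stated assumptions, not the complete assignment: $F$ is taken to be $N(0,1)$, $\theta(F)$ is its mean, the interval is a percentile bootstrap interval, and the function name `coverage_and_length` and the default values of `n`, `N` and `M` are hypothetical.

```python
import numpy as np

def coverage_and_length(n=30, N=500, M=1000, alpha=0.05, seed=0):
    """Estimate coverage and expected length of a percentile bootstrap CI."""
    rng = np.random.default_rng(seed)
    true_theta = 0.0                 # mean of the assumed F = N(0, 1)
    covered = 0
    lengths = np.empty(N)
    for i in range(N):
        # Outer level: simulate one observed sample from F.
        observed = rng.normal(loc=true_theta, scale=1.0, size=n)
        # Inner level: M bootstrap resamples of that one sample.
        idx = rng.integers(0, n, size=(M, n))
        boot_means = observed[idx].mean(axis=1)
        lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
        covered += int(lo <= true_theta <= hi)
        lengths[i] = hi - lo
    return covered / N, lengths.mean()

coverage, mean_length = coverage_and_length()
print(f"estimated coverage: {coverage:.3f}, mean length: {mean_length:.3f}")
```

The same outer loop can be reused for any other interval construction by replacing the inner block, which is what makes the comparison between methods possible.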