Solved – Why does the bootstrap interval have terrible coverage?

Tags: bootstrap, diagnostic

I wanted to do a class demonstration where I compare a t-interval to a bootstrap interval and calculate the coverage probability of both. I wanted the data to come from a skewed distribution so I chose to generate the data as exp(rnorm(10, 0, 2)) + 1, a sample of size 10 from a shifted lognormal. I wrote a script to draw 1000 samples and, for each sample, calculate both a 95% t-interval and a 95% bootstrap percentile interval based on 1000 replicates.

When I run the script, both methods give very similar intervals and both have coverage probability of 50-60%. I was surprised because I thought the bootstrap interval would be better.

My question is, have I

  • made a mistake in the code?
  • made a mistake in calculating the intervals?
  • made a mistake by expecting the bootstrap interval to have better coverage properties?

Also, is there a way to construct a more reliable CI in this situation?

tCI.total <- 0
bootCI.total <- 0
m <- 10                  # sample size
n.sim <- 1000            # number of simulated samples
n.boot <- 1000           # bootstrap replicates per sample
true.mean <- exp(2) + 1  # E[exp(N(0, 2^2))] + 1 = exp(sigma^2/2) + 1

for (i in 1:n.sim) {
  samp <- exp(rnorm(m, 0, 2)) + 1
  tCI <- mean(samp) + c(-1, 1) * qt(0.975, df = m - 1) * sd(samp) / sqrt(m)

  boot.means <- rep(0, n.boot)
  for (j in 1:n.boot) boot.means[j] <- mean(sample(samp, m, replace = TRUE))
  bootCI <- quantile(boot.means, c(0.025, 0.975))  # percentile interval

  if (tCI[1] < true.mean & true.mean < tCI[2]) tCI.total <- tCI.total + 1
  if (bootCI[1] < true.mean & true.mean < bootCI[2]) bootCI.total <- bootCI.total + 1
}
tCI.total / n.sim     # estimate of t interval coverage probability
bootCI.total / n.sim  # estimate of bootstrap interval coverage probability

Best Answer

Bootstrap diagnostics and remedies by Canty, Davison, Hinkley & Ventura (2006) seems a logical point of departure. They discuss multiple ways the bootstrap can break down and, more importantly here, offer diagnostics and possible remedies:

  1. Outliers
  2. Incorrect resampling model
  3. Nonpivotality
  4. Inconsistency of the bootstrap method

I don't see a problem with 1, 2, or 4 in this situation. Let's look at 3. As @Ben Ogorek notes (although I agree with @Glen_b that the normality discussion may be a red herring), the validity of the bootstrap depends on the pivotality of the statistic we are interested in, i.e., on its distribution not depending, at least approximately, on unknown parameters.
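To make the pivotality question concrete, here is a small sketch of my own (not from Canty et al.; the helper names `std.mean` and `skew` are mine) comparing the shape of a t-type statistic under two values of sigma. If the statistic were pivotal, its distribution would not change with sigma:

```r
set.seed(3)

# Centred and scaled sample mean for the shifted lognormal; under pivotality
# its distribution would be free of sigma.
std.mean <- function(sigma, m = 10, n.sim = 5000) {
  mu <- exp(sigma^2 / 2) + 1                  # true mean of exp(N(0, sigma^2)) + 1
  replicate(n.sim, {
    samp <- exp(rnorm(m, 0, sigma)) + 1
    (mean(samp) - mu) / (sd(samp) / sqrt(m))  # t-type statistic
  })
}

# Crude moment-based skewness, enough for a shape comparison
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

s1 <- skew(std.mean(1))
s2 <- skew(std.mean(2))
c(sigma1 = s1, sigma2 = s2)  # both negative; markedly more extreme for sigma = 2
```

The skewness of the statistic changing with sigma indicates its distribution depends on the unknown parameter, so it is far from pivotal here.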

Section 4 in Canty et al. suggests resampling-within-resamples to get a measure of bias and variance for the parameter estimate within each bootstrap resample. Here is code to replicate the formulas from p. 15 of the article:

library(boot)
m <- 10 # sample size
n.boot <- 1000
inner.boot <- 1000

set.seed(1)
samp.mean <- bias <- vars <- rep(NA, n.boot)
for ( ii in 1:n.boot ) {
    samp <- exp(rnorm(m, 0, 2)) + 1
    samp.mean[ii] <- mean(samp)
    foo <- boot(samp, statistic = function(xx, index) mean(xx[index]), R = inner.boot)
    bias[ii] <- mean(foo$t[,1]) - foo$t0  # inner-bootstrap estimate of bias
    vars[ii] <- var(foo$t[,1])            # inner-bootstrap estimate of variance
}

opar <- par(mfrow=c(1,2))
    plot(samp.mean,bias,xlab="Sample means",ylab="Bias",
        main="Bias against sample means",pch=19,log="x")
    abline(h=0)
    plot(samp.mean,vars,xlab="Sample means",ylab="Variance",
        main="Variance against sample means",pch=19,log="xy")
par(opar)

[Figure: bootstrap diagnostics (bias and variance plotted against the sample means)]

Note the log scales; without logs, this is even more blatant. We see clearly how the variance of the bootstrap mean estimate increases with the mean of the sample it was computed from. To me this looks like enough of a smoking gun to blame nonpivotality for the low confidence interval coverage.

However, I'll happily admit that one could follow up in lots of ways. For instance, we could look at how the probability that a simulated sample's confidence interval covers the true mean depends on that sample's mean.
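A minimal sketch of that follow-up (the variable names `n.sim`, `covered`, etc. are my own), recording each simulated sample's mean together with whether its percentile interval covers the true mean:

```r
set.seed(2)
m <- 10; n.sim <- 1000; n.boot <- 1000
true.mean <- exp(2) + 1

samp.mean <- rep(NA, n.sim)
covered <- rep(NA, n.sim)
for (i in 1:n.sim) {
    samp <- exp(rnorm(m, 0, 2)) + 1
    boot.means <- replicate(n.boot, mean(sample(samp, m, replace = TRUE)))
    ci <- quantile(boot.means, c(0.025, 0.975))        # percentile interval
    samp.mean[i] <- mean(samp)
    covered[i] <- ci[1] < true.mean & true.mean < ci[2]
}

mean(covered)  # overall coverage, in the same low range as in the question

# Compare the sample means of covering vs. non-covering cases
boxplot(samp.mean ~ covered, log = "y",
        xlab = "Interval covers true mean", ylab = "Sample mean (log scale)")
```

One would expect the non-covering cases to be dominated by samples that missed the rare large observations and hence have small means, with intervals sitting entirely below the true mean.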

As for remedies, Canty et al. discuss transformations, and logarithms come to mind here (e.g., bootstrap and build confidence intervals not for the mean, but for the mean of the logged data), but I couldn't really make it work.
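For what it's worth, here is a sketch of the log-transformation idea (my own attempt, not code from the article). It also illustrates one reason it is hard to make work: back-transforming an interval for the mean of the logs yields an interval for the median of the original data, not for its mean.

```r
set.seed(1)
samp <- exp(rnorm(10, 0, 2)) + 1
log.samp <- log(samp - 1)  # undo the shift; back on the N(0, 2^2) scale

# Percentile bootstrap for the mean of the logged data
boot.log.means <- replicate(1000, mean(sample(log.samp, replace = TRUE)))
ci.log <- quantile(boot.log.means, c(0.025, 0.975))

# Back-transforming gives an interval around exp(mu) + 1 = 2, which is the
# *median* of exp(N(0, 2^2)) + 1, not its mean exp(2) + 1.
exp(ci.log) + 1
```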

Canty et al. go on to discuss how one can reduce the number of inner bootstraps and the remaining noise via importance sampling and smoothing, and how to add confidence bands to the pivot plots.

This might be a fun thesis project for a smart student. I'd appreciate any pointers to where I went wrong, as well as to any other literature. And I'll take the liberty of adding the diagnostic tag to this question.