Solved – Explanation of confidence interval from R function boot.ci

bootstrap, confidence-interval, r

I used the boot function in R to run a bootstrap with 40 replicates, then used boot.ci to get the "normal" confidence interval. My R code follows:
1. Define the statistic used by the boot function

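library(survival)  # coxph(), Surv()
library(boot)      # boot(), boot.ci()

# Statistic for boot(): refit a univariable Cox model on the resampled rows
uni_boot <- function(data, indices, vari) {
  d <- data[indices, ]
  unifit <- coxph(as.formula(paste("Surv(time, status) ~", vari)), data = d)
  # return the hazard ratio, exp(coef), assuming a single covariate term
  summary(unifit)$coefficients[1, "exp(coef)"]
}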

2. Bootstrap

r1 <- boot(data = data, statistic = uni_boot, R = 40, vari = variable)
r2 <- boot.ci(boot.out = r1, type = "norm")

Then I examined the output.
Result of the r1 object:

     original        bias    std. error
t1*  1.053145  0.03274176     0.1714448
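
For reference, the three printed columns can be recomputed from the replicates stored in a boot object. The following is a minimal sketch on hypothetical toy data (an exponential sample with the median as the statistic), not the survival data from the question; the names x, med_stat and b are illustrative only.

```r
library(boot)

set.seed(1)
x <- rexp(50)  # hypothetical toy data, not the survival data above

# Statistic in the form boot() expects: data first, resampled indices second
med_stat <- function(data, indices) median(data[indices])

b <- boot(data = x, statistic = med_stat, R = 500)

original <- b$t0                       # statistic on the original sample
bias     <- mean(b$t[, 1]) - b$t0      # bootstrap mean minus original
std_err  <- sd(b$t[, 1])               # spread of the bootstrap replicates

c(original = original, bias = bias, std.error = std_err)
```

These three values should agree with the "original", "bias" and "std. error" columns shown by print(b), which is why original plus bias equals the bootstrap mean.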

Result of the r2 object:

Intervals :
Level      Normal
95%   ( 0.684,  1.356 )
Calculations and Intervals on Original Scale

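The bootstrap sample mean, mean(r1$t), is 1.085887, which is the original value plus the bias. However, when I looked at the formula used to calculate the bootstrap interval, I found that its limits are original - bias - sqrt(V) * z(1 - alpha) and original - bias - sqrt(V) * z(alpha), where V is the variance of the bootstrap replicates and z(p) is the standard normal quantile at p.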
I followed the formula and did the calculation:

1.053145 - 0.03274176 + 1.96 * 0.1714448 = 1.356 (upper bound)
1.053145 - 0.03274176 - 1.96 * 0.1714448 = 0.684 (lower bound)
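
The same arithmetic can be reproduced in a few lines of base R. This is only a sketch that plugs in the numbers printed for the boot object; the variable names (t0, bias, se, centre) are illustrative and not part of the boot package.

```r
# Values copied from the printed boot output
t0   <- 1.053145    # statistic on the original sample ("original")
bias <- 0.03274176  # bootstrap mean minus original
se   <- 0.1714448   # bootstrap standard error

z <- qnorm(0.975)   # about 1.96 for a 95% interval

# Normal-approximation interval, centred on the bias-corrected estimate
centre <- t0 - bias
ci <- c(lower = centre - z * se, upper = centre + z * se)
round(ci, 3)        # lower 0.684, upper 1.356
```

The interval is symmetric about t0 - bias (about 1.020), not about the bootstrap mean t0 + bias (about 1.086).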

I expected the bootstrap mean to sit in the middle of the bootstrap CI. However, the midpoint of the CI here turns out to be 1.020. Sometimes the interval from boot.ci does not even cover the bootstrap mean, mean(r1$t).
Am I misunderstanding something?

Best Answer

Yes, the right way to correct for bias with bootstrapping is to subtract the bias from the value obtained on the original sample. When you get stuck thinking about bootstrapping, remember the guiding principle:

The population is to the sample as the sample is to the bootstrap samples.

Your value from the original sample is 1.053; you perform bootstrap resamples from the original sample and find a mean value of 1.086. The mean from the bootstrap samples was thus 0.033 higher than the value from the original sample: a bias of +0.033.

Applying the above principle, you use that bias value to estimate that the value from the original sample is 0.033 higher than the population value. So the bias-corrected estimate of the population value is 1.053 - 0.033 = 1.020, which is exactly the midpoint of the "normal" interval.
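
The principle can be checked on a case where the direction of the bias is known. The sketch below is an illustration on hypothetical simulated data, not the Cox model from the question: it uses the plug-in variance (dividing by n), which underestimates the population variance, and resamples with base R only. The names plug_in_var, theta_hat and theta_star are made up for this example.

```r
set.seed(42)
n <- 20
x <- rnorm(n, mean = 0, sd = 2)  # hypothetical sample; true variance is 4

# Plug-in variance (divides by n), which is biased downward
plug_in_var <- function(v) mean((v - mean(v))^2)

theta_hat <- plug_in_var(x)      # value on the original sample

# Resample the sample with replacement and recompute the statistic
R <- 5000
theta_star <- replicate(R, plug_in_var(sample(x, replace = TRUE)))

bias_hat  <- mean(theta_star) - theta_hat  # bootstrap samples vs. sample
corrected <- theta_hat - bias_hat          # same shift, applied in reverse

c(original = theta_hat, bias = bias_hat, corrected = corrected)
```

Here the bootstrap replicates sit below the sample value, by roughly theta_hat / n, so the correction moves the estimate upward, away from the bootstrap mean. That is the same pattern as in the question, with the sign reversed.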

CIs based on bootstrapping are meant to describe the original population, while the values from the bootstrapped samples themselves are doubly biased relative to the population values (once from the original sample versus the population, and then again from the bootstrapped samples versus the original sample). With bias present, you thus shouldn't be alarmed if values obtained from the bootstrapped samples, including their mean, fall outside the final CI estimated for the population.

Confidence intervals from bootstrapping can be even trickier than this. The "normal" confidence intervals assume symmetry about the (bias-corrected) mean, an assumption that might not describe the situation well in practice. With bias and asymmetry, some other frequently used bootstrap confidence intervals can also be misleading. I struggled with these issues extensively until I forced myself to apply the above guiding principle systematically.
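
One way to see how much the choice of interval matters is to request several types from boot.ci on the same boot object and compare them. The sketch below uses a hypothetical right-skewed sample with the mean as the statistic, and many more replicates than the 40 in the question, since the BCa interval needs a large R. The data and the names mean_stat and limits are assumptions made for illustration.

```r
library(boot)

set.seed(123)
x <- rexp(30, rate = 1)  # hypothetical right-skewed sample

mean_stat <- function(data, indices) mean(data[indices])

# R is set well above the sample size so that the BCa interval can be computed
b <- boot(data = x, statistic = mean_stat, R = 2000)

ci <- boot.ci(b, conf = 0.95, type = c("norm", "basic", "perc", "bca"))

# Collect lower/upper limits; the normal entry has 3 columns, the others 5
limits <- rbind(
  normal     = ci$normal[1, 2:3],
  basic      = ci$basic[1, 4:5],
  percentile = ci$percent[1, 4:5],
  bca        = ci$bca[1, 4:5]
)
colnames(limits) <- c("lower", "upper")
round(limits, 3)
```

Only the normal interval is forced to be symmetric about the bias-corrected estimate; the others can shift or stretch with the shape of the bootstrap distribution, so noticeable disagreement between rows is a hint that bias or skewness is present.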
