R – How to Compute Bias-Corrected Percentile Confidence Intervals Using R

bias correctionbootstrapconfidence intervalestimationr

I'm trying to estimate bias-corrected percentile (BCP) confidence intervals in R on a vector from a simple for loop used for resampling. I am primarily looking for help implementing the calculation on a vector in R.

I am attempting to follow steps in Manly. 1998. Randomization, bootstrap and monte carlo methods in biology. 2nd edition. Pg. 48. PDF of the approach/steps should be available here: https://wyocoopunit.box.com/s/9vm4vgmbx5h7um809bvg6u7wr392v6i9

I aware of boot::boot but am hoping to avoid it for the current analysis I am working on (the boot function became quite challenging, for me, for a few reasons). I did post this question to R-help a little over a day ago but got no responses, presumably because it is too stats focused, so I'm reposting in what is probably a more appropriate forum.

I cannot figure out where I'm going wrong but the estimates from my attempt at the BCP CI are different enough from other methods that I assume I'm doing something wrong.

1) Bootstrap 95% CI for R-Squared via boot::boot

require(boot)
data("mtcars")

# Function for boot
rsq <- function(formula, data, indices) {
                  d <- data[indices,]
                  fit <- lm(formula, data=d)
                  return(summary(fit)$r.square) }

# Bootstrap with 1000 replications and estimate CI's
results <- boot(data=mtcars, statistic=rsq, R=1000, formula = mpg ~ wt + disp) 
rsq.ci <- boot.ci(results, type="all")

2) Bootstrap via for loop and estimate bias-corrected pecentile CI's

iterations <- 1000
rsq.out <- vector()

for(i in 1:iterations){
      boot.df <- mtcars[sample(nrow(mtcars), nrow(mtcars), replace = TRUE), ]
      boot.lm <- lm(mpg ~ wt + disp, boot.df)
      boot.rsq <- summary(boot.lm)$r.squared
      rsq.out <- c(rsq.out, boot.rsq) }

hist(rsq.out)

Fit LM with all the data (need for BCP steps)

lm.obs <- lm(mpg ~ wt + disp, mtcars)

Bias-corrected percentile CI
Step 1: Proportion of times that the boot estimate exceeds observed estimate (obviously are many ways to calculate this, mine is clearly not the most succint).

mtcar.boot.rsq <- data.frame(boot.rsq = rsq.out, obs.rsq = summary(lm.obs)$r.squared)

# If boot mean is > observed mean, code 1, otherwise 0.
mtcar.boot.rsq$boot.high <- ifelse(mtcar.boot.rsq$boot.rsq > mtcar.boot.rsq$obs.rsq, 1, 0)

# Mean is the proportion of times boot mean > obs mean
mean(mtcar.boot.rsq$boot.high)
head(mtcar.boot.rsq)

Step 2: Use proportion to get Z score, then use that to calculate the new bias-correct Z score to look up the new proportion to use in quantile()

 rsq.bc <- quantile(mtcar.boot.rsq$boot.rsq,
                       c(pnorm((2*pnorm(mean(mtcar.boot.rsq$boot.high))) - 1.96),
                         pnorm((2*pnorm(mean(mtcar.boot.rsq$boot.high))) + 1.96)))

Estimates

Boot package

rsq.ci
Intervals :
Level      Normal              Basic
95%   ( 0.6690,  0.8659 )   ( 0.6818,  0.8824 )

Level     Percentile            BCa
95%   ( 0.6795,  0.8800 )   ( 0.6044,  0.8511 )

Simple percentile CI on for loop output

quantile(rsq.out, c(0.025, 0.975))
     2.5%     97.5%
0.6871754 0.8809823

My attempt at BCP CI on for loop output (ehhhh)

rsq.bc
31.59952% -- 0.7747525
99.97103% -- 0.9034800

I appreciate any advice!

Best Answer

You almost had it. Change your Step 2 code as shown below. You want the value of Z associated with the proportion you computed in Step 1 and that is what qnorm will give you.

 rsq.bc <- quantile(mtcar.boot.rsq$boot.rsq,
                       c(pnorm((2*pnorm(mean(mtcar.boot.rsq$boot.high))) - 1.96),
                         pnorm((2*pnorm(mean(mtcar.boot.rsq$boot.high))) + 1.96)))

becomes

 rsq.bc <- quantile(mtcar.boot.rsq$boot.rsq,
                       c(pnorm((2*qnorm(mean(mtcar.boot.rsq$boot.high))) - 1.96),
                         pnorm((2*qnorm(mean(mtcar.boot.rsq$boot.high))) + 1.96)))

You might find this page helpful: http://influentialpoints.com/Training/bootstrap_confidence_intervals.htm#bias

Best Answer

Related Solutions

Confidence Interval – Calculating Confidence Intervals via Bootstrap on Dependent Observations

Using Bootstrap to Obtain Sampling Distribution of 1st-Percentile – Methods Explained

Related Question