Bootstrap Confidence Intervals – Classical Confidence Intervals vs. Bootstrap Confidence Intervals

bootstrap, confidence interval, correlation, simulation

Suppose I have some data that includes height and weight measurements for 1000 people – I am interested in calculating the Correlation Coefficient to see if there exists some correlation between height and weight, and if this correlation is statistically significant.

I was curious to learn more about how the Confidence Interval of the Correlation Coefficient is calculated. When reading about this online, I found some links that mentioned something called the "Fisher Transform" and outlined (what seemed to me) a complicated procedure for calculating the Confidence Interval of the Correlation Coefficient.
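For reference, the Fisher-transform interval is fairly short once written out: transform the sample correlation r to z = atanh(r), treat z as approximately normal with standard error 1/sqrt(n - 3), and transform the endpoints back with tanh. The sketch below assumes the data are roughly bivariate normal; the function name `fisher_ci` and its parameters are illustrative, not taken from any particular library.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation.

    r     : sample correlation coefficient (strictly between -1 and 1)
    n     : number of paired observations (must exceed 3)
    level : confidence level, e.g. 0.95
    """
    z = math.atanh(r)                           # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)                 # approximate standard error of z
    q = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided normal quantile
    # Build the interval on the z scale, then map back to the r scale.
    return math.tanh(z - q * se), math.tanh(z + q * se)
```

Calling `fisher_ci(r, 1000)` with the observed correlation returns the lower and upper bounds; the interval is not symmetric around r because tanh is nonlinear.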

This got me thinking about the Bootstrap Procedure. Suppose I performed "Random Sampling With Replacement", made 1000 draws from the data I have, and then calculated the Correlation Coefficient. Now, imagine I repeat this process 1000 times and produce a list of 1000 Correlation Coefficients, each calculated from a random resample of this data. Could I not then find the 5th and the 95th percentiles of this list and use these as a pseudo confidence interval?

Although I have a feeling that this might work, I am not sure if this is a statistically valid approach. Is it possible that using the "classical" formulas for the Confidence Interval of the Correlation Coefficient would be "more realistic and better suited" compared to this "bootstrap approach"?

Thank you!

Notes: CLT-based confidence interval vs. Bootstrap based confidence interval

Best Answer

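Yes, you can bootstrap the correlation coefficient and get the confidence interval you are looking for, with one caveat:

You should resample joint observations (pairs, i.e. each person's weight and height together) rather than sampling weight and height independently of each other. Independent resampling would break the link between the two variables, so the resampled correlations would scatter around zero regardless of the data.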
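A minimal sketch of this paired (case-resampling) percentile bootstrap, using only the Python standard library. The names `pearson_r` and `paired_bootstrap_ci` are illustrative, and the default `level=0.90` matches the 5th and 95th percentiles mentioned in the question.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_bootstrap_ci(x, y, n_boot=1000, level=0.90, seed=0):
    """Percentile bootstrap interval for the correlation of paired data.

    Assumes the data are not degenerate (a resample with zero variance
    would make the correlation undefined).
    """
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        # Resample row indices so each (x, y) pair stays together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    tail = (1 - level) / 2
    # Simple order-statistic quantiles, rounded outward.
    lo = stats[math.floor(tail * (n_boot - 1))]
    hi = stats[math.ceil((1 - tail) * (n_boot - 1))]
    return lo, hi
```

With `height` and `weight` as equal-length lists, `paired_bootstrap_ci(height, weight)` returns the lower and upper percentile bounds. The key design choice is resampling indices rather than the two lists separately.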

Although this approach is sound, I would suggest an alternative:

  1. Fit a linear model (for example $\text{weight} = \alpha + \beta \text{height}$);
  2. Compute the residuals (observed weight minus fitted weight);
  3. Resample the residuals with replacement and add them to the fitted values to obtain bootstrapped weights;
  4. Calculate the correlation between the bootstrapped weights and the independent variable (height);
  5. Repeat steps 3 and 4 N times, then take quantiles of the N correlations as the confidence interval for the correlation coefficient.
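The steps above can be sketched as follows, again with the standard library only. This is one reading of the procedure, not a definitive implementation: it fits the line by ordinary least squares, treats the heights as fixed, and assumes the residuals are exchangeable (roughly constant variance). The names `pearson_r` and `residual_bootstrap_ci` are illustrative.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residual_bootstrap_ci(x, y, n_boot=1000, level=0.90, seed=0):
    """Residual-bootstrap percentile interval for corr(x, y)."""
    rng = random.Random(seed)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Step 1: least-squares fit y = intercept + slope * x.
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    # Step 2: residuals of the fitted model.
    resid = [b - f for b, f in zip(y, fitted)]
    stats = []
    for _ in range(n_boot):
        # Step 3: fitted value plus a residual drawn with replacement.
        y_star = [f + rng.choice(resid) for f in fitted]
        # Step 4: correlation of the bootstrapped response with x.
        stats.append(pearson_r(x, y_star))
    # Step 5: percentile interval from the sorted replicates.
    stats.sort()
    tail = (1 - level) / 2
    lo = stats[math.floor(tail * (n_boot - 1))]
    hi = stats[math.ceil((1 - tail) * (n_boot - 1))]
    return lo, hi
```

Compared with resampling pairs, this variant leans on the linear model being adequate; if the spread of weight changes noticeably with height, resampling pairs is the safer choice.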