I applied a bootstrap procedure to calculate confidence intervals for the parameters of a multiple linear regression.
In R it's pretty simple to implement (functions: 'boot' and 'boot.ci') but I still have two comprehension problems:
- Why does it make sense to perform a bootstrap procedure before calculating the confidence intervals? Will they be more precise? And if so, can anyone explain why?
- How can I decide how many replications are enough for calculating confidence intervals? 100? 1000? 10000?
I would really appreciate any help!
Best Answer
You can calculate bootstrap confidence intervals for complex situations, i.e. properties ("statistics") that are not easily accessible analytically. I'm thinking of things like bootstrapping the generalization error of a predictive model*.
In other words, bootstrapping may still be possible in situations where you have no good assumption about which distribution to base your confidence intervals on.
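For the regression case from the question, a minimal sketch of how the two approaches can be put side by side might look like this. The built-in `mtcars` data and the model `mpg ~ wt + hp` are stand-ins chosen only for illustration, not anything from the question, and the percentile interval is just one of the types `boot.ci` offers.

```r
library(boot)  # ships with R as a recommended package

set.seed(1)  # make the resampling reproducible

# Statistic for boot(): refit the model on the resampled rows
# and return all coefficients. (Model and data are assumptions.)
coef_fun <- function(data, idx) {
  coef(lm(mpg ~ wt + hp, data = data[idx, ]))
}

b <- boot(mtcars, statistic = coef_fun, R = 1000)

# Non-parametric percentile interval for the second coefficient (wt).
ci_wt <- boot.ci(b, type = "perc", index = 2)
ci_wt$percent[4:5]  # lower and upper limit

# Analytical (t-distribution based) interval for comparison.
confint(lm(mpg ~ wt + hp, data = mtcars))["wt", ]
```

If the usual regression assumptions hold reasonably well, the two intervals should be similar; a clear difference is a hint to look at those assumptions more closely.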
The choice between a parametric approach (analytical confidence interval based on a known distribution) and the non-parametric bootstrap is a trade-off:
@MartenBuuis already gave you some idea how to approach this question. Here's another, very pragmatic one:
If they are not sufficiently precise, pool the 10x100 calculations and go back to steps 1 and 2 with nboot = 1000 replications.
You get the idea.
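The individual steps are not shown above, so the following is only one plausible reading of this pragmatic approach: run the bootstrap several times with the same nboot, look at how much the interval limits vary between runs, and pool the runs before increasing nboot. The data set, the model, the helper `run_boots` and the hand-computed percentile limits are all assumptions made for this sketch.

```r
library(boot)

set.seed(42)

# Statistic: regression coefficients on the resampled rows (stand-in model).
coef_fun <- function(data, idx) coef(lm(mpg ~ wt + hp, data = data[idx, ]))

# Hypothetical helper: run `n_runs` independent bootstraps with `nboot`
# replications each; returns the replicates of one coefficient,
# one column per run.
run_boots <- function(nboot, n_runs = 10, index = 2) {
  replicate(n_runs, boot(mtcars, coef_fun, R = nboot)$t[, index])
}

reps <- run_boots(nboot = 100)  # 100 x 10 matrix

# Percentile limits for each run; their spread across runs shows how
# strongly the interval still depends on the random resampling.
limits <- apply(reps, 2, quantile, probs = c(0.025, 0.975))
apply(limits, 1, sd)

# Pooling the runs gives one interval based on 10 * 100 = 1000 replicates.
pooled <- quantile(as.vector(reps), probs = c(0.025, 0.975))
pooled
```

If the spread of the limits is still too large for your purpose, the same call with `run_boots(nboot = 1000)` repeats the check at the next level.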