Solved – Why bootstrapping

bootstrap

I understand that bootstrapping is a technique used to estimate statistics of a population. In bootstrapping, we take many samples of a chosen size, compute the statistic on each, and take the mean of these statistics. This mean is representative of the whole population.

My question is: why take the samples in the first place? If you have the whole population, why not calculate the statistics on all of it and get 100% accurate values?

Best Answer

Welcome to CV!

In bootstrapping, you repeatedly take samples with replacement from the original sample (typically each of the same size as the original). The general idea is that you can estimate the uncertainty in your estimate by asking: what if I hadn't observed this observation, or that one, or what if I had observed this observation more than once?

You do this, say, $B = 1{,}000$ times, and end up with $1{,}000$ slightly different estimates of your statistic of interest. The more strongly the statistic is affected by which observations end up in the resample, the larger the variance of your bootstrapped statistic will be.

In fact, it turns out that the standard deviation of the bootstrapped statistic can be a really good estimator of the standard error of your statistic.
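To make that statement concrete, here is one common way to write it down. The symbols $\hat\theta^{*}_{b}$ (the statistic computed on the $b$-th resample) and $\bar\theta^{*}$ (the average of those values) are my own notation, not something from the question:

$$
\widehat{\mathrm{SE}}_{\text{boot}} = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat\theta^{*}_{b} - \bar\theta^{*}\right)^{2}},
\qquad
\bar\theta^{*} = \frac{1}{B}\sum_{b=1}^{B}\hat\theta^{*}_{b}
$$

This is just the sample standard deviation of the $B$ bootstrapped estimates.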

And so, by simply randomly resampling our original sample with replacement, over and over, we have obtained an idea of how precise the estimate is, given that we only have a sample of the population.
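The resampling procedure described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the function name `bootstrap_standard_error`, the simulated data, and the seeds are all made up for the example. For the sample mean, the result can be compared against the familiar analytic formula $s / \sqrt{n}$.

```python
import random
import statistics


def bootstrap_standard_error(sample, statistic, n_resamples=1000, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement.

    `n_resamples` plays the role of B in the text. The function name and
    defaults are hypothetical, chosen only for this illustration.
    """
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_resamples):
        # Draw n observations from the original sample, with replacement.
        resample = rng.choices(sample, k=n)
        estimates.append(statistic(resample))
    # The standard deviation of the bootstrapped statistics estimates the SE.
    return statistics.stdev(estimates)


# Hypothetical data: 200 simulated draws standing in for an observed sample.
data_rng = random.Random(42)
sample = [data_rng.gauss(10.0, 2.0) for _ in range(200)]

boot_se = bootstrap_standard_error(sample, statistics.mean)
analytic_se = statistics.stdev(sample) / len(sample) ** 0.5

print(f"bootstrap SE of the mean:  {boot_se:.3f}")
print(f"analytic SE (s / sqrt(n)): {analytic_se:.3f}")
```

The two numbers should be close. The practical appeal is that the same function works for statistics that have no simple standard error formula, for example `bootstrap_standard_error(sample, statistics.median)`.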

Of course, if you can measure the entire population, then there is no point in bootstrapping.