Sampling – Determining Bootstrap Sample Sizes

bootstrap, resampling, sampling

I'm learning about bootstrapping as a means of estimating the variance of a sample statistic. I have one basic question.

Quoting from http://web.stanford.edu/class/psych252/tutorials/doBootstrapPrimer.pdf:

• How many observations should we resample? A good suggestion is the original sample size.

How can we resample as many observations as in the original sample?
Say I have a sample of size 100 and I'm trying to estimate the variance of the mean. How can I obtain multiple bootstrap samples of size 100 from a total sample size of 100? Only one bootstrap sample would be possible in this case, and it would be equivalent to the original sample, right?

I'm obviously misunderstanding something very basic. I understand that the ideal number of bootstrap samples is infinite, and that to determine how many bootstrap samples my data needs I'd have to test for convergence, keeping my required precision in mind.
But I'm really confused about what the size of each individual bootstrap sample should be.

Best Answer

The bootstrap is conducted by sampling with replacement. It seems that the term "with replacement" is unclear to you. As noted by whuber, an illustration of sampling with replacement is given on p. 3 of the paper you refer to (reproduced below).

Illustration of sampling with replacement

(source: http://web.stanford.edu/class/psych252/tutorials/doBootstrapPrimer.pdf)

The general idea of sampling with replacement is that any case can be sampled multiple times (the green marble in the first image above; the blue and violet marbles in the last picture). To picture this process, think of a bowl filled with colorful marbles. Say that you want to draw some number of marbles from this bowl. If you sampled without replacement, you would simply take the marbles out of the bowl and put the sampled ones aside. If you sampled with replacement, you would sample the marbles one by one: take a single marble out of the bowl, note down its color in your notebook, and then return it to the bowl. So when sampling with replacement, the same marble can be sampled multiple times.

So when sampling without replacement, you can sample at most $n$ marbles from a bowl containing $n$ marbles, while when sampling with replacement you can sample any number of marbles (even greater than $n$) from the finite population. If you sampled $n$ out of $n$ marbles without replacement, you would end up with exactly the same sample, only in shuffled order. If you sampled $n$ out of $n$ marbles with replacement, each time you could get a different combination of marbles.
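To make this concrete for the case in the question (a sample of size 100, estimating the variability of the mean), here is a minimal Python sketch using only the standard library. The data are made up, and the helper name `bootstrap_means`, the seed, and the choice of 2000 resamples are assumptions for illustration, not recommendations.

```python
import random
import statistics


def bootstrap_means(data, n_resamples, rng):
    """Return the mean of each bootstrap resample (each the same size as data)."""
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # choices() draws WITH replacement, so a value can appear several times
        resample = rng.choices(data, k=n)
        means.append(statistics.fmean(resample))
    return means


rng = random.Random(0)  # fixed seed, only so the run is reproducible
data = [rng.gauss(50, 10) for _ in range(100)]  # hypothetical sample, n = 100

means = bootstrap_means(data, n_resamples=2000, rng=rng)
boot_se = statistics.stdev(means)
# Textbook standard error of the mean, for comparison
formula_se = statistics.stdev(data) / len(data) ** 0.5

print(f"bootstrap SE of the mean: {boot_se:.3f}")
print(f"s / sqrt(n):              {formula_se:.3f}")

# Drawing n of n WITHOUT replacement only reshuffles the original sample
shuffled = rng.sample(data, k=len(data))
print(sorted(shuffled) == sorted(data))
```

Each resample has 100 values, but because of the replacement step the resamples differ from one another, so their means vary. The spread of those means should come out close to the usual $s/\sqrt{n}$ estimate, while the without-replacement draw at the end just returns the original values in a different order.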

There are $\binom{n}{k}$ ways of sampling $k$ cases without replacement from a population of size $n$, and $\binom{n+k-1}{k}$ ways of sampling with replacement (in both cases ignoring the order of the draws). If you want to read more about the math behind it, you can check the 2.1. Combinatorics chapter of the online handbook Introduction to Probability by Hossein Pishro-Nik. There is also a handy cheatsheet on the Wolfram MathWorld page.
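These two counts can be checked by brute force on a small bowl. The sketch below assumes a made-up bowl of four marbles and draws of size three, and then evaluates both formulas for the $n = k = 100$ case from the question.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

marbles = ["red", "green", "blue", "violet"]  # hypothetical bowl, n = 4
n, k = len(marbles), 3

# Enumerate every unordered draw of k marbles
without = list(combinations(marbles, k))
with_repl = list(combinations_with_replacement(marbles, k))

print(len(without), comb(n, k))            # both counts agree
print(len(with_repl), comb(n + k - 1, k))  # both counts agree

# The case from the question: n = k = 100
only_one = comb(100, 100)                   # without replacement
distinct_resamples = comb(100 + 100 - 1, 100)  # with replacement
print(only_one)
print(len(str(distinct_resamples)), "digits")
```

Without replacement there is exactly one unordered sample of size 100 out of 100, which matches the intuition in the question. With replacement the number of distinct resamples is astronomically large. Note that these counts treat draws as unordered, and the distinct resamples are not all equally likely to come up in a bootstrap.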
