Solved – Rule of thumb for number of bootstrap samples

bootstrap, inference, monte carlo

I wonder if someone knows any general rules of thumb regarding the number of bootstrap samples one should use, based on characteristics of the data (number of observations, etc.) and/or the variables included?

Best Answer

My experience is that statisticians won't take simulations or bootstraps seriously unless the number of iterations exceeds 1,000. Monte Carlo (MC) error is a big issue that's a little underappreciated. For instance, this paper used Niter=50 to demonstrate LASSO as a feature selection tool. My thesis would have taken a lot less time to run had 50 iterations been deemed acceptable! I recommend always inspecting the histogram of the bootstrap samples. Their distribution should appear fairly regular. I don't think any plain numerical rule will suffice, and it would be overkill to perform, say, a double-bootstrap to assess MC error.
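To get a feel for MC error without a full double bootstrap, one rough option is to rerun the whole bootstrap several times and see how much the standard error estimate moves between runs. The sketch below is only an illustration under assumptions that are not in the answer: the data are a hypothetical Gaussian sample of size 50, the statistic is the sample mean, and the helper names `bootstrap_se` and `mc_spread` are made up for this example.

```python
import random
import statistics


def bootstrap_se(data, n_boot, rng):
    """Bootstrap standard error of the sample mean from n_boot resamples."""
    n = len(data)
    means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(means)


def mc_spread(data, n_boot, n_repeats, rng):
    """Standard deviation of the bootstrap SE across independent bootstrap runs.

    This is a crude measure of Monte Carlo error: the data stay fixed,
    only the resampling is repeated.
    """
    estimates = [bootstrap_se(data, n_boot, rng) for _ in range(n_repeats)]
    return statistics.stdev(estimates)


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(50)]  # hypothetical sample

spread_small = mc_spread(data, n_boot=50, n_repeats=40, rng=rng)
spread_large = mc_spread(data, n_boot=1000, n_repeats=40, rng=rng)
print(f"run-to-run spread of the SE, 50 resamples:   {spread_small:.4f}")
print(f"run-to-run spread of the SE, 1000 resamples: {spread_large:.4f}")
```

With only 50 resamples the reported standard error changes noticeably from one run to the next, which is the MC error referred to above. It shrinks roughly with the square root of the number of resamples, so going from 50 to 1,000 helps a lot, while further increases give diminishing returns.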

Suppose you were estimating the mean of a ratio of two independent standard normal random variables. Some statisticians might recommend bootstrapping it, since the integral is difficult to compute. If you have basic probability theory under your belt, you would recognize that this ratio is a Cauchy random variable, whose mean does not exist. In that case, 1,000, 100,000, or 10,000,000 bootstrap samples would be insufficient to estimate that which doesn't exist: the histogram of these bootstraps would continue to look irregular and wrong. Other leptokurtic distributions would require more bootstrap iterations than a more regular Gaussian counterpart.
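The ratio-of-normals example can be checked by simulation. The sketch below is a rough illustration, not a formal diagnostic: it draws many independent data sets, bootstraps the mean of each, and measures how widely the resulting estimates scatter (via the interquartile range) as the sample size grows. The function names, sample sizes, and replicate counts are all arbitrary choices made for this example.

```python
import random
import statistics


def ratio_of_normals(n, rng):
    """n ratios Z1/Z2 of independent standard normals (standard Cauchy)."""
    return [rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0) for _ in range(n)]


def plain_normals(n, rng):
    """n independent standard normal draws, used as the well-behaved baseline."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def bootstrap_mean_estimate(data, n_boot, rng):
    """Average of the sample means over n_boot bootstrap resamples."""
    n = len(data)
    means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.fmean(means)


def spread_across_datasets(draw, n, n_boot, n_datasets, rng):
    """IQR of the bootstrap mean estimate over independent data sets of size n."""
    estimates = [
        bootstrap_mean_estimate(draw(n, rng), n_boot, rng)
        for _ in range(n_datasets)
    ]
    q1, _, q3 = statistics.quantiles(estimates, n=4)
    return q3 - q1


rng = random.Random(1)
results = {}
for name, draw in [("normal", plain_normals), ("ratio", ratio_of_normals)]:
    for n in (25, 400):
        results[name, n] = spread_across_datasets(
            draw, n, n_boot=50, n_datasets=80, rng=rng
        )
        print(f"{name:>6}, n={n:>3}: IQR of bootstrap mean = {results[name, n]:.3f}")
```

For the Gaussian baseline the scatter shrinks as the sample size grows, as expected. For the ratio it does not, because the average of n standard Cauchy variables is again standard Cauchy. Neither more data nor more resamples can pin down a mean that does not exist.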

There are a few more wrinkles to that story. In particular, the bootstrap is only really justified when the moments of the data-generating probability model exist. That's because you are using the empirical distribution function as a stand-in for the actual probability model, and assuming they have the same mean, standard deviation, skewness, 99th percentile, etc.

In short, a bootstrap estimate of a statistic and its standard error is only justified when the histogram of the bootstrapped samples appears regular beyond reasonable doubt and when the bootstrap itself is appropriate, that is, when the relevant moments of the data-generating model exist.
