I wonder if someone knows any general rules of thumb regarding the number of bootstrap samples one should use, based on characteristics of the data (number of observations, etc.) and/or the variables included?
Solved – Rule of thumb for number of bootstrap samples
bootstrap, inference, monte-carlo
Related Solutions
Think of the importance sampling as defining a new $f^*(x_i) = f(x_i)w_i/\sum w_i$. Then, when you do your Bayesian bootstrap, as in your formula for $\hat{\theta}_j$, do it with respect to $f^*$ instead of the original $f$.
The concept behind this is that the importance sampling "corrects" for a misspecified probability of observing the $x_i$ (misspecified because it isn't $1/n$). This correction should be the same regardless of the weights generated for a particular bootstrap sample, because it is due to the underlying probability model. So you shouldn't combine the correction weights with the bootstrap weights to form new weights (your first approach). On the other hand, your second approach adds randomness by sampling according to the importance-sampling weights rather than carrying out the importance-sampling calculation, which widens the resulting distribution too much. Your third approach adds randomness for the same reason, just to a different degree, and therefore has a similar effect.
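As a generic aside (not your $\hat{\theta}_j$ formula, and with a made-up target and proposal chosen purely for illustration), the kind of correction described above is ordinary self-normalized importance sampling; a minimal R sketch looks like this:
set.seed(1)
n  <- 1e4
x  <- rnorm(n, mean = 0, sd = 2)        # draws from a proposal g = N(0, 4)
w  <- dnorm(x, 1, 1) / dnorm(x, 0, 2)   # importance weights w_i = f(x_i)/g(x_i) for target f = N(1, 1)
wn <- w / sum(w)                        # normalized weights: the mass the corrected distribution puts on each x_i
sum(wn * x)                             # weighted estimate of E_f[X]; close to 1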
Let's ask the computer to generate some small examples. (The language is R.)
Take $n=5$. Begin by placing $n-1$ bars (represented as ones) randomly within $n + n-1 = 2n-1$ places; that is, by selecting an $n-1$ element subset of $\{1,2,\ldots,2n-1\}$:
n <- 5
set.seed(17)
y <- rep(0, 2*n-1)
y[sample.int(2*n-1, n-1, replace=FALSE)] <- 1
names(y) <- c("_","|")[y+1]
print(y)
The output is
_ | _ | | _ _ | _
0 1 0 1 1 0 0 1 0
The $n-1=4$ selected elements are shown as bars; they delimit five boxes of sizes 1, 1, 0, 2, and 1, respectively. These counts can be easily computed:
x <- tabulate(cumsum(c(1,y)))-1
names(x) <- 1:n
print(x)
The output is
1 2 3 4 5
1 1 0 2 1
This tabulation means that 1 "1" was selected, 1 "2", no "3"s, 2 "4"s, and 1 "5". (To appreciate what happened, inspect the intermediate result:
cumsum(c(1,y))
_ | _ | | _ _ | _
1 1 2 2 3 4 4 4 5 5
Beginning with a 1 (which corresponds to none of the boxes or bars), the cumulative sum incremented the value every time a bar was crossed. Because there are $n-1=4$ bars, the final value is $1+(n-1)=n=5$. Thus every possible outcome in $\{1,2,\ldots,n\}$ is named at least once. The code counted the number of times each outcome was mentioned and decremented that count by one.)
We could equally well write the same information as an array of the sample values by replicating each value (in the first row) the number of times indicated (in the second row):
print(z <- unlist(mapply(rep, 1:n, x)))
The output is
[1] 1 2 4 4 5
These are the sample values, sorted for your convenience. Finally, they can be converted back to boxes and bars:
unlist(sapply(tabulate(z), function(i) c(1,rep(0, i))))[-1]
This command creates one box for each sample element and sticks a bar in front of the first box for each new element processed. After removing the initial bar, the output is what we started with (y).
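A quick check confirms that the round trip indeed reproduces y:
all(unlist(sapply(tabulate(z), function(i) c(1, rep(0, i))))[-1] == y)   # TRUE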
Because this code goes full circle from one of the three different representations of a given sample back to itself, it shows that any one of the representations can be converted uniquely to any of the other two forms. Therefore all configurations of each of the three forms of representation are in one-to-one correspondence. In particular, the number of possible arrays z that name the sample members explicitly is the same as the number of ways of creating y, which (by definition) is $\binom{2n-1}{n-1}$.
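Incidentally, for a small $n$ like this the count can be confirmed by brute-force enumeration in R:
n <- 5
samples <- expand.grid(rep(list(1:n), n))                              # all n^n ordered bootstrap samples
keys <- apply(samples, 1, function(s) paste(sort(s), collapse = ","))  # collapse each to its multiset
length(unique(keys))                                                   # 126 distinct samples
choose(2*n - 1, n - 1)                                                 # also 126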
Best Answer
My experience is that statisticians won't take simulations or bootstraps seriously unless the number of iterations exceeds 1,000. Monte Carlo (MC) error is a big issue that's a little underappreciated. For instance, this paper used Niter=50 to demonstrate LASSO as a feature-selection tool. My thesis would have taken a lot less time to run had 50 iterations been deemed acceptable! I recommend that you always inspect the histogram of the bootstrap samples: their distribution should appear fairly regular. I don't think any plain numerical rule will suffice, and it would be overkill to perform, say, a double bootstrap to assess MC error.
Suppose you were estimating the mean of a ratio of two independent standard normal random variables; some statistician might recommend bootstrapping it, since the integral is difficult to compute. With basic probability theory under your belt, you would recognize that this ratio is a Cauchy random variable, whose mean does not exist. Any other leptokurtic distribution would require many more bootstrap iterations than a better-behaved Gaussian counterpart. In the Cauchy case, 1,000, 100,000, or 10,000,000 bootstrap samples would all be insufficient to estimate something that does not exist; the histogram of these bootstrap replicates would continue to look irregular and wrong.
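To make the Cauchy example concrete, here is a small sketch of my own (not from the paper mentioned above): however large you make the number of replicates B, the bootstrap distribution of the mean never settles down.
set.seed(42)
x <- rnorm(1000) / rnorm(1000)    # ratio of independent standard normals: Cauchy, so no mean exists
B <- 10000
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))
hist(boot_means, breaks = 100)    # remains irregular and heavy-tailed no matter how large B gets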
There are a few more wrinkles to that story. In particular, the bootstrap is only really justified when the moments of the data-generating probability model exist. That's because you are using the empirical distribution function as a stand-in for the actual probability model, assuming they have the same mean, standard deviation, skewness, 99th percentile, and so on.
In short, a bootstrap estimate of a statistic and its standard error can be trusted only when the histogram of the bootstrap replicates appears regular beyond reasonable doubt and when the bootstrap itself is justified, i.e., the relevant moments of the data-generating model exist.