Solved – Can the bootstrap be seen as a “cure” for small sample size?

bootstrap, small-sample

This question was triggered by something I read in this graduate-level statistics textbook and also, independently, heard during this presentation at a statistics seminar. In both cases, the statement was along the lines of "because the sample size is pretty small, we decided to perform estimation via the bootstrap instead of (or along with) this parametric method $X$".

They didn't go into the details, but the reasoning was probably as follows: method $X$ assumes the data follow a certain parametric distribution $D$. In reality the distribution is not exactly $D$, but that is acceptable as long as the sample size is large enough. Since in this case the sample size is too small, let's switch to the (non-parametric) bootstrap, which makes no distributional assumptions. Problem solved!

In my opinion, that's not what the bootstrap is for. Here is how I see it: the bootstrap can give one an edge when it is more or less obvious that there are enough data, but there is no closed-form solution for standard errors, p-values and similar statistics. A classic example is obtaining a CI for the correlation coefficient given a sample from a bivariate normal distribution: a closed-form solution exists, but it is so convoluted that bootstrapping is simpler. However, nothing implies that the bootstrap can somehow help one get away with a small sample size.
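
To make that classic example concrete, here is a minimal R sketch (not from the original post) of a percentile bootstrap CI for a correlation coefficient; the sample size, number of resamples and the use of MASS::mvrnorm are my own illustrative choices.

# Percentile bootstrap CI for the correlation of a bivariate normal sample (illustrative sketch)
library(MASS)   # for mvrnorm

set.seed(1)
n     <- 200
Sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2)       # true correlation 0.6
xy    <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)

r.hat <- cor(xy[, 1], xy[, 2])                 # point estimate

# Resample rows (pairs) with replacement and recompute the correlation each time
r.boot <- replicate(5000, {
    idx <- sample(n, replace = TRUE)
    cor(xy[idx, 1], xy[idx, 2])
})

r.hat
quantile(r.boot, c(0.025, 0.975))              # percentile 95% CI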

Is my perception right?

If you find this question interesting, there is another, more specific bootstrap question from me:

Bootstrap: the issue of overfitting

P.S. I can’t help sharing one egregious example of the “bootstrap approach”. I am not disclosing the author’s name, but he is one of the older generation “quants” who wrote a book on Quantitative Finance in 2004. The example is taken from there.

Consider the following problem: suppose you have 4 assets and 120 monthly return observations for each. The goal is to construct the joint 4-dimensional cdf of yearly returns. Even for a single asset, the task seems hardly attainable with only 10 yearly observations, let alone the estimation of a 4-dimensional cdf. But not to worry, the "bootstrap" will help you out: take all of the available 4-dimensional observations, resample 12 of them with replacement and compound them to construct a single "bootstrapped" 4-dimensional vector of annual returns. Repeat that 1000 times and, lo and behold, you've got yourself a "bootstrap sample" of 1000 annual returns. Use this as an i.i.d. sample of size 1000 for the purpose of cdf estimation, or any other inference that can be drawn from a thousand-year history.
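
For concreteness, here is a minimal R sketch of the procedure being criticized (not an endorsement of it); the simulated monthly returns and all sizes and parameters are illustrative.

# Sketch of the "bootstrap" compounding procedure described above
set.seed(1)
monthly <- matrix(rnorm(120 * 4, mean = 0.01, sd = 0.05), nrow = 120, ncol = 4)  # 120 months x 4 assets

annual.boot <- t(replicate(1000, {
    idx <- sample(120, 12, replace = TRUE)     # resample 12 monthly 4-d observations
    apply(1 + monthly[idx, ], 2, prod) - 1     # compound them into one "annual" return vector
}))

dim(annual.boot)   # 1000 x 4: the purported "bootstrap sample" of annual returns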

Best Answer

I remember reading that using the percentile confidence interval for bootstrapping is equivalent to using a z interval instead of a t interval and using $n$ instead of $n-1$ in the denominator. Unfortunately I don't remember where I read this and could not find a reference in my quick searches. These differences don't matter much when $n$ is large (and the advantages of the bootstrap outweigh these minor problems when $n$ is large), but with small $n$ this can cause problems. Here is some R code to simulate and compare:

simfun <- function(n=5) {
    x <- rnorm(n)                      # sample from N(0, 1): true mean 0, true SD 1
    m.x <- mean(x)
    s.x <- sd(x)
    z <- m.x/(1/sqrt(n))               # z statistic using the known population SD
    t <- m.x/(s.x/sqrt(n))             # t statistic using the sample SD
    b <- replicate(10000, mean(sample(x, replace=TRUE)))   # bootstrap distribution of the mean
    c( t=abs(t) > qt(0.975,n-1), z=abs(z) > qnorm(0.975),
        z2 = abs(t) > qnorm(0.975),    # "improper" z test: sample SD with a z critical value
        b= (0 < quantile(b, 0.025)) | (0 > quantile(b, 0.975))  # 0 outside the 95% percentile interval
     )
}

out <- replicate(10000, simfun())
rowMeans(out)

My results for one run are:

     t      z     z2 b.2.5% 
0.0486 0.0493 0.1199 0.1631 

So we can see that the t-test and the z-test (with the true population standard deviation) both give a type I error rate that is essentially $\alpha$, as designed. The improper z-test (dividing by the sample standard deviation but using the z critical value instead of the t one) rejects the null more than twice as often as it should. As for the bootstrap, it rejects the null more than three times as often as it should (judging by whether 0, the true mean, falls inside the interval), so for this small sample size the simple percentile bootstrap is not properly sized and therefore does not fix the problem (and this is when the data are optimally normal). The improved bootstrap intervals (BCa, etc.) will probably do better, but this should raise some concern about using bootstrapping as a panacea for small sample sizes.
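
If you want to check the BCa claim yourself, here is a minimal sketch (not part of the original simulation) using the boot package; the sample size and replicate counts are illustrative, and with $n$ this small boot.ci may warn that extreme order statistics are used as endpoints.

# Type I error rate when the interval is a BCa bootstrap CI (illustrative sketch)
library(boot)

simfun.bca <- function(n=5, R=2000) {
    x <- rnorm(n)
    bt <- boot(x, statistic = function(d, i) mean(d[i]), R = R)
    ci <- boot.ci(bt, type = "bca")$bca        # last two entries are the interval endpoints
    (0 < ci[4]) | (0 > ci[5])                  # TRUE if 0 falls outside the BCa interval
}

mean(replicate(2000, simfun.bca()))            # estimated type I error rate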