Solved – Can the bootstrap be seen as a “cure” for small sample size?

bootstrap, small-sample

This question was triggered by something I read in this graduate-level statistics textbook and also, independently, heard during this presentation at a statistics seminar. In both cases, the statement was along the lines of "because the sample size is pretty small, we decided to perform estimation via the bootstrap instead of (or along with) this parametric method $X$".

They didn't go into the details, but the reasoning was probably as follows: method $X$ assumes the data follow a certain parametric distribution $D$. In reality the distribution is not exactly $D$, but that is acceptable as long as the sample size is large enough. Since in this case the sample size is too small, let's switch to the (non-parametric) bootstrap, which makes no distributional assumptions. Problem solved!

In my opinion, that's not what the bootstrap is for. Here is how I see it: the bootstrap can give one an edge when it is more or less obvious that there are enough data, but there is no closed-form solution for standard errors, p-values and similar statistics. A classic example is obtaining a CI for the correlation coefficient given a sample from a bivariate normal distribution: a closed-form solution exists, but it is so convoluted that bootstrapping is simpler. However, nothing implies that the bootstrap can somehow help one get away with a small sample size.
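
To make that classic example concrete, here is a minimal R sketch (not from the original post) of a percentile bootstrap CI for a correlation coefficient; the sample size, number of resamples and the use of MASS::mvrnorm are my own illustrative choices.

# Percentile bootstrap CI for the correlation of a bivariate normal sample (illustrative sketch)
library(MASS)   # for mvrnorm

set.seed(1)
n     <- 200
Sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2)       # true correlation 0.6
xy    <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)

r.hat <- cor(xy[, 1], xy[, 2])                 # point estimate

# Resample rows (pairs) with replacement and recompute the correlation each time
r.boot <- replicate(5000, {
    idx <- sample(n, replace = TRUE)
    cor(xy[idx, 1], xy[idx, 2])
})

r.hat
quantile(r.boot, c(0.025, 0.975))              # percentile 95% CI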

Is my perception right?

If you find this question interesting, there is another, more specific bootstrap question from me:

Bootstrap: the issue of overfitting

P.S. I can’t help sharing one egregious example of the “bootstrap approach”. I am not disclosing the author’s name, but he is one of the older generation “quants” who wrote a book on Quantitative Finance in 2004. The example is taken from there.

Consider the following problem: suppose you have 4 assets and 120 monthly return observations for each. The goal is to construct the joint 4-dimensional cdf of yearly returns. Even for a single asset, the task seems hardly attainable with only 10 yearly observations, let alone the estimation of a 4-dimensional cdf. But not to worry, the "bootstrap" will help you out: take all of the available 4-dimensional observations, resample 12 of them with replacement and compound them to construct a single "bootstrapped" 4-dimensional vector of annual returns. Repeat that 1000 times and, lo and behold, you've got yourself a "bootstrap sample" of 1000 annual returns. Use this as an i.i.d. sample of size 1000 for the purpose of cdf estimation, or any other inference that can be drawn from a thousand-year history.
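
For concreteness, here is a minimal R sketch of the procedure being criticized (not an endorsement of it); the simulated monthly returns and all sizes and parameters are illustrative.

# Sketch of the "bootstrap" compounding procedure described above
set.seed(1)
monthly <- matrix(rnorm(120 * 4, mean = 0.01, sd = 0.05), nrow = 120, ncol = 4)  # 120 months x 4 assets

annual.boot <- t(replicate(1000, {
    idx <- sample(120, 12, replace = TRUE)     # resample 12 monthly 4-d observations
    apply(1 + monthly[idx, ], 2, prod) - 1     # compound them into one "annual" return vector
}))

dim(annual.boot)   # 1000 x 4: the purported "bootstrap sample" of annual returns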

Best Answer

I remember reading that using the percentile confidence interval for bootstrapping is equivalent to using a z interval instead of a t interval and using $n$ instead of $n-1$ in the denominator. Unfortunately I don't remember where I read this and could not find a reference in my quick searches. These differences don't matter much when $n$ is large (and the advantages of the bootstrap outweigh these minor problems when $n$ is large), but with small $n$ this can cause problems. Here is some R code to simulate and compare:

simfun <- function(n=5) {
    x <- rnorm(n)                      # sample from N(0, 1): true mean 0, true SD 1
    m.x <- mean(x)
    s.x <- sd(x)
    z <- m.x/(1/sqrt(n))               # z statistic using the known population SD
    t <- m.x/(s.x/sqrt(n))             # t statistic using the sample SD
    b <- replicate(10000, mean(sample(x, replace=TRUE)))   # bootstrap distribution of the mean
    c( t=abs(t) > qt(0.975,n-1), z=abs(z) > qnorm(0.975),
        z2 = abs(t) > qnorm(0.975),    # "improper" z test: sample SD with a z critical value
        b= (0 < quantile(b, 0.025)) | (0 > quantile(b, 0.975))  # 0 outside the 95% percentile interval
     )
}

out <- replicate(10000, simfun())
rowMeans(out)

My results for one run are:

     t      z     z2 b.2.5% 
0.0486 0.0493 0.1199 0.1631 

So we can see that the t-test and the z-test (with the true population standard deviation) both give a type I error rate that is essentially $\alpha$, as designed. The improper z-test (dividing by the sample standard deviation but using the z critical value instead of the t one) rejects the null more than twice as often as it should. As for the bootstrap, it rejects the null more than three times as often as it should (judging by whether 0, the true mean, falls inside the interval), so for this small sample size the simple percentile bootstrap is not properly sized and therefore does not fix the problem (and this is when the data are optimally normal). The improved bootstrap intervals (BCa, etc.) will probably do better, but this should raise some concern about using bootstrapping as a panacea for small sample sizes.
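
If you want to check the BCa claim yourself, here is a minimal sketch (not part of the original simulation) using the boot package; the sample size and replicate counts are illustrative, and with $n$ this small boot.ci may warn that extreme order statistics are used as endpoints.

# Type I error rate when the interval is a BCa bootstrap CI (illustrative sketch)
library(boot)

simfun.bca <- function(n=5, R=2000) {
    x <- rnorm(n)
    bt <- boot(x, statistic = function(d, i) mean(d[i]), R = R)
    ci <- boot.ci(bt, type = "bca")$bca        # last two entries are the interval endpoints
    (0 < ci[4]) | (0 > ci[5])                  # TRUE if 0 falls outside the BCa interval
}

mean(replicate(2000, simfun.bca()))            # estimated type I error rate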