Solved – Determining sample size necessary for bootstrap method / Proposed Method

Tags: bootstrap, methodology, sample-size

I know this is a rather hot topic for which no one can really give a simple answer. Nevertheless, I am wondering whether the following approach could be useful.

The bootstrap method is only useful if your sample follows more or less (read exactly) the same distribution as the original population. In order to be certain this is the case you need to make your sample size large enough. But what is large enough?

If my premise is correct, you have the same problem when using the central limit theorem to estimate the population mean. Only when your sample size is large enough can you be certain that the distribution of your sample means is approximately normal (around the population mean). In other words, your samples need to represent your population (distribution) well enough. But again, what is large enough?

In my case (administrative processes: time needed to finish a demand vs. number of demands) I have a population with a multi-modal distribution (all the demands that were finished in 2011), of which I am 99% certain that it is even less normally distributed than the population I want to research (all the demands that are finished between the present day and a day in the past; ideally this timespan is as small as possible).

My 2011 population consists of enough units to draw $x$ samples of size $n$.
I choose a value of $x$, say $x=10$. Now I use trial and error to determine a good sample size. I take $n=50$ and check whether my $x$ sample means are normally distributed using the Kolmogorov-Smirnov test. If so, I repeat the same steps with a sample size of $40$; if not, I repeat with a sample size of $60$ (etc.).

After a while I conclude that $n=45$ is the absolute minimum sample size to get a more or less good representation of my 2011 population. Since I know my population of interest (all the demands that are finished between present day and a day in the past) has less variance I can safely use a sample size of $n=45$ to bootstrap. (Indirectly, the $n=45$ determines the size of my timespan: time needed to finish $45$ demands.)
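A minimal R sketch of the trial-and-error procedure described above, for anyone who wants to try it. The `population` vector is a simulated bimodal stand-in (an assumption, not the real 2011 data), and `sample_means` and `ks_normal_p` are hypothetical helper names. The KS test is run here with the mean and standard deviation estimated from the same sample means, so its p-value is only approximate.

```r
set.seed(1)

# Hypothetical stand-in for the 2011 population: a bimodal mixture
population <- c(rnorm(8000, mean = 5, sd = 1), rnorm(2000, mean = 20, sd = 3))

# Draw x non-overlapping samples of size n and return their x means
sample_means <- function(population, x, n) {
  stopifnot(x * n <= length(population))
  idx <- sample(length(population), x * n)   # drawn without replacement
  colMeans(matrix(population[idx], nrow = n, ncol = x))
}

# KS test of the sample means against a normal with estimated mean and sd
ks_normal_p <- function(means) {
  ks.test(means, "pnorm", mean = mean(means), sd = sd(means))$p.value
}

# Try a few candidate sample sizes, as in the trial-and-error description
for (n in c(60, 50, 40)) {
  m <- sample_means(population, x = 10, n = n)
  cat("n =", n, " KS p-value =", round(ks_normal_p(m), 3), "\n")
}
```

With only $x=10$ sample means per run, the p-values can change a lot from one run to the next, so it may be worth repeating each $n$ several times before drawing a conclusion.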

This is, in short, my idea. But since I am not a statistician but an engineer whose statistics lessons took place in the days of yonder, I cannot exclude the possibility that I just generated a lot of rubbish :-). What do you guys think? If my premise makes sense, do I need to choose an $x$ larger than $10$, or smaller? Depending on your answers (do I need to feel embarrassed or not? 🙂) I'll be posting some more discussion ideas.

Response to first answer
Thanks for replying. Your answer was very useful to me, especially the book links.
But I am afraid that in my attempt to give information I completely clouded my question. I know that the bootstrap samples inherit the distribution of the population sample. I follow you completely, but…

Your original population sample needs to be large enough to be moderately certain that the distribution of your population sample corresponds to (equals) the 'real' distribution of the population.

This is merely an idea on how to determine how large your original sample size needs to be in order to be reasonably certain that the sample distribution corresponds with the population distribution.

Suppose you have a bimodal population distribution and one peak is a lot larger than the other one. If your sample size is 5, the chance is large that all 5 units have a value very close to the large peak (the chance of randomly drawing a unit there is the largest). In this case your sample distribution will look unimodal.

With a sample size of a hundred the chance that your sample distribution is also bimodal is a lot larger!! The trouble with bootstrapping is that you only have one sample (and you build further on that sample). If the sample distribution really does not correspond with the population distribution you are in trouble. This is just an idea to make the chance of having 'a bad sample distribution' as low as possible without having to make your sample size infinitely large.
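To put rough numbers on the example above: if a fraction $p$ of the population sits in the smaller peak, the chance that a random sample of size $n$ contains no unit at all from that peak is $(1-p)^n$. The value `p_minor = 0.1` below is purely an assumption for illustration, and `miss_prob` is a hypothetical helper name.

```r
# Chance that a random sample of size n misses the smaller peak entirely,
# when a fraction p_minor of the population belongs to that peak
miss_prob <- function(n, p_minor) (1 - p_minor)^n

p_minor <- 0.1            # assumed share of the smaller peak

miss_prob(5, p_minor)     # about 0.59
miss_prob(100, p_minor)   # about 2.7e-05
```

Having at least one unit from the smaller peak is of course a much weaker requirement than the sample distribution actually looking bimodal, so this is only a lower bound on how bad a small sample can be.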

Best Answer

I took interest in this question because I saw the word bootstrap and I have written books on the bootstrap. Also people often ask "How many bootstrap samples do I need to get a good Monte Carlo approximation to the bootstrap result?" My suggested answer to that question is to keep increasing the size until you get convergence. No one number fits all problems.
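One way to read "keep increasing the size until you get convergence" is sketched below in R. The sample `x`, the doubling schedule, the 1% tolerance, and the cap on `B` are all assumptions chosen for illustration; `boot_se` is a hypothetical helper name.

```r
set.seed(42)
x <- rexp(30)   # hypothetical original sample of size 30

# Monte Carlo approximation of the bootstrap standard error of the mean,
# based on B bootstrap resamples
boot_se <- function(x, B) {
  sd(replicate(B, mean(sample(x, replace = TRUE))))
}

B    <- 200
prev <- boot_se(x, B)
repeat {
  B   <- 2 * B                      # double the number of bootstrap samples
  cur <- boot_se(x, B)
  # Stop when the estimate changes by less than 1%, or at a safety cap
  if (abs(cur - prev) / prev < 0.01 || B >= 51200) break
  prev <- cur
}

cat("Stopped at B =", B, "with bootstrap SE =", round(cur, 4), "\n")
```

A single small change between two runs can happen by chance, so a stricter rule (for example, requiring several consecutive small changes) may be preferable in practice.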

But that is apparently not the question you are asking. You seem to be asking what the original sample size needs to be for the bootstrap to work. First of all, I do not agree with your premise. The basic nonparametric bootstrap assumes that the sample is taken at random from a population. So for any sample size $n$ the distribution for samples chosen at random is the sampling distribution assumed in bootstrapping. The bootstrap principle says that choosing a random sample of size $n$ from the population can be mimicked by choosing a bootstrap sample of size $n$ from the original sample. Whether or not the bootstrap principle holds does not depend on any individual sample "looking representative of the population". What it does depend on is what you are estimating and some properties of the population distribution (e.g., this works for sampling means with population distributions that have finite variances, but not when they have infinite variances). It will not work for estimating extremes regardless of the population distribution.
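A small R illustration of why extremes are a problem, under the assumption of a simulated uniform sample (any continuous distribution would do). A bootstrap sample contains the largest observation with probability $1-(1-1/n)^n$, so the bootstrap distribution of the maximum piles up on a single point instead of mimicking a smooth sampling distribution.

```r
set.seed(7)
n <- 50
x <- runif(n)   # hypothetical sample from a continuous distribution

# Bootstrap distribution of the sample maximum
boot_max <- replicate(5000, max(sample(x, replace = TRUE)))

# Share of bootstrap samples whose maximum equals the observed maximum
share_equal <- mean(boot_max == max(x))

# Theoretical probability that a bootstrap sample includes the largest value
theory <- 1 - (1 - 1/n)^n

c(simulated = share_equal, theoretical = theory)
```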

The theory of the bootstrap involves showing consistency of the estimate. So it can be shown in theory that it works for large samples. But it can also work in small samples. I have seen it work for classification error rate estimation particularly well in small sample sizes such as 20 for bivariate data.

Now if the sample size is very small, say 4, the bootstrap may not work just because the set of possible bootstrap samples is not rich enough. In my book or Peter Hall's book this issue of too small a sample size is discussed. But this number of distinct bootstrap samples gets large very quickly. So this is not an issue even for sample sizes as small as 8. You can take a look at these references:

My book: Bootstrap Methods: A Guide for Practitioners and Researchers
Hall's book: The Bootstrap and Edgeworth Expansion
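To make the remark about the number of distinct bootstrap samples concrete: ignoring order, and assuming all $n$ observations are different, the number of distinct bootstrap samples of size $n$ is $\binom{2n-1}{n}$. A short R check (`n_distinct_boot` is a hypothetical helper name):

```r
# Number of distinct bootstrap samples (as unordered multisets) of size n
# drawn from n different observations
n_distinct_boot <- function(n) choose(2 * n - 1, n)

n_distinct_boot(4)   # 35
n_distinct_boot(8)   # 6435

# Growth over a range of small sample sizes
sapply(2:12, n_distinct_boot)
```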

Related Solutions

Solved – Calculating necessary sample size using bootstrap

Ok, so this answer might not be exactly what you were after based on the detail of your question, but I stumbled across your question based on just the title and so this might help other people who also come across it in a similar fashion.

The only way I know of determining sample size using a bootstrap is via a power analysis approach. That is you:

1. State the null hypothesis and alternative hypothesis
2. State the alpha level (typically 5%)
3. If necessary, shift the pilot study data so that you know the null hypothesis is false
4. Re-sample with replacement from the pilot study
5. Perform the test on this sample and record the result
6. Repeat 1000 or so times to build up a probability distribution
7. Count how many times the null hypothesis is rejected

With many possible "variations on a theme of..."

And that gives you the statistical power (for that sample size and that particular test), because the definition of statistical power is "probability that the test will reject the null hypothesis when the alternative hypothesis is true". So you can then vary the sample size until you achieve the desired power.

Here's an approach in R that I did based on this paper, Sample Size / Power Considerations, by Elizabeth Colantuoni.

I had two groups of non-normal, non-parametric data. A pilot study of each showed them to have differing medians, and a Mann-Whitney-Wilcoxon test rejected the null hypothesis that they were the same, but I wanted to determine the sample size required so I could say this for "sure". Since the test already rejected the null hypothesis on the pilot data, I did not see any need to shift or manipulate the data to ensure the alternative hypothesis was true.

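# Estimate power by resampling (with replacement) from the pilot data of each group
power <- function(group1.pilot, group2.pilot, reps=1000, size=10) {
    results <- sapply(seq_len(reps), function(r) {
        # Draw a bootstrap sample of the target size from each pilot group
        group1.resample <- sample(group1.pilot, size=size, replace=TRUE)
        group2.resample <- sample(group2.pilot, size=size, replace=TRUE)
        test <- wilcox.test(group1.resample, group2.resample, paired=FALSE)
        test$p.value
    })
    # Proportion of resamples in which the null is rejected at alpha = 0.05
    mean(results < 0.05)
}

# Find power for a sample size of 100 (data1 and data2 are the two pilot samples)
power(data1, data2, reps=1000, size=100)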
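The last step, varying the sample size until the desired power is reached, could look roughly like the sketch below. The pilot data are simulated (an assumption; substitute your own), the 80% target and the grid of sizes are arbitrary choices, and `boot_power` is a hypothetical self-contained variant of the function above. It passes `exact = FALSE` to `wilcox.test` because resampling with replacement creates ties.

```r
set.seed(123)

# Hypothetical pilot data for the two groups
pilot1 <- rexp(20, rate = 1)
pilot2 <- rexp(20, rate = 0.5)

# Bootstrap estimate of power for a given per-group sample size
boot_power <- function(g1, g2, size, reps = 500, alpha = 0.05) {
  p <- replicate(reps, {
    wilcox.test(sample(g1, size, replace = TRUE),
                sample(g2, size, replace = TRUE),
                exact = FALSE)$p.value
  })
  mean(p < alpha)
}

# Estimate power over a grid of candidate sample sizes
sizes  <- c(10, 20, 40, 80, 160)
powers <- sapply(sizes, function(s) boot_power(pilot1, pilot2, size = s))
print(data.frame(size = sizes, power = powers))

# Smallest size on the grid reaching 80% power (NA if none does)
sizes[which(powers >= 0.8)[1]]
```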

Necessary disclaimer: I'm not a statistician and I'm still learning about bootstrapping so feedback, corrections and pointing and laughing are welcome.

Solved – Can bootstrap be seen as a “cure” for the small sample size

I remember reading that using the percentile confidence interval for bootstrapping is equivalent to using a Z interval instead of a T interval and using $n$ instead of $n-1$ for the denominator. Unfortunately I don't remember where I read this and could not find a reference in my quick searches. These differences don't matter much when $n$ is large (and the advantages of the bootstrap outweigh these minor problems when $n$ is large), but with small $n$ this can cause problems. Here is some R code to simulate and compare:

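simfun <- function(n=5) {
    x <- rnorm(n)                  # sample from N(0, 1), so the null (mean 0) is true
    m.x <- mean(x)
    s.x <- sd(x)
    z <- m.x/(1/sqrt(n))           # z statistic using the true standard deviation (1)
    t <- m.x/(s.x/sqrt(n))         # t statistic using the sample standard deviation
    b <- replicate(10000, mean(sample(x, replace=TRUE)))  # bootstrap means
    # Each entry is TRUE when the corresponding test rejects the null
    c( t=abs(t) > qt(0.975,n-1), z=abs(z) > qnorm(0.975),
        z2 = abs(t) > qnorm(0.975), 
        b= (0 < quantile(b, 0.025)) | (0 > quantile(b, 0.975))
     )
}

# Rejection rates over 10000 simulated data sets
out <- replicate(10000, simfun())
rowMeans(out)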

My results for one run are:

     t      z     z2 b.2.5% 
0.0486 0.0493 0.1199 0.1631

So we can see that using the t-test and the z-test (with the true population standard deviation) both give a type I error rate that is essentially $\alpha$ as designed. The improper z test (dividing by the sample standard deviation, but using the Z critical value instead of T) rejects the null more than twice as often as it should. Now to the bootstrap: it is rejecting the null more than 3 times as often as it should (looking at whether 0, the true mean, is in the interval or not), so for this small sample size the simple bootstrap is not sized properly and therefore does not fix problems (and this is when the data are exactly normal). The improved bootstrap intervals (BCa etc.) will probably do better, but this should raise some concern about using bootstrapping as a panacea for small sample sizes.
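For anyone who wants to compare the percentile interval with a BCa interval on a single small sample, here is a minimal sketch using the `boot` package (a recommended package that ships with standard R installations). The sample, the seed, and the number of resamples are assumptions, and `mean_stat` is a hypothetical helper name. This only shows how to obtain the two intervals; it does not by itself show that BCa has the correct size, which would require a simulation along the lines of `simfun` above.

```r
library(boot)

set.seed(2024)
x <- rnorm(5)                             # one small sample, n = 5

# Statistic in the form boot() expects: data plus a vector of resampled indices
mean_stat <- function(d, i) mean(d[i])

bt <- boot(x, statistic = mean_stat, R = 5000)

# Percentile and BCa 95% intervals from the same set of resamples
ci <- boot.ci(bt, conf = 0.95, type = c("perc", "bca"))
ci$percent[4:5]   # percentile interval endpoints
ci$bca[4:5]       # BCa interval endpoints
```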
