This confusion between bootstrap procedures and Monte Carlo procedures keeps recurring, so perhaps this is as good a place as any to address it. (The examples of R code may help with the homework, too.)

Consider this implementation of the bootstrap in R:
boot <- function(x, t) { # Exact bootstrap of procedure t on data x
  n <- length(x)         # Must lie between 2 and 7 inclusive.
  if (n < 2 || n > 7) {
    stop("Sample size must lie between 2 and 7; use an approximate method otherwise.")
  }
  p <- c(n, 1:(n-1))
  a <- rep(x, n^(n-1))
  dim(a) <- rep(n, n)
  y <- as.vector(a)
  while (n > 1) {
    n <- n - 1
    a <- aperm(a, p)
    y <- cbind(as.vector(a), y)
  }
  apply(y, 1, t)
}
A quick look will confirm that this is a deterministic calculation: no random values are generated or used. (I will leave the details of its inner workings for interested readers to figure out for themselves.)
The arguments to `boot` are a batch of numeric data in the array `x` and a reference `t` to a function (that can be applied to arrays exactly like `x`) to return a single numeric value; in other words, `t` is a statistic. It generates all possible samples with replacement from `x` and applies `t` to each of them, thereby producing one number for each such sample: that's the bootstrap in a nutshell. The return value is an array representing the exact bootstrap distribution of `t` for the sample `x`.
As a tiny example, let's bootstrap the mean for a sample `x = c(1,3)`:
> boot(c(1,3), mean)
[1] 1 2 2 3
There are indeed four possible samples of size $2$ with replacement from $(1,3)$; namely, $(1,1)$, $(1,3)$, $(3,1)$, and $(3,3)$. `boot` generates them all (in the order just listed) and applies `t` to each of them. In this case `t` computes the mean, and those turn out to be $1$, $2$, $2$, and $3$, respectively, as shown in the output.
Where you go from here depends on how you want to use the bootstrap. The full information about the bootstrap is contained in this output array, so it's usually a good idea to display it. Here is an example where the standard deviation is bootstrapped from the sample $(1,3,3,4,7)$:
hist(boot(c(1,3,3,4,7), sd))
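Since this sample has $n = 5$ values, the exact bootstrap distribution contains $5^5 = 3125$ values, one for each possible sample with replacement. A quick sanity check:
> length(boot(c(1,3,3,4,7), sd))
[1] 3125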
Now we are prepared to talk about Monte Carlo simulation. Suppose, say, we were going to bootstrap a 95% upper confidence limit on the SD from a sample of $5$ by using the upper 95th percentile of its bootstrap distribution. What properties would this procedure have? One way to find out is to suppose the sample were obtained randomly from, say, a uniform distribution. (The application will often indicate what a reasonable distributional assumption may be; here, I arbitrarily chose one that is simple for computation but not easy to deal with analytically.) We can simulate what happens by taking such a sample and computing the UCL:
> set.seed(17)
> quantile(boot(runif(5, min=0, max=10), sd), .95)[1]
95%
3.835870
The result for this particular random sample is 3.83587. This is definite: were you to call `boot` again with the same set of data, the answer would be exactly the same. But how might the answer change with different random samples? Find out by repeating this process a few times and drawing a histogram of the results:
> boot.sd <- replicate(100, quantile(boot(runif(5, min=0, max=10), sd), .95)[1])
> hist(boot.sd)
Were we to do another set of simulations, the random draws would come out different, creating a (slightly) different histogram--but not very different from this one. We can use it with some confidence to understand how the bootstrap UCL of the SD is working. For reference, notice that the standard deviation of a uniform distribution spanning the range from $0$ to $10$ (as specified here) equals $10/\sqrt{12} \approx 2.887$, because the variance of a uniform distribution on $[a,b]$ is $(b-a)^2/12$. As one would hope for any UCL worth its salt, the majority of the values in the histogram (three-quarters, or 0.75) exceed this:
> length(boot.sd[boot.sd >= 10/sqrt(12)]) / length(boot.sd)
[1] 0.75
But that's nowhere near the nominal 95% we specified (and were hoping for)! This is one value of simulation: it compares our hopes to what is really going on. (Why the discrepancy? I believe it is because bootstrapping an SD does not work well with really small samples.)
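To pin down the actual coverage more precisely, one could simply raise the number of simulated samples; here is a sketch of the same procedure with 1000 replicates (the exact proportion obtained will depend on the seed):
> boot.sd <- replicate(1000, quantile(boot(runif(5, min=0, max=10), sd), .95)[1])
> mean(boot.sd >= 10/sqrt(12))
(Applying `mean` to a logical vector computes the proportion of `TRUE` values, which estimates the UCL's true coverage.)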
Review
Bootstrap statistics are conceptually just the same as any other statistic like a mean or standard deviation; they just tend to take a long time to compute. (See the warning message in the `boot` code!)
Monte Carlo simulation can be useful for studying how a bootstrap statistic varies due to randomness in obtaining samples. The variation observed in such a simulation is due to variation in the samples, not variation in the bootstrap.
Because bootstrap statistics can take a lot of computation (up to $n^n$ calculations for samples of size $n$), it is convenient to approximate the bootstrap distribution. This is usually done by creating a "black box" program to obtain one value at random from the true bootstrap distribution and calling that program repeatedly. The collective output approximates the exact distribution. The approximation can vary due to randomness in the black box--but that variation is an artifact of the approximation procedure. It is not (conceptually) inherent in the bootstrap procedure itself. A sketch of this approach appears below.
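Here is a minimal sketch of such a black box in R (the helper name `boot.approx` and the default of `B = 1000` replicates are illustrative choices, not part of the exact code above):
boot.approx <- function(x, t, B = 1000) {
  # Each replicate draws one sample of size n with replacement from x
  # and applies t to it; together the B values approximate the exact
  # bootstrap distribution of t.
  replicate(B, t(sample(x, length(x), replace = TRUE)))
}
For example, hist(boot.approx(c(1,3,3,4,7), sd)) should look much like the exact histogram obtained earlier, but it will differ slightly from run to run.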
Best Answer
The calculator you linked to considers only estimating a single proportion. The relationship between sample size, confidence level, and desired margin of error is simple for estimating a single proportion from iid binomial data. Political polls, at least, tend to be focused on just a few proportions (what % of voters favor candidate A over candidate B). If pollsters assume random sampling and ignorable nonresponse, they can use such a calculator to find that, say, ~1000 respondents will give a 95% MOE of $\pm$ 3 percentage points. Even if they wanted to apply multiple-comparisons corrections (though they often don't) for reporting a handful of proportions at once, ~1000 respondents is typically still enough for reasonably narrow MOEs.
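For reference, the arithmetic behind such a calculator is simple under the standard assumptions (simple random sampling, worst-case proportion $p = 0.5$): the 95% margin of error is about $1.96\sqrt{p(1-p)/n}$, so solving for $n$ at a $\pm 3$-point MOE gives
> ceiling(1.96^2 * 0.25 / 0.03^2)
[1] 1068
which is the familiar "about 1000 respondents" figure.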
If you're studying a higher-dimensional space, you need more data if you want to honestly account for the uncertainty in studying many estimates at once. If your estimators have some complicated intractable distribution, sample-size calculations may be sketchy and you want to err on the side of more data. If you want to characterize the space in more detail than just a single (multi-dimensional) point estimate, you need more data.
In fact... this is true of surveys as well. There are plenty of much larger surveys, such as the ones run by national statistical offices (the US Census Bureau, for example). Some ask simple binary questions for which the above sample-size calculator works, but the agencies want bigger samples to account for asking many such questions. Other questions are quantitative measurements and might need a different approach to estimating the sample size. They also often need bigger sample sizes to get precise sub-group estimates (think small geographic regions or small demographic groups). So just a few hundred responses is not always enough. For instance, the American Community Survey collects roughly 3 million responses each year.