[Math] Stratified Sampling for Variance Reduction: Need Intuition as to Why It Works

monte-carlo sampling variance

When working on variance reduction techniques, I was studying stratified sampling.

Suppose we wanted to estimate a definite integral, and we decided to do so using classical Monte Carlo.

It can be shown that stratified sampling reduces the overall variance of our estimator, but I don't see intuitively why this is true.

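In classical Monte Carlo, we evaluate the function at points sampled uniformly at random from the interval, and then take the average of those values (scaled by the length of the interval).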

In stratified sampling, we partition the interval into strata, collect samples from each stratum, and then combine our results.
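For concreteness, here is a minimal sketch of the two procedures in Python. Everything in it is an assumption chosen for illustration: the integrand `f(x) = x^2` on `[0, 1]`, the 10 equal-width strata with 10 points each, and the function names `classical_mc` and `stratified_mc`. It repeats each estimator many times so that the spread of the two can be compared empirically.

```python
import random
import statistics


def f(x):
    # Hypothetical integrand; its exact integral over [0, 1] is 1/3.
    return x * x


def classical_mc(n, rng):
    """Average f at n points drawn uniformly from [0, 1]."""
    return sum(f(rng.random()) for _ in range(n)) / n


def stratified_mc(n_strata, per_stratum, rng):
    """Split [0, 1] into equal-width strata, average f inside each
    stratum, and weight each stratum average by the stratum width."""
    width = 1.0 / n_strata
    total = 0.0
    for k in range(n_strata):
        left = k * width
        values = [f(left + width * rng.random()) for _ in range(per_stratum)]
        total += width * sum(values) / per_stratum
    return total


rng = random.Random(1)  # fixed seed so the run is reproducible

# Both estimators use 100 function evaluations per estimate.
classical = [classical_mc(100, rng) for _ in range(500)]
stratified = [stratified_mc(10, 10, rng) for _ in range(500)]

print("classical : mean", statistics.mean(classical),
      "variance", statistics.pvariance(classical))
print("stratified: mean", statistics.mean(stratified),
      "variance", statistics.pvariance(stratified))
```

Both estimators should land close to 1/3, with the stratified one showing a visibly smaller empirical variance for this particular integrand.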

So my question is how does stratified sampling reduce variance? I can kind of see the variance being smaller on a particular stratum, but I don't see how the sum of these estimates yields an overall lower variance.

Best Answer

Assume you want to estimate the average height of the $100$ billion humans living on another planet. You know that $60\%$ of them are female and $40\%$ are male. What you do not know is that all the females are exactly $10$ feet tall while all the males are exactly $5$ feet tall. You do suspect, a priori, that height depends on gender.

Your computer only allows you to compute the average of the heights of $1000$ people. Once. Choose wisely.

Stratified sampling tells you that it is less risky (a lower-variance estimator) to ask the heights of 600 females and 400 males instead of asking 1000 people picked at random, among whom the numbers of females and males will fluctuate around 600 and 400.

The average over the stratified sample always gives you the true mean of 8 feet ($0.6 \times 10 + 0.4 \times 5 = 8$). The purely random sample gives you an estimator that is a random number between 5 feet and 10 feet, depending on the proportion of males and females you happen to get. Of course, this second estimator converges to 8 feet as you increase the sample size (the number of Monte Carlo iterations), because the proportions in your sample become more and more likely to be accurate (law of large numbers), with an error of controlled size (central limit theorem).
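The planet example can be checked with a small simulation. This is only a sketch: the constant and function names are made up for illustration, and each random person is modelled as an independent draw that is female with probability 0.6, which is a reasonable approximation for a population of 100 billion.

```python
import random
import statistics

FEMALE_SHARE = 0.6    # 60% of the population is female
FEMALE_HEIGHT = 10.0  # feet, identical for every female
MALE_HEIGHT = 5.0     # feet, identical for every male
SAMPLE_SIZE = 1000


def simple_random_estimate(rng):
    """Average height of SAMPLE_SIZE people picked at random."""
    total = 0.0
    for _ in range(SAMPLE_SIZE):
        is_female = rng.random() < FEMALE_SHARE
        total += FEMALE_HEIGHT if is_female else MALE_HEIGHT
    return total / SAMPLE_SIZE


def stratified_estimate():
    """Average height of exactly 600 females and 400 males."""
    n_female = round(FEMALE_SHARE * SAMPLE_SIZE)
    n_male = SAMPLE_SIZE - n_female
    return (n_female * FEMALE_HEIGHT + n_male * MALE_HEIGHT) / SAMPLE_SIZE


rng = random.Random(0)  # fixed seed so the run is reproducible
simple_estimates = [simple_random_estimate(rng) for _ in range(2000)]

print("stratified estimate:", stratified_estimate())
print("simple random: mean", statistics.mean(simple_estimates),
      "std dev", statistics.pstdev(simple_estimates))
```

The stratified estimate has no randomness left in it, while the simple random estimates scatter around 8 feet with a spread that comes entirely from the fluctuating female/male counts.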

Add some variance to the heights of the females and of the males and the result is less extreme, but the conclusion remains the same.
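To make that last remark precise, here is a short derivation under assumptions not stated above: strata $k$ with population shares $p_k$, within-stratum means $\mu_k$ and variances $\sigma_k^2$, overall mean $\mu = \sum_k p_k \mu_k$, independent draws, a total sample size $n$, and proportional allocation $n_k = p_k n$ (as in the 600/400 split). The symbols are introduced here only for illustration. By the law of total variance, the purely random estimator has

$$\operatorname{Var}(\hat\mu_{\text{random}}) = \frac{1}{n}\left(\sum_k p_k \sigma_k^2 + \sum_k p_k (\mu_k - \mu)^2\right),$$

while the stratified estimator $\hat\mu_{\text{strat}} = \sum_k p_k \bar X_k$ has

$$\operatorname{Var}(\hat\mu_{\text{strat}}) = \sum_k p_k^2 \, \frac{\sigma_k^2}{n_k} = \frac{1}{n} \sum_k p_k \sigma_k^2 .$$

The difference is exactly the between-strata term $\frac{1}{n}\sum_k p_k (\mu_k - \mu)^2 \ge 0$: stratification removes the randomness in how many samples fall into each stratum, and keeps only the variability inside the strata. In the example, $\sigma_k = 0$ and the between-strata term is $0.6 \times (10 - 8)^2 + 0.4 \times (5 - 8)^2 = 6$, so the random estimator has standard deviation $\sqrt{6/1000} \approx 0.077$ feet and the stratified one has $0$.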
