Confidence Interval for Sampling of Non-Normal Distribution

confidence interval, distributions, normal distribution

Like everybody, I was taught that, when sampling a distribution, the population mean can be estimated to lie within the range:

$$\mu \in \bar{x} \pm Z_{\text{conf}}\frac{s}{\sqrt{n}}$$

where $Z_{\text{conf}}=1.96$ for a confidence of 95%.

What I do not remember is whether this assumes the population data is Normally distributed, or just the sampling errors.

Below is a simulation (in R) of sampling from an Exponential distribution (I have tried others as well), and the formula seems to work very well for non-Normal distributions. But I just want to be sure.

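correct <- 0
pop_mean <- 800                      # true population mean
nsim <- 100000                       # number of simulated samples
n <- 50                              # observations per sample
for (i in 1:nsim) {
    s <- rexp(n, 1/pop_mean)         # Exponential sample with mean pop_mean
    mu <- mean(s)                    # sample mean
    se <- sd(s) / sqrt(n)            # standard error (renamed: stderr() is a base R function)
    # count the interval as correct if it contains the true mean
    correct <- correct + (mu - 1.96*se <= pop_mean && pop_mean <= mu + 1.96*se)
}
print(correct/nsim)  # observed coverage; the nominal value is 0.95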

Best Answer

Short version: This will work with non-Normal data, provided the sample size is large enough.

Longer version: Your population is described by the random variable $X$. However, we aren't conducting inference on $X$ itself; we're conducting inference on its mean $\mu$ using the sample mean $\bar{X}$. That is, we rely on the sampling distribution of the sample mean. If $X$ is exactly Normal, then $\bar{X}$ is exactly Normal for any $n$. If $X$ is not Normal but has finite variance, the Central Limit Theorem (CLT) states that $\bar{X}$ is asymptotically Normal, that is, approximately Normal for large sample sizes. Thus, as long as your sample size $n$ is large (many references use $n \approx 30$ as a rule of thumb, though strongly skewed or heavy-tailed populations can need more), $\bar{X}$ will be approximately Normal, which allows us to use $Z$ and the Normal distribution to generate confidence intervals.
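To see how the "large enough" condition plays out, here is a minimal sketch that repeats the question's simulation at several sample sizes. The sizes, the seed, the number of replications, and the helper name `coverage` are arbitrary choices for illustration and are not part of the original question.

```r
# Estimate the coverage of the nominal 95% z-interval for the mean of an
# Exponential population at several sample sizes.
set.seed(1)                 # arbitrary seed, for reproducibility only
pop_mean <- 800             # true population mean, as in the question
nsim     <- 20000           # simulated samples per sample size (assumption)
z        <- qnorm(0.975)    # two-sided 95% Normal quantile, about 1.96

# Hypothetical helper: fraction of simulated intervals containing pop_mean
coverage <- function(n) {
  hits <- replicate(nsim, {
    x  <- rexp(n, rate = 1 / pop_mean)   # one sample of size n
    se <- sd(x) / sqrt(n)                # estimated standard error
    abs(mean(x) - pop_mean) <= z * se    # TRUE if the interval covers the mean
  })
  mean(hits)
}

sizes  <- c(5, 15, 30, 50, 200)          # illustrative sample sizes
result <- sapply(sizes, coverage)
names(result) <- sizes
print(round(result, 3))
```

One would expect the observed coverage to fall noticeably short of 0.95 for the smallest samples and to move toward 0.95 as $n$ grows. Replacing `z` with `qt(0.975, n - 1)` inside `coverage` gives the corresponding $t$-based interval, which is a common choice when the standard deviation is estimated from the sample.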
