[Math] How to estimate the range of a normal distribution when the mean and standard deviation are given

normal distribution, standard deviation

For example, how would you respond to this question?

The earnings of one-hundred workers in a company are normally distributed. If the mean of this data set is 24 and the standard deviation is 4, find an approximate value for the range.

Best Answer

For an independent sample of size $N$ from a continuous random variable $X$ with CDF $F$, the CDFs of the sample maximum and minimum are $$ \eqalign{\mathbb P(X_{max} \le r) &= F(r)^N\cr \mathbb P(X_{min} \le r) &= 1 - (1-F(r))^N\cr}$$ In particular, if $N=100$, then with probability $0.9$ $X_{max}$ lies in the interval where $F(r)$ goes from $0.05^{1/100} \approx 0.970487$ to $0.95^{1/100} \approx 0.999487$, and with probability $0.9$ $X_{min}$ lies in the interval where $F(r)$ goes from $1-0.95^{1/100} \approx 0.000513$ to $1-0.05^{1/100} \approx 0.029513$. $X_{max}$ and $X_{min}$ are not independent, but each statement fails with probability $0.1$, so by the union bound both hold with probability at least $0.8$. In that case the range is between $F^{-1}(0.970487) - F^{-1}(0.029513)$ and $F^{-1}(0.999487) - F^{-1}(0.000513)$. For a normal distribution with mean $\mu$ and standard deviation $\sigma$, that corresponds to a range between approximately $3.776 \sigma$ and $6.567 \sigma$.
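The bounds above can be reproduced numerically. The following is a minimal sketch using only Python's standard library (`statistics.NormalDist`); the function name `range_bounds` and its `coverage` parameter are illustrative choices, not part of the original answer.

```python
from statistics import NormalDist

def range_bounds(n, coverage=0.9):
    """Bounds (in units of sigma) on the range of n i.i.d. normal draws.

    Each of X_max and X_min lies in its interval with probability `coverage`,
    so both hold with probability at least 2*coverage - 1 (union bound).
    """
    tail = (1 - coverage) / 2          # 0.05 when coverage is 0.9
    std = NormalDist()                 # standard normal: mu = 0, sigma = 1

    # P(X_max <= r) = F(r)^n, so quantile p of the max is where F(r) = p^(1/n).
    max_lo = std.inv_cdf(tail ** (1 / n))
    max_hi = std.inv_cdf((1 - tail) ** (1 / n))

    # P(X_min <= r) = 1 - (1 - F(r))^n, so quantile p of the min is where
    # F(r) = 1 - (1 - p)^(1/n).
    min_lo = std.inv_cdf(1 - (1 - tail) ** (1 / n))
    min_hi = std.inv_cdf(1 - tail ** (1 / n))

    # Smallest and largest range compatible with both intervals.
    return max_lo - min_hi, max_hi - min_lo

lo, hi = range_bounds(100)
print(f"range between {lo:.3f} sigma and {hi:.3f} sigma")
```

For $N = 100$ this should agree with the figures quoted above, roughly $3.776\sigma$ and $6.567\sigma$. Multiplying by the question's $\sigma = 4$ converts the bounds to earnings units.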

I don't know of a closed-form result for the distribution of the range in the normal case, but in a simulation of $10000$ such samples, the median range was $4.9514 \sigma$, with 10th percentile $4.2748 \sigma$ and 90th percentile $5.8001 \sigma$.
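A simulation of this kind can be sketched as follows. This is not the original author's code: the function name `simulate_ranges`, the fixed seed, and the use of Python's standard `random` and `statistics` modules are assumptions made for illustration, so the exact digits will differ slightly from those quoted above.

```python
import random
import statistics

def simulate_ranges(n=100, trials=10_000, seed=0):
    """Return the range (max - min) of `trials` standard normal samples of size n."""
    rng = random.Random(seed)          # fixed seed (an assumption) for repeatability
    ranges = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    return ranges

ranges = simulate_ranges()

# statistics.quantiles with n=10 returns the 9 decile cut points:
# index 0 is the 10th percentile, index 8 the 90th.
deciles = statistics.quantiles(ranges, n=10)
median = statistics.median(ranges)

print(f"median {median:.3f} sigma, "
      f"10th percentile {deciles[0]:.3f} sigma, "
      f"90th percentile {deciles[-1]:.3f} sigma")
```

Because the samples are standard normal, the results are already in units of $\sigma$; they should land close to $4.95\sigma$, $4.27\sigma$ and $5.80\sigma$, up to sampling noise.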

So we can't really give a "good approximation" for the range, but a reasonable guess is about $5 \sigma$, which for the question's $\sigma = 4$ is about $20$.

Of course, the statement that the earnings are normally distributed is not literally true (unless negative earnings are possible and irrational earnings are almost certain). What might be true is that the earnings distribution is well approximated in some sense by a normal distribution. Such approximations are usually pretty good near the middle of the distribution but bad in the tails (nobody has negative earnings, and the CEO may be many standard deviations above the mean). Unfortunately, the range is very sensitive to the tails, so the normal approximation may not be very good in practice.