Solved – Calculating Range based on Mean, Standard Deviation and Varying Sample Size


I have recently begun studying statistics, using the book "Basic Statistics for the Behavioral Sciences" by Kenneth D. Hopkins and Gene V. Glass (1978). So far I have understood the concepts from the measures of central tendency (i.e. the mean, median and mode) through the range and the standard deviation. But when tackling the exercises I have had difficulty understanding the solutions.

One problem asks for estimates of the ranges of three samples of 10, 100 and 1000 individuals, where height is normally distributed with a mean of 63.5 inches and a standard deviation of 2.5 inches. The answers were stated as follows, using the equation range ≈ E × (standard deviation), with the results in inches:

For n = 10,
range = 3.1(2.5) = 7.75

For n = 100,
range = 5(2.5) = 12.5

For n = 1000,
range = 6.5(2.5) = 16.25

The issue I have is that I do not understand how the expected value, E, was calculated or why this equation works as a method for estimating the range. I would be much obliged if someone could explain this to me.

Best Answer

The calculations are somewhat involved, but accurate tables date back to Tippett 1925 [1]. Tippett gives values for $n$ between $2$ and $1000$.

Judging from the (very) little that Google books would show me, in the 1978 edition of your book this information appears to be in Table 5.1 or thereabouts.

The expected range for a sample of size $n$ in a symmetric distribution with mean $0$ is twice the expected largest value in a sample of the same size.

The density of the largest value $X_{(n)}$ in a sample of size $n$ from a distribution with density $f$ and cdf $F$ is $n\,f(x)\,F(x)^{n-1}$. (See the Wikipedia article on order statistics.)

The expected value of the largest observation $X_{(n)}$ in a sample of size $n$ is therefore obtainable by integration. This expected value is

$$E(X_{(n)})=\int_{-\infty}^\infty\, n\,x\,f(x)\,F(x)^{n-1}\:dx\,.$$

For a standard normal I computed this numerically for a sample of size 10 in R:

    # integrand: x times the density of the maximum of 10 standard normals
    f <- function(x) 10*x*pnorm(x)^9*dnorm(x)
    integrate(f,-Inf,Inf)
    # 1.538753 with absolute error < 1.3e-06

Doubling this value (to obtain the expected range) we get 3.077506, which agrees with Tippett's 3.07751 to the number of places he gives (and with your value to the number of places you give; unsurprising since your values are Tippett's values rounded to two figures).
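The same integration can be repeated for the other sample sizes in the question. The following is a minimal sketch, assuming the default settings of `integrate` cope with the sharper peak at larger $n$; `expected_max` and `expected_range` are hypothetical helper names, not anything from the book or from Tippett.

```r
# Expected maximum of n iid standard normals, by numerical integration of
# x * n * f(x) * F(x)^(n-1). `expected_max` is a hypothetical helper name.
expected_max <- function(n) {
  integrand <- function(x) n * x * dnorm(x) * pnorm(x)^(n - 1)
  integrate(integrand, -Inf, Inf)$value
}

# By symmetry about 0, the expected range is twice the expected maximum.
expected_range <- function(n) 2 * expected_max(n)

sizes <- c(10, 100, 1000)
multipliers <- sapply(sizes, expected_range)

# Multipliers in units of the standard deviation, one row per sample size
print(round(data.frame(n = sizes, multiplier = multipliers), 3))
```

The printed multipliers should land close to the 3.1, 5 and 6.5 used in the book's answers.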

It's easy to simulate the distribution in anything that will generate normal random values and calculate a range. You might find doing so enlightening:

[Figure: histograms of the range in normal samples of sizes 10, 100 and 1000 respectively, with the tabulated means marked.]

(I marked the means from Tippett's table with a thin blue line; it projects slightly below the histogram in each case so you can find it on the scale easily. You can see that the distributions are quite spread out around their expected values, meaning that the range in a normal sample may be quite some way from its expected number of standard deviations.)
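A simulation along these lines can be sketched as follows. The seed, the number of replications and the helper name `simulate_ranges` are arbitrary choices for illustration, not the settings used for the histograms above.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Simulate `nsim` sample ranges from standard normal samples of size n.
# `simulate_ranges` is a hypothetical helper name.
simulate_ranges <- function(n, nsim = 10000) {
  replicate(nsim, diff(range(rnorm(n))))
}

sim <- lapply(c(10, 100, 1000), simulate_ranges)
names(sim) <- c("n = 10", "n = 100", "n = 1000")

# Average range and its spread, in units of the standard deviation
print(sapply(sim, function(r) c(mean = mean(r), sd = sd(r))))

# One histogram per sample size
op <- par(mfrow = c(1, 3))
for (k in names(sim)) hist(sim[[k]], breaks = 40, main = k, xlab = "range")
par(op)
```

The `sd` row gives a rough idea of how far an individual sample's range can sit from its expected value.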

In large samples from normal distributions, the expected range increases roughly linearly in $\sqrt{\log(n)}$ (see the image at the end of this answer).


So that covers the case for a standard normal. For a normal with any other mean $\mu$ and standard deviation $\sigma$, write $X=\mu+\sigma Z$ with $Z$ standard normal. The expected range is then

$$E(X_{(n)}-X_{(1)}) = E(X_{(n)})-E(X_{(1)}) = \mu+\sigma E(Z_{(n)})-(\mu+\sigma E(Z_{(1)})) = \sigma\,(E(Z_{(n)})-E(Z_{(1)}))\,,$$

i.e. just the expected range for a standard normal times $\sigma$, the population standard deviation.
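As a quick check of this scaling on the numbers in the question, the sketch below simulates heights with mean 63.5 and standard deviation 2.5 for $n = 10$ and compares the average range with $\sigma$ times the standard normal value 3.077506 computed earlier. The seed and the number of replications are arbitrary.

```r
set.seed(2)  # arbitrary seed, only for reproducibility

mu    <- 63.5      # population mean height (inches), from the question
sigma <- 2.5       # population standard deviation (inches), from the question
n     <- 10
d_n   <- 3.077506  # expected range of 10 standard normals (value computed above)

# Ranges of 20000 simulated samples of heights
height_ranges <- replicate(20000, diff(range(rnorm(n, mean = mu, sd = sigma))))

# The mean drops out; only sigma rescales the expected range
print(c(simulated = mean(height_ranges), predicted = sigma * d_n))
```

The predicted value is about 7.69 inches; the book's 7.75 comes from rounding the multiplier to 3.1 first.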

Note that all this only works if you know the population standard deviation. If you're computing an estimate of the expected range from the sample standard deviation you need to know about the behavior of the ratio of sample range to sample standard deviation.
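To get a feel for that caveat, one can simulate the range divided by the sample standard deviation and compare it with the range divided by the known $\sigma$. This is only an illustrative sketch for $n = 10$ with arbitrary seed and replication count; it does not reproduce any table for the ratio.

```r
set.seed(3)  # arbitrary seed, only for reproducibility

n     <- 10
sigma <- 2.5

# 20000 simulated samples of heights, kept as a list of vectors
samples <- replicate(20000, rnorm(n, mean = 63.5, sd = sigma), simplify = FALSE)

# Range in units of the known population SD versus the sample SD
range_over_sigma <- sapply(samples, function(x) diff(range(x)) / sigma)
range_over_s     <- sapply(samples, function(x) diff(range(x)) / sd(x))

print(c(mean_range_over_sigma = mean(range_over_sigma),
        mean_range_over_s     = mean(range_over_s)))
```

The two averages come out slightly different, so a multiplier tabulated for $\sigma$ should not be assumed to apply unchanged to the sample standard deviation.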

[1]: L. H. C. Tippett (1925). "On the Extreme Individuals and the Range of Samples Taken from a Normal Population". Biometrika 17 (3/4): 364–387