Independence of Sample Mean and Sample Range of a Normal Distribution

Let $X_1,\dots,X_n$ be i.i.d. random variables with $X_1 \sim N(\mu,\sigma^2)$. Let $\bar X =\sum_{i=1}^n X_i/n$ and $R = X_{(n)}-X_{(1)}$, where $X_{(i)}$ is the $i$th order statistic. Show that $\bar X$ and $R$ are independently distributed.

I know that the sample mean and sample variance of a normal sample are independent. But this result states that the sample mean and sample range are also independent. I also know that $\bar X \sim N(\mu,\sigma^2/n)$.

Best Answer

This is evidently a self-study question, so I do not intend to deprive you of the satisfaction of developing your own answer. Moreover, I'm sure there are many possible solutions. But for guidance, consider these observations:

  1. When a random variable $X$ is independent of other random variables $Y_1, \ldots, Y_m$, then $X$ is independent of any function of them, $f(Y_1, \ldots, Y_m)$. (See Functions of Independent Random Variables for more about this.)

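  2. Because the $X_i$ are jointly Normal, the sum $X_1 + \cdots + X_n$ and the differences $Y_{ij} = X_i - X_j$ are jointly Normal as well (they are linear combinations of the $X_i$). The sum is therefore independent of all the differences, since its covariance with each of them is zero.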
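
The zero covariance in (2) can be checked directly. A short derivation, using only the independence and common variance $\sigma^2$ of the $X_k$:

$$\operatorname{Cov}\Big(\sum_{k=1}^n X_k,\; X_i - X_j\Big) = \sum_{k=1}^n \operatorname{Cov}(X_k, X_i) - \sum_{k=1}^n \operatorname{Cov}(X_k, X_j) = \sigma^2 - \sigma^2 = 0.$$

In each sum only one term survives ($k=i$ in the first, $k=j$ in the second), because $\operatorname{Cov}(X_k, X_i) = 0$ whenever $k \ne i$.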

Because the range can be expressed as

$$X_{(n)} - X_{(1)} = \max_{i,j}(|X_i - X_j|) = \max_{i,j}(|Y_{ij}|)$$

you can exploit (1) and (2) to finish the proof.
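
Both ingredients can be checked numerically before attempting the proof. The following is only a sketch: the seed, the sample size `n <- 5`, and the choice of the single difference $X_1 - X_2$ are arbitrary assumptions made for illustration.

```r
set.seed(17)                      # arbitrary seed, for reproducibility only
n <- 5
x <- rnorm(n)

# The range computed two ways: from the extremes and from all pairwise differences.
range.os   <- max(x) - min(x)
range.diff <- max(abs(outer(x, x, "-")))
stopifnot(isTRUE(all.equal(range.os, range.diff)))

# Sample covariance between the sum and one difference, X_1 - X_2,
# across many simulated datasets (one dataset per column).
n.sim <- 1e4
sims <- matrix(rnorm(n * n.sim), n)
s <- colSums(sims)
d <- sims[1, ] - sims[2, ]
cov(s, d)                         # should be near 0, up to simulation error
```

A small sample covariance is of course only consistent with (2), not a proof of it; the proof rests on joint Normality.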


For more intuition, a quick simulation might help. The following shows the marginal and joint distributions of the mean and range in the case $n=3$, using 10,000 independent datasets. The joint distribution clearly is not bivariate Normal, so the temptation to prove independence by means of a zero correlation, although a good idea, is bound to fail: zero correlation alone does not imply independence when the variables are not jointly Normal. However, a close analysis of these results ought to suggest that the conditional distribution of the range does not vary with the mean. (The appearance of some variation at the right and left is due to the paucity of outcomes with such extreme means.)

Figure: histogram of the means, histogram of the ranges, and scatterplot of range against mean.

Here is the R code that produced these figures. It is easily modified to vary $n$, to vary the simulation size, and to analyze the simulation results more extensively.

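```r
n <- 3; n.sim <- 1e4   # sample size and number of simulated datasets
# Each column of the matrix is one dataset; record its mean and its range.
sim <- apply(matrix(rnorm(n * n.sim), n), 2, function(y) c(mean(y), diff(range(y))))
par(mfrow=c(1,3))      # three panels side by side
hist(sim[1,], xlab="Mean", main="Histogram of Means")
hist(sim[2,], xlab="Range", main="Histogram of Ranges")
# Semi-transparent points show where the joint distribution is dense.
plot(sim[1,], sim[2,], pch=16, col="#00000020", xlab="Mean", ylab="Range")
```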
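
One way to analyze the simulation more extensively is to compare the distribution of the range across groups of datasets with similar means. The sketch below is an illustration rather than part of the original analysis: the seed, the five equal-sized groups, and the use of quartiles as summaries are all assumptions chosen for convenience.

```r
set.seed(17)                       # arbitrary seed, for reproducibility only
n <- 3; n.sim <- 1e4
sim <- apply(matrix(rnorm(n * n.sim), n), 2, function(y) c(mean(y), diff(range(y))))

# Split the datasets into five groups of equal size according to the mean.
breaks <- quantile(sim[1, ], probs = seq(0, 1, by = 0.2))
bin <- cut(sim[1, ], breaks = breaks, include.lowest = TRUE)

# Quartiles of the range within each group (one column per group).
# If the range is independent of the mean, the columns should agree
# up to simulation error.
range.quartiles <- sapply(split(sim[2, ], bin), quantile, probs = c(0.25, 0.5, 0.75))
print(round(range.quartiles, 2))
```

Grouping by quantiles of the mean keeps the group sizes equal, which avoids the sparse extreme groups mentioned above.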