Distributions – Proving $\sum_{i=1}^n(X_i-\overline X_n)^2-\sum_{i=1}^m(X_i-\overline X_m)^2 \sim \chi^2_{n-m}$

Tags: chi-squared-distribution, distributions, normal distribution, quadratic form, self-study

Suppose $X_1,X_2,\ldots,X_n$ are i.i.d. $N(0,1)$ random variables. For $2\le m<n$, let $S_m^2=\sum_{i=1}^m(X_i-\overline X_m)^2$ and $S_n^2=\sum_{i=1}^n(X_i-\overline X_n)^2$ where $\overline X_m=\frac1m\sum_{i=1}^m X_i$ and $\overline X_n=\frac1n\sum_{i=1}^n X_i$. I am trying to prove that $T=S_n^2-S_m^2 \sim \chi^2_{n-m}$.

My idea is to write $T$ as a quadratic form $X^TAX$ where $X=(X_1,\ldots,X_n)^T$ and $A$ is a symmetric matrix of order $n$. Then $T$ would have a $\chi^2$ distribution if and only if $A$ is idempotent, the degrees of freedom of $T$ being the rank of $A$ (or the trace of $A$ since $A$ is idempotent).

Now $S_n^2=X^TA_1X$ where $A_1=I_n-\frac1n \mathbf1_n\mathbf1_n^T$ and $\mathbf1_n$ is a vector of all ones.

If $Y=(X_1,\ldots,X_m)^T$, then similarly, $S_m^2=Y^TA_2Y$ with $A_2=I_m-\frac1m \mathbf1_m\mathbf1_m^T$.

So I think $$T=X^TA_1X-Y^TA_2Y=X^TAX\,,$$

where $$A=A_1-\begin{pmatrix}A_2 & O_{m\times (n-m)} \\ O_{(n-m)\times m} & O_{n-m}\end{pmatrix}$$
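For small $n$ and $m$ the matrix $A$ above can be checked exactly. The sketch below uses plain Python with `fractions.Fraction`; the helper names `centering`, `build_A` and `check` are illustrative only, not from the post. It tests $A^2=A$ and $\operatorname{tr}A=n-m$. It also hints at why the general check is short: writing $B$ for the zero-padded $A_2$, one has $\mathbf 1_n^TB=0$, so $A_1B=BA_1=B$ and $B^2=B$, whence $A^2=A_1-2B+B=A$.

```python
from fractions import Fraction


def centering(k: int) -> list[list[Fraction]]:
    """Return I_k - (1/k) 1_k 1_k^T with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, k) for j in range(k)] for i in range(k)]


def build_A(n: int, m: int) -> list[list[Fraction]]:
    """A = A_1 - diag(A_2, O_{n-m}) with A_1 = centering(n), A_2 = centering(m)."""
    A = centering(n)
    A2 = centering(m)
    for i in range(m):
        for j in range(m):
            A[i][j] -= A2[i][j]
    return A


def matmul(P, Q):
    """Plain matrix product of two lists of lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def check(n: int, m: int) -> tuple[bool, Fraction]:
    """Return (is A idempotent?, trace of A), computed exactly."""
    A = build_A(n, m)
    idempotent = matmul(A, A) == A
    trace = sum(A[i][i] for i in range(n))
    return idempotent, trace


for n, m in [(3, 2), (5, 2), (7, 4)]:
    print(n, m, *check(n, m))
# prints:
# 3 2 True 1
# 5 2 True 3
# 7 4 True 3
```

Because the arithmetic is exact, `True` here means $A^2=A$ holds entry by entry, not just up to rounding.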

I can show that $A_1$ and $A_2$ are idempotent, but verifying that $A$ is idempotent is somewhat cumbersome. Is there an easier way?

Alternatively, is $S_n^2-S_m^2$ independent of $S_m^2$? I understand this would solve the problem, since $S_n^2 \sim \chi^2_{n-1}$ and $S_m^2 \sim \chi^2_{m-1}$.

Also, according to a theorem on quadratic forms, I just need to show that $T$ is non-negative definite. Then, from $S_n^2 \sim \chi^2_{n-1}$ and $S_m^2 \sim \chi^2_{m-1}$, it would follow that $T\sim \chi^2_{(n-1)-(m-1)}=\chi^2_{n-m}$. Any suggestions are welcome.
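A quick Monte Carlo sanity check of the alternatives raised above, as a sketch assuming $n=6$, $m=3$ and an arbitrary seed (none of these choices come from the post). If $T\sim\chi^2_{n-m}$, its mean should be near $n-m$ and its variance near $2(n-m)$; $T$ should never be negative; and its sample correlation with $S_m^2$ should be near zero. Zero correlation is only consistent with independence, not a proof of it.

```python
import random
import statistics


def sum_sq_dev(xs):
    """Sum of squared deviations of xs from their own mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def corr(a, b):
    """Pearson correlation of two equal-length samples."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def simulate(n, m, reps=20_000, seed=1):
    """Draw reps pairs (T, S_m^2) from i.i.d. N(0,1) vectors of length n."""
    rng = random.Random(seed)
    t_vals, sm_vals = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s_m = sum_sq_dev(x[:m])
        t_vals.append(sum_sq_dev(x) - s_m)
        sm_vals.append(s_m)
    return t_vals, sm_vals


n, m = 6, 3
T, Sm = simulate(n, m)
print("mean of T:", statistics.fmean(T), "target:", n - m)
print("variance of T:", statistics.variance(T), "target:", 2 * (n - m))
print("corr(T, S_m^2):", corr(T, Sm))
print("min of T:", min(T))
```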

Best Answer

This is geometry.

There's not much to prove, actually, because you already know a lot.

  1. From $X_1, \ldots, X_m$ there exist $m-1$ orthonormal linear combinations $U_1, \ldots, U_{m-1}$ that have iid standard Normal distributions independent of $\bar X_m,$ for which $S_m^2 = U_1^2 + U_2^2 + \cdots + U_{m-1}^2.$ (This is the standard variance decomposition associated with the mean.)

  2. $X_{m+1}, \ldots, X_n$ are independent of $X_1,\ldots, X_m$ and therefore $(\bar X_m, U_1, U_2, \ldots, U_{m-1}, X_{m+1}, X_{m+2}, \ldots, X_n)$ are mutually independent.

  3. Because $n\bar X_n = m\bar X_m + (X_{m+1} + \cdots + X_n)$ is a function of variables that, by $(2)$, are independent of $U_1, \ldots, U_{m-1},$ $\bar X_n$ is independent of $U_1, \ldots, U_{m-1}.$

  4. For linear combinations of iid standard Normal variables, independence is equivalent to orthogonality of the coefficient vectors. So by $(3)$ the orthonormal coefficient vectors of $U_1, \ldots, U_{m-1}$ are also orthogonal to that of $\bar X_n$ (a multiple of $\mathbf 1_n$). Linear algebra tells us they can be extended to an orthonormal basis $U_1, \ldots, U_{m-1}, U_m, \ldots, U_{n-1}$ of the $(n-1)$-dimensional space orthogonal to $\bar X_n.$ (This is a standard, important theorem. If you haven't seen it, prove it by induction; it's simple.)

  5. Similarly (exactly as in $(1)$), there exist orthonormal $V_1, \ldots, V_{n-1}$ that are independent of $\bar X_n$ and for which $S_n^2 = V_1^2 + \cdots + V_{n-1}^2.$

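  6. Since $(U_1, \ldots, U_{n-1})$ and $(V_1, \ldots, V_{n-1})$ both correspond to orthonormal bases of the space orthogonal to $\bar X_n,$ each sum of squares equals the squared length of the projection of $X$ onto that space. Hence they are equal: $S_n^2 = V_1^2 + \cdots + V_{n-1}^2 = U_1^2 + \cdots + U_{n-1}^2.$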

  7. Subtracting, we find that $S_n^2 - S_m^2 = U_{m}^2 + U_{m+1}^2 + \cdots + U_{n-1}^2$ is the sum of squares of $n-m$ independent standard Normal variables, whence (by definition) it has a $\chi^2(n-m)$ distribution, QED.
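One concrete choice of the $U_k$ in steps (1) to (7), offered as an illustration rather than as the construction the answer necessarily has in mind, is the Helmert basis: $U_{k-1}=\sqrt{(k-1)/k}\,(X_k-\overline X_{k-1})$ for $k=2,\ldots,n$. Each $U_{k-1}$ uses only $X_1,\ldots,X_k$, the coefficient vectors are orthonormal and orthogonal to $\mathbf 1_n$, and $S_k^2=S_{k-1}^2+U_{k-1}^2$. Hence $S_m^2=U_1^2+\cdots+U_{m-1}^2$ and $S_n^2-S_m^2=U_m^2+\cdots+U_{n-1}^2$. The sketch below checks this numerically; `helmert_rows`, `max_basis_error` and the other helpers are hypothetical names.

```python
import math
import random


def helmert_rows(n):
    """Coefficient vectors of U_1, ..., U_{n-1}; row k-2 gives U_{k-1} and uses only X_1..X_k."""
    rows = []
    for k in range(2, n + 1):
        c = 1.0 / math.sqrt(k * (k - 1))
        rows.append([-c] * (k - 1) + [(k - 1) * c] + [0.0] * (n - k))
    return rows


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def max_basis_error(rows):
    """Largest deviation from orthonormality or from orthogonality to the all-ones vector."""
    err = max(abs(sum(r)) for r in rows)  # <row, 1_n> should be 0
    for i, ri in enumerate(rows):
        for j, rj in enumerate(rows):
            err = max(err, abs(dot(ri, rj) - (1.0 if i == j else 0.0)))
    return err


def sum_sq_dev(xs):
    """Sum of squared deviations of xs from their own mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


n, m = 7, 3
H = helmert_rows(n)
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
U = [dot(h, x) for h in H]  # U[0] is U_1, ..., U[n-2] is U_{n-1}

T = sum_sq_dev(x) - sum_sq_dev(x[:m])
tail = sum(u * u for u in U[m - 1:])  # U_m^2 + ... + U_{n-1}^2
head = sum(u * u for u in U[:m - 1])  # U_1^2 + ... + U_{m-1}^2
print("basis error:", max_basis_error(H))
print("T =", T, " U_m^2 + ... + U_{n-1}^2 =", tail)
print("S_m^2 =", sum_sq_dev(x[:m]), " U_1^2 + ... + U_{m-1}^2 =", head)
```

Since the first $m-1$ Helmert variables involve only $X_1,\ldots,X_m$, they play exactly the role of step (1), and the remaining $n-m$ are the extension of step (4).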
