To prove the theorem we are going to need the following intermediate result.

**Theorem**. Let $\mathbf{y} \sim N_n \left(0, \sigma^2 \mathbf{I}_n \right)$ and let $ Q = \sigma^{-2} \mathbf{y}^{\prime} \mathbf{A} \mathbf{y}$ for a symmetric matrix $\mathbf{A}$ of rank $r$. Then if $\mathbf{A}$ is idempotent, Q has a $\chi^2 (r)$ distribution.

The theorem holds in the other direction as well (idempotence is also necessary), but we only need sufficiency, so that is what we will prove, using the eigenvalue-eigenvector decomposition of an idempotent matrix.

It is important to note that we first consider the case of mean zero; we will relax this assumption afterwards. But for now, recall that a square matrix of rank $r$, say $\mathbf{A}$, can be decomposed as

$$\mathbf{A} = \sum_{i=1}^{r} \lambda_i \mathbf{c}_i \mathbf{c}_i^{\prime}$$

where the $\lambda_i$ are the eigenvalues and the $\mathbf{c}_i$ the corresponding eigenvectors. All pretty standard so far. Now if we additionally restrict $\mathbf{A}$ to be symmetric, two things happen:

1. The eigenvalues are real-valued.
2. Eigenvectors corresponding to different eigenvalues are *orthogonal*.

These are consequences of the so-called Spectral Theorem of linear algebra; you can consult any good textbook for a proof. What does that do for us? You are about to see why symmetry is required. Let's substitute the decomposition of $\mathbf{A}$ into our quadratic form and see what happens.

$$\sigma^{-2} \mathbf{y}^{\prime} \mathbf{A} \mathbf{y} = \sigma^{-2} \mathbf{y}^{\prime} \left( \sum_{i=1}^r \lambda_i \mathbf{c}_i \mathbf{c}_i^{\prime} \right) \mathbf{y} = \sum_{i=1}^r \lambda_i \left( \sigma^{-1} \mathbf{c}_i^{\prime} \mathbf{y} \right)^2 \tag{1}$$
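Identity $(1)$ is easy to check numerically. A minimal `numpy` sketch of my own (with $\sigma = 1$, and the sum taken over all $n$ eigenvalues, which agrees with $(1)$ since zero eigenvalues contribute nothing):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric matrix A and a random vector y (take sigma = 1 for simplicity).
n = 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
y = rng.standard_normal(n)

# Spectral decomposition: eigh returns real eigenvalues and orthonormal eigenvectors.
lam, C = np.linalg.eigh(A)

# The quadratic form two ways: directly, and as a weighted sum of squared projections.
direct = y @ A @ y
projected = sum(lam[i] * (C[:, i] @ y) ** 2 for i in range(n))

print(np.isclose(direct, projected))  # True
```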

We have written our quadratic form as a weighted sum of squared projections onto orthogonal axes. Let's now investigate the distribution of $\mathbf{c}_i ^{\prime} \mathbf{y}$ and $ \mathbf{c}_j ^{\prime} \mathbf{y}$, $i \neq j$. By basic rules,

$$\sigma^{-1} \mathbf{c}_i ^{\prime} \mathbf{y} \sim N(0, \underbrace{\mathbf{c}_i^{\prime} \mathbf{c}_i}_{=1} ) $$

as the eigenvectors are not unique and therefore can be rescaled without loss of generality to have length one. Next,

$$\operatorname{Cov}\left(\sigma^{-1} \mathbf{c}_i ^{\prime} \mathbf{y} , \sigma^{-1} \mathbf{c}_j ^{\prime} \mathbf{y} \right) = \sigma^{-2}\mathbf{c}_i ^{\prime} \left( \sigma^2 \mathbf{I}_n \right) \mathbf{c}_j = \mathbf{c}_i ^{\prime} \mathbf{c}_j = 0, \ \ i \neq j $$

by the second implication of the Spectral Theorem. Thus our summands are uncorrelated, and by joint normality they are also independent. It follows that the sum in $(1)$ consists of independent $\chi^2(1)$ random variables weighted by the eigenvalues. In this thread it was asked whether such a weighted sum still follows a $\chi^2$ distribution. The answer, of course, is no.
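To see concretely why unequal weights spoil the $\chi^2$ distribution, compare moments: a $\chi^2(k)$ variable always has variance equal to twice its mean, whereas $\sum_i \lambda_i z_i^2$ has mean $\sum_i \lambda_i$ and variance $2\sum_i \lambda_i^2$. A quick simulation sketch (my own, with weights chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)

# Weighted sum of independent squared standard normals with unequal weights.
# Mean = sum(lam) = 4, variance = 2 * sum(lam**2) = 20.  A chi-square variable
# would need variance = 2 * mean = 8, so this sum cannot be chi-square.
lam = np.array([1.0, 3.0])
N = 200_000
Z = rng.standard_normal((N, 2))
S = (lam * Z**2).sum(axis=1)

print(S.mean(), S.var())  # roughly 4 and 20
```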

Enter the idempotence. From the definition of an idempotent matrix and the eigenvalue/eigenvector equation it easily follows that an idempotent matrix has eigenvalues equal to either one or zero: if $\mathbf{A} \mathbf{c} = \lambda \mathbf{c}$, then $\lambda \mathbf{c} = \mathbf{A}^2 \mathbf{c} = \lambda^2 \mathbf{c}$, so $\lambda \in \{0, 1\}$. Since by assumption the matrix $\mathbf{A}$ has rank $r$, exactly $r$ eigenvalues equal one. Therefore

$$\sigma^{-2} \mathbf{y}^{\prime} \mathbf{A} \mathbf{y} = \sum_{i=1}^r \left( \sigma^{-1} \mathbf{c}_i^{\prime} \mathbf{y} \right)^2 \sim \chi^2 (r) $$

And this completes the proof.
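The whole argument can be checked by simulation. A sketch (my own construction: the idempotent $\mathbf{A}$ is taken to be a hat matrix $\mathbf{X}(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{X}^{\prime}$ of rank $r$, which is not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)

n, r, sigma = 6, 3, 2.0
X = rng.standard_normal((n, r))
A = X @ np.linalg.inv(X.T @ X) @ X.T   # symmetric idempotent hat matrix, rank r

# Its eigenvalues are (numerically) r ones and n - r zeros, as claimed above.
eigvals = np.linalg.eigvalsh(A)
assert np.allclose(np.sort(eigvals), [0, 0, 0, 1, 1, 1])

# Draw many y ~ N(0, sigma^2 I) and form Q = y'Ay / sigma^2.
N = 200_000
Y = sigma * rng.standard_normal((N, n))
Q = np.einsum("ij,jk,ik->i", Y, A, Y) / sigma**2

# A chi-square(r) variable has mean r and variance 2r.
print(Q.mean(), Q.var())  # roughly 3 and 6
```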

What happens now if the vector $\mathbf{y}$ has nonzero mean? No harm done; we simply use the definition of the non-central $\chi^2$ distribution to conclude that if

$$\mathbf{y} \sim N_n \left( \boldsymbol{\mu}, \sigma^2 \mathbf{I}_n \right)$$

then

$$\sigma^{-2} \mathbf{y}^{\prime} \mathbf{A} \mathbf{y} \sim \chi^2 \left(r, \sigma^{-2} \boldsymbol{\mu}^{\prime} \mathbf{A} \boldsymbol{\mu} \right) $$

where the second argument is the non-centrality parameter. (Out of habit I skip the division by $2$ that some authors include in the non-centrality parameter, but you can use your preferred convention.)
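Simulation agrees here too; note that the non-centrality parameter is $\sigma^{-2}\boldsymbol{\mu}^{\prime}\mathbf{A}\boldsymbol{\mu}$, the $\sigma^{-2}$ coming from standardizing $\mathbf{y}$. A sketch, again with a hat matrix of my choosing standing in for $\mathbf{A}$:

```python
import numpy as np

rng = np.random.default_rng(3)

n, r, sigma = 6, 3, 1.5
X = rng.standard_normal((n, r))
A = X @ np.linalg.inv(X.T @ X) @ X.T   # symmetric idempotent, rank r
mu = rng.standard_normal(n)

# Draw many y ~ N(mu, sigma^2 I) and form Q = y'Ay / sigma^2.
N = 200_000
Y = mu + sigma * rng.standard_normal((N, n))
Q = np.einsum("ij,jk,ik->i", Y, A, Y) / sigma**2

# Non-central chi-square(r, ncp) has mean r + ncp, with ncp = mu'A mu / sigma^2.
ncp = mu @ A @ mu / sigma**2
print(Q.mean(), r + ncp)  # the two numbers agree closely
```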

We are now ready to prove the required result.

**Theorem**. Suppose $\boldsymbol{\Sigma}$ is an $n \times n$ positive definite symmetric matrix, $\mathbf{A}$ is an $n \times n$ symmetric matrix of rank $r$, and $(\mathbf{A}\boldsymbol{\Sigma})^2 = \mathbf{A}\boldsymbol{\Sigma}$.

Then $\mathbf{y} \sim N_n(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \implies Q = \mathbf{y}^{\prime}\mathbf{A}\mathbf{y} \sim \chi^2 \left(r, \boldsymbol{\mu}^{\prime} \mathbf{A}\boldsymbol{\mu} \right)\text{.}$

We are given that

$$\mathbf{A\Sigma A \Sigma} = \mathbf{A\Sigma}$$

from which it follows, after post-multiplying both sides by $\boldsymbol{\Sigma}^{-1}$ (which exists since $\boldsymbol{\Sigma}$ is positive definite), that

$$\mathbf{A\Sigma A} = \mathbf{A}$$

and hence we may rewrite our quadratic form as

$$Q = \left( \boldsymbol{\Sigma}^{-1/2} \mathbf{y} \right)^{\prime} \boldsymbol{\Sigma}^{1/2} \mathbf{A} \boldsymbol{\Sigma}^{1/2} \left( \boldsymbol{\Sigma}^{-1/2} \mathbf{y} \right) \tag{2} $$

Since by assumption $\boldsymbol{\Sigma}$ is a positive definite matrix, its square root is always well-defined; you can check that using the eigenvalue/eigenvector decomposition. (You can also check that equation ($2$) is the analogue of equation ($1$) in this more general setting!)
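Both claims, that the positive definite square root exists and that $\mathbf{A\Sigma A \Sigma} = \mathbf{A\Sigma}$ forces $\mathbf{A\Sigma A} = \mathbf{A}$, can be verified numerically. A sketch using my own construction of a pair $(\mathbf{A}, \boldsymbol{\Sigma})$ satisfying the hypothesis, via a projection matrix $\mathbf{P}$:

```python
import numpy as np

rng = np.random.default_rng(4)

n, r = 5, 2
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)        # positive definite

# Square root via the eigenvalue/eigenvector decomposition of Sigma.
lam, V = np.linalg.eigh(Sigma)
S_half = V @ np.diag(np.sqrt(lam)) @ V.T
S_half_inv = V @ np.diag(1 / np.sqrt(lam)) @ V.T
assert np.allclose(S_half @ S_half, Sigma)

# A symmetric A with (A Sigma)^2 = A Sigma: take A = Sigma^{-1/2} P Sigma^{-1/2}
# for a symmetric projection P of rank r.
Xd = rng.standard_normal((n, r))
P = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T
A = S_half_inv @ P @ S_half_inv

AS = A @ Sigma
assert np.allclose(AS @ AS, AS)        # the hypothesis holds
print(np.allclose(A @ Sigma @ A, A))   # True: A Sigma A = A follows
```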

Of course you would agree that $\boldsymbol{\Sigma}^{-1/2} \mathbf{y} \sim N_n \left( \boldsymbol{\Sigma}^{-1/2} \boldsymbol{\mu} , \mathbf{I}_n \right)$. If we temporarily assume that $\boldsymbol{\mu}=\mathbf{0}$, we would be in the situation of the first theorem, right? Well, not exactly: we still have to show that the middle matrix $\boldsymbol{\Sigma}^{1/2} \mathbf{A} \boldsymbol{\Sigma}^{1/2}$ is idempotent, but that is easy enough using $\mathbf{A\Sigma A} = \mathbf{A}$:

$$\boldsymbol{\Sigma}^{1/2} \mathbf{A} \boldsymbol{\Sigma}^{1/2} \boldsymbol{\Sigma}^{1/2} \mathbf{A} \boldsymbol{\Sigma}^{1/2} = \boldsymbol{\Sigma}^{1/2} \left( \mathbf{A} \boldsymbol{\Sigma} \mathbf{A} \right) \boldsymbol{\Sigma}^{1/2} = \boldsymbol{\Sigma}^{1/2} \mathbf{A} \boldsymbol{\Sigma}^{1/2}$$

Note also that $\boldsymbol{\Sigma}^{1/2} \mathbf{A} \boldsymbol{\Sigma}^{1/2}$ has rank $r$, because $\boldsymbol{\Sigma}^{1/2}$ is invertible. Therefore if $\mathbf{y} \sim N_n \left(\mathbf{0}, \boldsymbol{\Sigma} \right)$, then $Q \sim \chi^2 \left(r \right)$. Switching back to the case of nonzero mean, the non-centrality parameter is $\left( \boldsymbol{\Sigma}^{-1/2} \boldsymbol{\mu} \right)^{\prime} \boldsymbol{\Sigma}^{1/2} \mathbf{A} \boldsymbol{\Sigma}^{1/2} \left( \boldsymbol{\Sigma}^{-1/2} \boldsymbol{\mu} \right) = \boldsymbol{\mu}^{\prime} \mathbf{A} \boldsymbol{\mu}$, and we have

$$ Q = \mathbf{y}^{\prime}\mathbf{A}\mathbf{y} \sim \chi^2 \left(r, \boldsymbol{\mu}^{\prime} \mathbf{A}\boldsymbol{\mu} \right)$$

as required. $\square$
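The final theorem can also be checked by simulation. A self-contained sketch (my own construction: $\mathbf{A} = \boldsymbol{\Sigma}^{-1/2} \mathbf{P} \boldsymbol{\Sigma}^{-1/2}$ for a rank-$r$ projection $\mathbf{P}$, which satisfies $(\mathbf{A}\boldsymbol{\Sigma})^2 = \mathbf{A}\boldsymbol{\Sigma}$):

```python
import numpy as np

rng = np.random.default_rng(5)

n, r = 5, 2
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)        # positive definite

lam, V = np.linalg.eigh(Sigma)
S_half_inv = V @ np.diag(1 / np.sqrt(lam)) @ V.T

Xd = rng.standard_normal((n, r))
P = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T
A = S_half_inv @ P @ S_half_inv        # symmetric, rank r, (A Sigma)^2 = A Sigma

# Draw many y ~ N(mu, Sigma) and form Q = y'Ay.
mu = rng.standard_normal(n)
N = 200_000
Y = rng.multivariate_normal(mu, Sigma, size=N)
Q = np.einsum("ij,jk,ik->i", Y, A, Y)

# chi-square(r, ncp) with ncp = mu'A mu has mean r + ncp.
ncp = mu @ A @ mu
print(Q.mean(), r + ncp)  # the two numbers agree closely
```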

## Best Answer

You can make this work when you define the multivariate t-distribution as

$$\begin{array}{c} \mathbf{X} = \frac{\mathbf{Y}+\boldsymbol{\mu}}{\sqrt{Z/\nu}}\\ \\ \text{where} \qquad \textbf{Y} \sim N(0,\boldsymbol{\Sigma}) \qquad \text{and} \qquad Z \sim \chi^2_\nu \end{array}$$

Note that this differs from

$$\mathbf{X} = \frac{\mathbf{Y}}{\sqrt{Z/\nu}} +\boldsymbol{\mu}$$

which is, I believe, the more standard form of the multivariate t-distribution.

The quadratic form involving only the numerator is non-central $\chi^2$ distributed: $$(\mathbf{Y}+\boldsymbol{\mu})^\prime \boldsymbol{\Sigma}^{-1} (\mathbf{Y}+\boldsymbol{\mu}) \sim \chi^2_{p}(\text{ncp} = \boldsymbol{\mu}^\prime \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu})$$

The quadratic form involving the entire fraction $\mathbf{X}$ is distributed as

$$\begin{array}{c}\mathbf{X}^\prime \boldsymbol{\Sigma}^{-1} \mathbf{X}/p = \frac{(\mathbf{Y}+\boldsymbol{\mu})^\prime \boldsymbol{\Sigma}^{-1} (\mathbf{Y}+\boldsymbol{\mu})/p}{Z/\nu} \sim \frac{Z_1/p}{Z_2/\nu}\\ \\ \text{where} \qquad Z_1 \sim \chi^2_{p}(\text{ncp} = \boldsymbol{\mu}^\prime \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}) \qquad \text{and} \qquad Z_2 \sim \chi^2_\nu \end{array}$$

This ratio of two $\chi^2$-distributed variables $Z_1$ and $Z_2$ (where $Z_1$ may be non-central), each divided by its degrees of freedom, $\frac{Z_1/p}{Z_2/\nu}$, is an $F$-distributed variable (a non-central $F$ when $\boldsymbol{\mu} \neq \mathbf{0}$).
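A simulation of the central case ($\boldsymbol{\mu} = \mathbf{0}$, where the two definitions of $\mathbf{X}$ coincide) confirms the $F$ claim via its mean, $\nu/(\nu-2)$. A sketch with $p$, $\nu$, and $\boldsymbol{\Sigma}$ of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(6)

p, nu = 3, 10
B = rng.standard_normal((p, p))
Sigma = B @ B.T + p * np.eye(p)        # an arbitrary positive definite Sigma

N = 200_000
Y = rng.multivariate_normal(np.zeros(p), Sigma, size=N)
Z = rng.chisquare(nu, size=N)
X = Y / np.sqrt(Z / nu)[:, None]       # multivariate t draws (central case)

Sinv = np.linalg.inv(Sigma)
F = np.einsum("ij,jk,ik->i", X, Sinv, X) / p

# A central F(p, nu) variable has mean nu / (nu - 2) = 1.25.
print(F.mean())  # roughly 1.25
```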