Mathematical Statistics – Distribution of a Quadratic Form, Non-Central Chi-Squared Distribution: Detailed Insights

chi-squared-distribution, mathematical-statistics, non-central, quadratic-form, self-study

Definition. Suppose $\mathbf{y} \sim \mathcal{N}(\boldsymbol{\mu}, I_{n \times n})$.

Then $$w = \mathbf{y}^{T}\mathbf{y} = \|\mathbf{y}\|^2 \sim
\chi^{2}_{n}\left(\theta = \|\boldsymbol{\mu}\|^2/2 =\boldsymbol{\mu}^{T}\boldsymbol{\mu}/2 \right)\text{,}$$
i.e., the $\chi^2$ distribution with $n$ degrees of freedom and
noncentrality parameter $\theta$.

I am trying to (at the very least) find a proof of the following theorem:

Theorem. Suppose $\Sigma$ is an $n \times n$ positive definite and symmetric matrix, $A$ is an $n \times n$ symmetric matrix with rank $m$, and $(A\Sigma)^2 = A\Sigma$.

Then $\mathbf{y} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma) \implies \mathbf{y}^{T}A\mathbf{y} \sim \chi^2_m\left(\boldsymbol{\mu}^{T}A\boldsymbol{\mu}/2\right)\text{.}$

I don't see how this definition works. In my eyes, to even make the definition I gave above work with the theorem, we would have to "split" $A$ into powers of $1/2$ (whatever this means), i.e., look at $(A^{1/2}\mathbf{y})^TA^{1/2}\mathbf{y}$, unless I'm really missing something here.

This is just me doing self-reading, so any sources or anyone that can help me with a proof of this would be appreciated.

Best Answer

To prove the theorem we are going to need the following intermediate result.

Theorem. Let $\mathbf{y} \sim N_n \left(0, \sigma^2 \mathbf{I}_n \right)$ and let $Q = \sigma^{-2} \mathbf{y}^{\prime} \mathbf{A} \mathbf{y}$ for a symmetric matrix $\mathbf{A}$ of rank $r$. Then if $\mathbf{A}$ is idempotent, $Q$ has a $\chi^2 (r)$ distribution.

The converse also holds, but we only need sufficiency, so that is all we will prove. We will do so using the eigenvalue-eigenvector decomposition of an idempotent matrix.

It is important to note that we first consider the case of mean zero. We will relax this assumption afterwards. But for now recall that a symmetric matrix $\mathbf{A}$ of rank $r$ can be written as

$$\mathbf{A} = \sum_{i=1}^{r} \lambda_i \mathbf{c}_i \mathbf{c}_i^{\prime}$$

where the $\lambda_i$ are the nonzero eigenvalues and the $\mathbf{c}_i$ the corresponding eigenvectors. All pretty standard so far. Symmetry is what makes this decomposition available, because it guarantees two things:

1. The eigenvalues are real-valued.
2. Eigenvectors corresponding to different eigenvalues are orthogonal (and for a repeated eigenvalue they can be chosen to be orthogonal).

These are consequences of the so-called Spectral Theorem of linear algebra and you can consult any good textbook for a proof. What does that do for us? You are about to see why symmetry is required. Let's write down the decomposition of our $\mathbf{A}$ in our quadratic form and see what happens.

$$\sigma^{-2} \mathbf{y}^{\prime} \mathbf{A} \mathbf{y} = \sigma^{-2} \mathbf{y}^{\prime} \left( \sum_{i=1}^r \lambda_i \mathbf{c}_i \mathbf{c}_i^{\prime} \right) \mathbf{y} = \sum_{i=1}^r \lambda_i \left( \sigma^{-1} \mathbf{c}_i^{\prime} \mathbf{y} \right)^2 \tag{1}$$

We have written our quadratic form as weighted squared projections onto orthogonal axes. Let's now investigate the distribution of $\mathbf{c}_i ^{\prime} \mathbf{y}$ and $ \mathbf{c}_j ^{\prime} \mathbf{y}$, $i \neq j$. By basic rules

$$\sigma^{-1} \mathbf{c}_i ^{\prime} \mathbf{y} \sim N(0, \underbrace{\mathbf{c}_i^{\prime} \mathbf{c}_i}_{=1} ) $$

as the eigenvectors are not unique and therefore can be rescaled without loss of generality to have length one. Next,

$$Cov\left(\sigma^{-1} \mathbf{c}_i ^{\prime} \mathbf{y} , \sigma^{-1} \mathbf{c}_j ^{\prime} \mathbf{y} \right) = \sigma^{-2}\mathbf{c}_i ^{\prime} \left( \sigma^2 I_n \right) \mathbf{c}_j = \mathbf{c}_i ^{\prime} \mathbf{c}_j = 0, \ \ i \neq j $$

by the second implication of the Spectral Theorem. Thus our summands are uncorrelated. Since they are jointly normal (all are linear functions of the same normal vector), they are also independent. It is easy to see then that the sum consists of independent $\chi^2(1)$ random variables weighted by the eigenvalues. In this thread it was asked whether this follows the $\chi^2$ distribution regardless. The answer is no of course.
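To see why with a small worked case (the weights here are an illustrative assumption, not taken from the question), let $z_1, z_2$ be independent $N(0,1)$ variables and take $\lambda_1 = 2$, $\lambda_2 = 1$. Using $E[z_i^2] = 1$ and $Var(z_i^2) = 2$,

$$E\left[2 z_1^2 + z_2^2\right] = 2 + 1 = 3, \qquad Var\left(2 z_1^2 + z_2^2\right) = 2\left(2^2 + 1^2\right) = 10$$

Any $\chi^2(k)$ variable has variance equal to twice its mean, which would force a variance of $6$ here, so this weighted sum cannot be $\chi^2$ distributed.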

Enter the idempotence. It easily follows from the definition of an idempotent matrix and the eigenvalue/eigenvector problem that an idempotent matrix has eigenvalues equal to either one or zero. Since by assumption the matrix $\mathbf{A}$ has rank $r$, there are $r$ eigenvalues equal to one. Therefore

$$\sigma^{-2} \mathbf{y}^{\prime} \mathbf{A} \mathbf{y} = \sum_{i=1}^r \left( \sigma^{-1} \mathbf{c}_i^{\prime} \mathbf{y} \right)^2 \sim \chi^2 (r) $$

And this completes the proof.
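If you want to convince yourself numerically, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the matrix `X`, the sizes `n` and `p`, the value of `sigma`, and the choice of the residual-maker matrix $\mathbf{I} - \mathbf{X}(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{X}^{\prime}$ as the symmetric idempotent $\mathbf{A}$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, sigma = 6, 2, 1.5          # illustrative sizes and scale (assumptions)

# Hypothetical matrix X; A = I - X (X'X)^{-1} X' is symmetric and idempotent.
X = rng.normal(size=(n, p))
A = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

eigvals = np.linalg.eigvalsh(A)  # numerically all zeros and ones
r = int(round(np.trace(A)))      # for an idempotent matrix, rank = trace

# Draw y ~ N(0, sigma^2 I) row by row and form Q = y'Ay / sigma^2.
y = sigma * rng.normal(size=(200_000, n))
Q = np.einsum("ij,jk,ik->i", y, A, y) / sigma**2

# Compare the simulated Q with the chi-squared(r) distribution.
ks = stats.kstest(Q, stats.chi2(df=r).cdf)
print("rank:", r)
print("sample mean and variance:", Q.mean(), Q.var())
print("KS statistic:", ks.statistic)
```

The sample mean and variance should sit close to $r$ and $2r$, and the Kolmogorov-Smirnov statistic should be small. This is only a sanity check, not a substitute for the proof.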

What happens now if the vector $\mathbf{y}$ has nonzero mean? No harm done, we will just use the definition of the non-central $\chi^2$ distribution to conclude that if

$$\mathbf{y} \sim N_n \left( \boldsymbol{\mu}, \sigma^2 \mathbf{I}_n \right)$$

then

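$$\sigma^{-2} \mathbf{y}^{\prime} \mathbf{A} \mathbf{y} ~ \sim \chi^2 \left(r, \sigma^{-2} \boldsymbol{\mu}^{\prime} \mathbf{A} \boldsymbol{\mu} \right) $$

where the second term indicates the non-centrality parameter. The factor $\sigma^{-2}$ appears because each $\sigma^{-1} \mathbf{c}_i^{\prime} \mathbf{y}$ now has mean $\sigma^{-1} \mathbf{c}_i^{\prime} \boldsymbol{\mu}$, and the squared means add up to $\sigma^{-2} \boldsymbol{\mu}^{\prime} \mathbf{A} \boldsymbol{\mu}$. (Due to force of habit, I will skip the division by $2$ but you can just do it your way).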
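The nonzero-mean case can be checked the same way. The sketch below is an illustration under assumed values (`mu`, `sigma`, and the rank-2 projection `A` are all made up). Since each $\sigma^{-1}\mathbf{c}_i^{\prime}\mathbf{y}$ has mean $\sigma^{-1}\mathbf{c}_i^{\prime}\boldsymbol{\mu}$, the simulated noncentrality is $\boldsymbol{\mu}^{\prime}\mathbf{A}\boldsymbol{\mu}/\sigma^2$. As far as I know, `scipy.stats.ncx2` takes its `nc` argument in the convention without the division by $2$, so its mean is `df + nc`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, r, sigma = 5, 2, 2.0
mu = np.array([1.0, -2.0, 0.5, 3.0, 0.0])   # hypothetical mean vector

# Projection onto two orthonormal columns: symmetric, idempotent, rank 2.
C, _ = np.linalg.qr(rng.normal(size=(n, r)))
A = C @ C.T

# Noncentrality in the "no division by 2" convention.
nc = mu @ A @ mu / sigma**2

# Draw y ~ N(mu, sigma^2 I) and form Q = y'Ay / sigma^2.
y = mu + sigma * rng.normal(size=(200_000, n))
Q = np.einsum("ij,jk,ik->i", y, A, y) / sigma**2

ks = stats.kstest(Q, stats.ncx2(df=r, nc=nc).cdf)
print("sample mean vs r + nc:", Q.mean(), r + nc)
print("KS statistic:", ks.statistic)
```

If you prefer the convention with the division by $2$, the parameter $\theta$ in the question corresponds to `nc / 2` here.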

We are now ready to prove the required result.

Theorem. Suppose $\Sigma$ is an $n \times n$ positive definite and symmetric matrix, $A$ is an $n \times n$ symmetric matrix with rank $r$, and $(A\Sigma)^2 = A\Sigma$.

Then $\mathbf{y} \sim N(\boldsymbol{\mu}, \Sigma) \implies Q = \mathbf{y}^{\prime}A\mathbf{y} \sim \chi^2 \left(r, \boldsymbol{\mu}^{\prime} \mathbf{A}\boldsymbol{\mu} \right)\text{.}$

We are given that

$$\mathbf{A\Sigma A \Sigma} = \mathbf{A\Sigma}$$

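from which, after right-multiplying both sides by $\boldsymbol{\Sigma}^{-1}$ (which exists because $\boldsymbol{\Sigma}$ is positive definite), it follows that

$$\mathbf{A\Sigma A} = \mathbf{A}$$

Next, inserting $\boldsymbol{\Sigma}^{-1/2} \boldsymbol{\Sigma}^{1/2} = \mathbf{I}_n$ on either side of $\mathbf{A}$, we may rewrite our quadratic form as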

$$Q = \left( \boldsymbol{\Sigma}^{-1/2} \mathbf{y} \right)^{\prime} \boldsymbol{\Sigma}^{1/2} \mathbf{A} \boldsymbol{\Sigma}^{1/2} \left( \boldsymbol{\Sigma}^{-1/2} \mathbf{y} \right) \tag{2} $$

Since by assumption $\boldsymbol{\Sigma}$ is a positive definite matrix, its square root is always well-defined, and you can check that using the eigenvalue/eigenvector decomposition. (You can also check that equation ($2$) is equivalent to equation ($1$)!)
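As a rough sketch of that check, the square root can be built from the eigendecomposition $\boldsymbol{\Sigma} = \mathbf{V} \operatorname{diag}(w) \mathbf{V}^{\prime}$ by taking square roots of the (positive) eigenvalues. The particular `Sigma`, `A` and `y` below are arbitrary assumptions chosen only to exercise the identity in equation ($2$).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# Hypothetical positive definite covariance matrix.
M = rng.normal(size=(n, n))
Sigma = M @ M.T + n * np.eye(n)

# Sigma = V diag(w) V' with w > 0, so Sigma^{1/2} = V diag(sqrt(w)) V'.
w, V = np.linalg.eigh(Sigma)
Sigma_half = (V * np.sqrt(w)) @ V.T       # scales column j of V by sqrt(w[j])
Sigma_neg_half = (V / np.sqrt(w)) @ V.T   # inverse square root

# Arbitrary symmetric A and vector y to test the rewriting of y'Ay.
A = rng.normal(size=(n, n))
A = (A + A.T) / 2
y = rng.normal(size=n)

z = Sigma_neg_half @ y
lhs = y @ A @ y
rhs = z @ (Sigma_half @ A @ Sigma_half) @ z
print(lhs, rhs)   # the two values should agree up to rounding
```

Note that the rewriting itself needs only that $\boldsymbol{\Sigma}^{1/2}$ exists and is invertible. The idempotence assumption enters only afterwards.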

Of course you would agree that $\boldsymbol{\Sigma}^{-1/2} \mathbf{y} \sim N_n \left( \boldsymbol{\Sigma}^{-1/2} \boldsymbol{\mu} , \mathbf{I}_n \right)$. If we temporarily assume that $\boldsymbol{\mu}=0$ we would be in the situation of the first theorem, right? Well, not exactly - we still have to show that the middle matrix is idempotent but that is easy enough under our assumptions.

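$$\boldsymbol{\Sigma}^{1/2} \mathbf{A} \boldsymbol{\Sigma}^{1/2} \boldsymbol{\Sigma}^{1/2} \mathbf{A} \boldsymbol{\Sigma}^{1/2} = \boldsymbol{\Sigma}^{1/2} \left( \mathbf{A} \boldsymbol{\Sigma} \mathbf{A} \right) \boldsymbol{\Sigma}^{1/2} = \boldsymbol{\Sigma}^{1/2} \mathbf{A} \boldsymbol{\Sigma}^{1/2}$$

The middle matrix is also symmetric and, because $\boldsymbol{\Sigma}^{1/2}$ is nonsingular, has the same rank $r$ as $\mathbf{A}$. Therefore if $\mathbf{y} \sim N_n \left(\mathbf{0}, \boldsymbol{\Sigma} \right)$, $Q \sim \chi^2 \left(r \right)$. If we now switch back to the situation of nonzero mean, $\boldsymbol{\Sigma}^{-1/2} \mathbf{y}$ has mean $\boldsymbol{\Sigma}^{-1/2} \boldsymbol{\mu}$, so the non-centrality parameter is $\left( \boldsymbol{\Sigma}^{-1/2} \boldsymbol{\mu} \right)^{\prime} \boldsymbol{\Sigma}^{1/2} \mathbf{A} \boldsymbol{\Sigma}^{1/2} \left( \boldsymbol{\Sigma}^{-1/2} \boldsymbol{\mu} \right) = \boldsymbol{\mu}^{\prime} \mathbf{A} \boldsymbol{\mu}$ and we would have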

$$ Q = \mathbf{y}^{\prime}\mathbf{A}\mathbf{y} \sim \chi^2 \left(r, \boldsymbol{\mu}^{\prime} \mathbf{A}\boldsymbol{\mu} \right)$$

as required. $\square$
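For a numerical sanity check of the full statement, one way to manufacture a matrix satisfying $(A\Sigma)^2 = A\Sigma$ is to take $A = \Sigma^{-1/2} P \Sigma^{-1/2}$ for a rank-$r$ orthogonal projection $P$. That construction, together with the specific `mu`, `Sigma` and sample size, is my own assumption for the sake of the example and not part of the theorem.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, r = 5, 3
mu = np.array([0.5, -1.0, 2.0, 0.0, 1.5])   # hypothetical mean vector

# Hypothetical positive definite Sigma and its inverse square root.
M = rng.normal(size=(n, n))
Sigma = M @ M.T + n * np.eye(n)
w, V = np.linalg.eigh(Sigma)
Sigma_neg_half = (V / np.sqrt(w)) @ V.T

# Rank-r projection P, then A = Sigma^{-1/2} P Sigma^{-1/2} (symmetric, rank r).
C, _ = np.linalg.qr(rng.normal(size=(n, r)))
P = C @ C.T
A = Sigma_neg_half @ P @ Sigma_neg_half

AS = A @ Sigma
print("A Sigma idempotent:", np.allclose(AS @ AS, AS))

# Noncentrality in the "no division by 2" convention, as used by scipy.stats.ncx2.
nc = mu @ A @ mu

y = rng.multivariate_normal(mu, Sigma, size=200_000)
Q = np.einsum("ij,jk,ik->i", y, A, y)

ks = stats.kstest(Q, stats.ncx2(df=r, nc=nc).cdf)
print("sample mean vs r + nc:", Q.mean(), r + nc)
print("KS statistic:", ks.statistic)
```

A small Kolmogorov-Smirnov statistic and a sample mean near $r + \boldsymbol{\mu}^{\prime}\mathbf{A}\boldsymbol{\mu}$ are what the theorem predicts.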
