Positive definiteness of sample covariance matrix when $ N < p $

covariance, covariance-matrix

Suppose I have a sample with $ N < p $, where $ N $ denotes the number of observations and $ p $ the number of variables. I know that the rank of the sample covariance matrix is then at most $ N $, so it is not invertible and hence not positive definite. However, I read on Wikipedia that a covariance matrix has to be at least positive semidefinite. Can I conclude that the sample covariance matrix is still positive semidefinite in this case?

Best Answer

Yes, the sample covariance matrix will still be positive semi-definite.

To see this, note that if $X\in \mathbb{R}^{N\times p}$ is the data matrix (with observations in the rows and variables in the columns), then the sample covariance matrix is $C := \frac{1}{N-1}Y^TY\in \mathbb{R}^{p\times p}$, where $Y\in \mathbb{R}^{N\times p}$ is the matrix $X$ with each column's mean subtracted from that column's entries.

Note that $C$ is symmetric as $C^T = \left( \frac{1}{N-1}Y^TY\right)^T = \frac{1}{N-1}Y^TY=C$.

Also, for any $\mathbf{v}\in\mathbb{R}^{p}$, we have

$$\begin{align*} \mathbf{v}^T C \mathbf{v}&= \frac{1}{N-1}\mathbf{v}^TY^TY\mathbf{v}\\ &= \frac{1}{N-1} \left( Y\mathbf{v}\right)^T Y\mathbf{v}\\ &= \frac{1}{N-1}\left\| Y\mathbf{v}\right\|^{2}\\ &\ge 0. \end{align*} $$

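Thus the sample covariance matrix $C$ is positive semi-definite. It is not positive definite when $N < p$: since $\operatorname{rank}(Y) \le N < p$, there is a nonzero $\mathbf{v}\in\mathbb{R}^{p}$ with $Y\mathbf{v} = \mathbf{0}$, and for that $\mathbf{v}$ the inequality above holds with equality.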
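As a quick numerical sanity check of the argument, here is a minimal NumPy sketch. The helper name `sample_covariance`, the sizes `N = 5` and `p = 8`, the random Gaussian data, and the tolerance `tol` are all illustrative assumptions, not part of the question. It builds $C = \frac{1}{N-1}Y^TY$ from centered data and inspects its eigenvalues.

```python
import numpy as np

def sample_covariance(X):
    """Return C = Y^T Y / (N - 1), where Y is X with each column's mean removed."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    Y = X - X.mean(axis=0)          # center each column (variable)
    return Y.T @ Y / (N - 1)

# Hypothetical example sizes with fewer observations than variables (N < p).
rng = np.random.default_rng(0)
N, p = 5, 8
X = rng.standard_normal((N, p))

C = sample_covariance(X)

# C is symmetric, so eigvalsh applies and returns real eigenvalues.
eigvals = np.linalg.eigvalsh(C)

tol = 1e-10                          # assumed tolerance for floating-point round-off
is_psd = bool(np.all(eigvals >= -tol))   # no eigenvalue is meaningfully negative
is_pd = bool(np.all(eigvals > tol))      # every eigenvalue is strictly positive
rank = int(np.linalg.matrix_rank(C))

print("positive semi-definite:", is_psd)
print("positive definite:", is_pd)
print("rank:", rank, "out of p =", p)
```

In floating point the zero eigenvalues typically show up as tiny positive or negative numbers, which is why the check compares against a small tolerance rather than exactly zero.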