Dimensionality in PCA – Is PCA Still Done via Eigendecomposition of the Covariance Matrix with High Dimensionality?

pca

I have a $20\times100$ matrix $X$, containing my $N=20$ samples in the $D=100$-dimensional space. I now wish to code up my own principal component analysis (PCA) in Matlab. I demean $X$ to $X_0$ first.

I read in someone's code that in scenarios like this, where we have more dimensions than observations, we no longer eigendecompose $X_0$'s covariance matrix. Instead, we eigendecompose $\frac{1}{N-1}X_0X_0^T$. Why is this correct?

The normal covariance matrix is of size $D\times D$, and each of its elements tells us the covariance between two dimensions. To me, $\frac{1}{N-1}X_0X_0^T$ does not even have the correct dimensions! It is an $N\times N$ matrix, so what would it tell us? The covariance between two observations?!

Best Answer

The covariance matrix is of $D\times D$ size and is given by $$\mathbf C = \frac{1}{N-1}\mathbf X_0^\top \mathbf X^\phantom\top_0.$$

The matrix you are talking about is of course not a covariance matrix; it is called the Gram matrix and is of $N\times N$ size: $$\mathbf G = \frac{1}{N-1}\mathbf X^\phantom\top_0 \mathbf X_0^\top.$$

Principal component analysis (PCA) can be implemented via eigendecomposition of either of these matrices. These are just two different ways to compute the same thing.
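
To make this concrete, here is a minimal Matlab sketch (random stand-in data; all variable names are illustrative) that builds both matrices for the $20\times100$ setting from the question and compares their spectra:

```matlab
N = 20; D = 100;
X  = randn(N, D);           % stand-in for the real 20 x 100 data matrix
X0 = X - mean(X, 1);        % demean each of the D columns

C = (X0' * X0) / (N - 1);   % D x D covariance matrix
G = (X0 * X0') / (N - 1);   % N x N Gram matrix

eigC = sort(eig(C), 'descend');       % D eigenvalues
eigG = sort(eig(G), 'descend');       % N eigenvalues
max(abs(eigC(1:N-1) - eigG(1:N-1)))   % ~1e-15: the nonzero spectra coincide
```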

The easiest and most useful way to see this is to use the singular value decomposition of the centered data matrix, $\mathbf X_0 = \mathbf {USV}^\top$. Plugging this into the expressions for $\mathbf C$ and $\mathbf G$, and using the fact that $\mathbf U$ and $\mathbf V$ have orthonormal columns ($\mathbf U^\top\mathbf U=\mathbf V^\top\mathbf V=\mathbf I$), we get: \begin{align}\mathbf C&=\mathbf V\frac{\mathbf S^2}{N-1}\mathbf V^\top,\\\mathbf G&=\mathbf U\frac{\mathbf S^2}{N-1}\mathbf U^\top.\end{align}

Eigenvectors $\mathbf V$ of the covariance matrix are the principal directions. Projections of the data onto these eigenvectors are the principal components; these projections are given by $\mathbf {US}$. Principal components scaled to unit length are given by $\mathbf U$. As you can see, the eigenvectors of the Gram matrix are exactly these scaled principal components. And the nonzero eigenvalues of $\mathbf C$ and $\mathbf G$ coincide: both equal $s_i^2/(N-1)$, and after demeaning at most $N-1$ of them are nonzero.
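
Continuing the sketch above, these identities are easy to verify numerically; Matlab's `svd(X0, 'econ')` returns the $\mathbf U$, $\mathbf S$, $\mathbf V$ used here:

```matlab
[U, S, V] = svd(X0, 'econ');       % X0 = U*S*V'; V is D x N, U and S are N x N

norm(X0 * V - U * S, 'fro')        % ~1e-14: projections onto V equal U*S
lambda = diag(S).^2 / (N - 1);     % the shared eigenvalues of C and G
norm(lambda(1:N-1) - eigC(1:N-1)) % ~1e-15, reusing eigC from the sketch above
```

(Individual eigenvectors are only defined up to sign, so when comparing the columns of `V` or `U` against the output of `eig`, compare them up to a factor of $\pm1$.)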

The reason you might see the Gram matrix recommended when $N<D$ is that it is smaller than the covariance matrix, and hence faster to compute and faster to eigendecompose. In fact, if your dimensionality $D$ is too high, there is no way you can even store the $D\times D$ covariance matrix in memory, so operating on the Gram matrix is the only way to do PCA; the principal directions can then be recovered as $\mathbf V = \mathbf X_0^\top \mathbf U \mathbf S^{-1}$, as shown in the sketch below. But for manageable $D$ you can still use eigendecomposition of the covariance matrix, if you prefer, even when $N<D$.
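
Here is a hedged sketch of that Gram-matrix route, which never forms the $D\times D$ covariance matrix (again, all names are illustrative):

```matlab
[U, L] = eig(G);                        % G = U*L*U'; U orthonormal since G is symmetric
[lam, ix] = sort(diag(L), 'descend');   % order eigenvalues, largest first
U = U(:, ix);

k = N - 1;                              % rank of X0 is at most N-1 after demeaning
s = sqrt(lam(1:k) * (N - 1));           % singular values of X0
V = X0' * U(:, 1:k) ./ s';              % principal directions: V = X0' * U * S^-1
scores = U(:, 1:k) .* s';               % principal components: U * S
```

Up to the sign of each column, `V` and `scores` here match what the covariance-matrix and SVD routes produce.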

