Solved – Clear description of PCA using SVD of covariance matrix

dimensionality reduction, pca, svd

After reading thousands of articles on PCA and SVD, using them in a number of programming frameworks, and even implementing similar techniques (like Random Indexing), I find that I still have doubts about some parts of PCA for dimensionality reduction. So let me lay out what I know and what I am unsure about.

Let's say we have $N$ observations of $M$-dimensional data, organized as a matrix $A \in R^{N \times M}$ (one observation per row). To perform PCA we should first compute the MLE estimate of the covariance matrix $\Sigma \in R^{M \times M}$:

$$\Sigma=\frac{1}{N}\sum_{i=1}^N(x_i - \bar x)(x_i - \bar x)^T$$

where $x_i \in R^M$ is the $i$-th observation and $\bar x = \frac{1}{N}\sum_{k=1}^{N}x_k \in R^M$ is the mean observation.
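For concreteness, here is a minimal numpy sketch of this estimate (the variable names and toy data are illustrative, not from the question):

```python
import numpy as np

# Toy data: rows of A are the N observations, columns are the M variables.
rng = np.random.default_rng(0)
N, M = 100, 5
A = rng.normal(size=(N, M))

x_bar = A.mean(axis=0)          # mean observation, shape (M,)
A_c = A - x_bar                 # centered data
Sigma = (A_c.T @ A_c) / N       # MLE covariance estimate, shape (M, M)

# Same result via the explicit sum of outer products from the formula above.
Sigma_sum = sum(np.outer(x - x_bar, x - x_bar) for x in A) / N
assert np.allclose(Sigma, Sigma_sum)
```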

Then we can decompose $\Sigma$ using SVD as follows:

$$\Sigma = USV^T$$
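A small self-contained sketch of this decomposition step, assuming $\Sigma$ is an $M \times M$ covariance matrix (the example data here is hypothetical):

```python
import numpy as np

# An M x M MLE covariance matrix from some toy data (bias=True divides by N).
X = np.random.default_rng(1).normal(size=(100, 4))
Sigma = np.cov(X, rowvar=False, bias=True)

U, s, Vt = np.linalg.svd(Sigma)
print(U.shape, s.shape, Vt.shape)   # (4, 4) (4,) (4, 4)

# Sigma is symmetric positive semi-definite, so its singular values equal its
# eigenvalues, and U and V coincide (up to ambiguity for repeated or zero eigenvalues).
assert np.allclose(Sigma, U @ np.diag(s) @ Vt)
```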

Now here are several things I'm not sure about:

  1. What are dimensions of $U$, $S$ and $V^T$?
  2. In $USV^T$, what exactly counts as the eigenvalues, and which parts should I use as the principal components?
  3. How can I project original observations $x_i$ onto new reduced space and vice versa?

UPD. There is a different way to compute PCA using SVD: by factorizing the (centered) data matrix $A$ itself instead of the covariance matrix ($\Sigma = \frac{1}{N}A^TA$ when $A$ is centered). A good description of this approach may be found in this answer.
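A rough numpy sketch of this alternative route, under the assumption that rows of $A$ are observations (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 100, 5
A = rng.normal(size=(N, M))
A_c = A - A.mean(axis=0)                              # center the data first

U, s, Vt = np.linalg.svd(A_c, full_matrices=False)    # U: N x M, s: (M,), Vt: M x M

# Relation to the covariance route: the eigenvalues of Sigma are s**2 / N,
# and the rows of Vt are the principal directions (eigenvectors of Sigma).
Sigma = (A_c.T @ A_c) / N
eigvals = np.linalg.eigvalsh(Sigma)[::-1]             # descending order
assert np.allclose(eigvals, s**2 / N)
```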

Best Answer

  • What are dimensions of $U$, $S$ and $V^T$?

Since $\Sigma$ is an $M \times M$ matrix, the three matrices $U$, $S$, and $V^T$ will all be $M \times M$. In general, applying SVD to an $N \times M$ matrix gives $U_{N{\times}N}$, $S_{N{\times}M}$, and $V^T_{M{\times}M}$; you can verify this in MATLAB. When you truncate the singular values in $S$, you should also remove the corresponding columns of $U$ and rows of $V^T$.
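A quick numpy sketch of those shapes and of the truncation just described (toy sizes, illustrative names):

```python
import numpy as np

N, M, k = 100, 5, 2
A = np.random.default_rng(0).normal(size=(N, M))

U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)   # (100, 100) (5,) (5, 5); S is N x M with s on its diagonal

# Keeping only the k largest singular values means keeping the first k
# columns of U and the first k rows of V^T.
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
A_k = U_k @ np.diag(s_k) @ Vt_k     # best rank-k approximation of A
```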

  • In $USV^T$ what exactly is considered as eigenvalues and which of them should I use as principal components?

PCA can be done either by an eigenvalue decomposition of the covariance matrix $\Sigma$ or by applying SVD to $A$. The left singular vectors of $\mathrm{SVD}(A)$ are the eigenvectors of $AA^T$, the right singular vectors are the eigenvectors of $A^TA$, and the squared singular values are the corresponding eigenvalues. You need to order them by eigenvalue from large to small and keep them orthonormal. $A^TA$ is called the Gram matrix and is closely related to the covariance matrix $\Sigma$: if the $M$ variables (columns of $A$) are already centered, the Gram matrix equals $N$ times the covariance matrix, $A^TA = N\Sigma$. If instead you apply SVD to $\Sigma$ itself, the diagonal entries of $S$ are its eigenvalues and the columns of $U$ (which coincide with $V$, since $\Sigma$ is symmetric positive semi-definite) are the principal directions. Check Wikipedia and some tutorials on SVD and PCA.
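A small numpy sketch comparing the two routes, assuming the columns of $A$ have been centered (data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 200, 4
A = rng.normal(size=(N, M))
A_c = A - A.mean(axis=0)                 # center the variables first

# Route 1: eigendecomposition of the covariance matrix, sorted large to small.
Sigma = (A_c.T @ A_c) / N                # Gram matrix of centered data = N * Sigma
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix.
_, s, Vt = np.linalg.svd(A_c, full_matrices=False)

assert np.allclose(eigvals, s**2 / N)                          # same eigenvalues
assert np.allclose(np.abs(eigvecs), np.abs(Vt.T), atol=1e-6)   # same directions up to sign
```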

  • How can I project original observations $x_i$ onto new reduced space and vice versa?

If you apply SVD to $A$ for PCA, the projection of observation $i$ is $u_i S$ (the $i$-th row of $US$); if you apply an eigendecomposition to the covariance matrix $\Sigma$ and $V$ holds its eigenvectors, the projection is $x_i V$ (treating $x_i$ as a row vector). For dimension reduction, keep only the first $k$ columns, i.e. use $V_k$; to go back to the original space, multiply the reduced scores by $V_k^T$ (and add the mean back if the data were centered).
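A short numpy sketch of the projection and the way back, assuming centered data and illustrative names:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, k = 100, 5, 2
A = rng.normal(size=(N, M))
x_bar = A.mean(axis=0)
A_c = A - x_bar

U, s, Vt = np.linalg.svd(A_c, full_matrices=False)
V_k = Vt[:k, :].T                        # M x k matrix of principal directions

Z = A_c @ V_k                            # projection: scores, shape N x k
assert np.allclose(Z, U[:, :k] * s[:k])  # same as the u_i * S route from the SVD of A

A_hat = Z @ V_k.T + x_bar                # "vice versa": reconstruction in the original space
```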
