Solved – Why does Andrew Ng prefer to use SVD and not EIG of covariance matrix to do PCA

Tags: eigenvalues, linear algebra, numerics, pca, svd

I am studying PCA from Andrew Ng's Coursera course and other materials. In the first assignment of the Stanford NLP course cs224n, and in Andrew Ng's lecture video, they perform a singular value decomposition of the covariance matrix rather than an eigenvector decomposition of it, and Ng even says that SVD is numerically more stable than eigendecomposition.

From my understanding, for PCA we should take the SVD of the data matrix of size (m, n), not of the covariance matrix of size (n, n); the covariance matrix is what one would use for an eigenvector decomposition.

Why do they do SVD of covariance matrix, not data matrix?

Best Answer

amoeba already gave a good answer in the comments, but if you want a formal argument, here it is.

The singular value decomposition of a matrix $A$ is $A=U\Sigma V^T$, where the columns of $V$ are eigenvectors of $A^TA$ and the diagonal entries of $\Sigma$ are the square roots of its eigenvalues, i.e. $\sigma_{ii}=\sqrt{\lambda_i(A^TA)}$.

As you know, the principal components are the orthogonal projections of your data onto the eigenvectors of the empirical covariance matrix $\frac{1}{n-1}A^TA$ (with $n$ the number of samples and $A$ assumed centered), and the variances of the components are given by its eigenvalues, $\lambda_i(\frac{1}{n-1}A^TA)$.
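As a concrete illustration (a minimal numpy sketch with made-up data; the variable names are mine, not from the course), both routes give the same principal directions and variances:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))       # data matrix: m = 100 samples, n = 5 variables
A = A - A.mean(axis=0)              # center the columns

S = A.T @ A / (A.shape[0] - 1)      # empirical covariance matrix, shape (5, 5)

# Route 1: eigendecomposition of the covariance matrix
evals, evecs = np.linalg.eigh(S)                 # symmetric eigensolver
order = np.argsort(evals)[::-1]                  # sort by decreasing variance
evals, evecs = evals[order], evecs[:, order]

# Route 2: SVD of the centered data matrix
_, sing, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(sing**2 / (A.shape[0] - 1), evals))  # variances agree
print(np.allclose(np.abs(Vt), np.abs(evecs.T)))        # directions agree up to sign
```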

Consider any square matrix $B$, a scalar $\alpha \in \mathbb R$, and a vector $v$ such that $Bv=\lambda v$ for some eigenvalue $\lambda$. Then

  1. $B^kv=\lambda^kv$
  2. $\lambda(\alpha B) = \alpha\lambda( B)$
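
Both follow in one line from $Bv=\lambda v$: $B^kv=B^{k-1}(Bv)=\lambda B^{k-1}v=\dots=\lambda^kv$, and $(\alpha B)v=\alpha(Bv)=(\alpha\lambda)v$, with the same eigenvector $v$ in both cases.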

Let us define $S=\frac{1}{n-1}A^TA$. The SVD of $S$ will compute the eigendecomposition of $S^TS=\frac{1}{(n-1)^2}A^TAA^TA$ to yield

  1. the eigenvectors of $(A^TA)^TA^TA=A^TAA^TA$, which by property 1 are those of $A^TA$
  2. the square roots of the eigenvalues of $\frac{1}{(n-1)^2}A^TAA^TA$, which by property 2, then 1, then 2 again, are $\sqrt{\frac{1}{(n-1)^2} \lambda_i(A^TAA^TA)} = \sqrt{\frac{1}{(n-1)^2} \lambda_i^2(A^TA)} = \frac{1}{n-1}\lambda_i(A^TA) = \lambda_i(\frac{1}{n-1}A^TA)$.

Voilà!
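A quick numerical check of this, again a sketch with made-up data and my own variable names:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 4))
A = A - A.mean(axis=0)
n = A.shape[0]

S = A.T @ A / (n - 1)                   # S = (1/(n-1)) A^T A

# SVD of the covariance matrix itself (what the course does)
U, sigma, _ = np.linalg.svd(S)

# Eigendecomposition of A^T A
lam, vecs = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]
lam, vecs = lam[order], vecs[:, order]

# Singular values of S equal the eigenvalues of (1/(n-1)) A^T A
print(np.allclose(sigma, lam / (n - 1)))
# Columns of U are eigenvectors of A^T A (up to sign)
print(np.allclose(np.abs(U), np.abs(vecs)))
```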

Regarding the numerical stability, one would need to figure out which algorithms are employed; if you're up to it, the LAPACK routines that numpy calls are the place to look.

Update: On the stability, the SVD implementation seems to use a divide-and-conquer approach, while the eigendecomposition uses a plain QR algorithm. I cannot access some of the relevant SIAM papers from my institution (blame research cutbacks), but I found something that might support the assessment that the SVD routine is more stable.
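If you want to poke at this yourself, SciPy (unlike the numpy calls above) exposes the LAPACK driver as an argument; a small sketch, assuming a reasonably recent SciPy:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(2)
A = rng.normal(size=(200, 10))
S = np.cov(A, rowvar=False)             # empirical covariance matrix, shape (10, 10)

# SVD: 'gesdd' is divide-and-conquer, 'gesvd' is the classical algorithm
U_dc, s_dc, _ = linalg.svd(S, lapack_driver='gesdd')
U_cl, s_cl, _ = linalg.svd(S, lapack_driver='gesvd')

# Symmetric eigendecomposition: 'evd' is divide-and-conquer, 'ev' is the QR-based driver
w_dc, V_dc = linalg.eigh(S, driver='evd')
w_qr, V_qr = linalg.eigh(S, driver='ev')

print(np.allclose(s_dc, s_cl), np.allclose(w_dc, w_qr))
```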

In

Nakatsukasa, Yuji, and Nicholas J. Higham. "Stable and efficient spectral divide and conquer algorithms for the symmetric eigenvalue decomposition and the SVD." SIAM Journal on Scientific Computing 35.3 (2013): A1325-A1349.

they compare the stability of various eigenvalue algorithms, and it seems that the divide-and-conquer approach (they use the same one as numpy in one of the experiments!) is more stable than the QR algorithm. This, together with claims elsewhere that D&C methods are indeed more stable, supports Ng's choice.
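
As a rough at-home proxy for such experiments (by no means a reproduction of the paper's), one can compare the relative reconstruction errors of the two numpy factorizations on a covariance matrix; this sketch makes no claim about which error comes out smaller:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(500, 20))
A -= A.mean(axis=0)
S = A.T @ A / (A.shape[0] - 1)          # empirical covariance matrix

# General eigendecomposition (np.linalg.eig -> LAPACK geev, QR algorithm)
w, V = np.linalg.eig(S)
w, V = w.real, V.real                   # S is symmetric, so its spectrum is real
err_eig = np.linalg.norm(S - (V * w) @ np.linalg.inv(V)) / np.linalg.norm(S)

# SVD (np.linalg.svd -> LAPACK gesdd, divide and conquer)
U, s, Vt = np.linalg.svd(S)
err_svd = np.linalg.norm(S - (U * s) @ Vt) / np.linalg.norm(S)

print(f"relative reconstruction error  eig: {err_eig:.2e}  svd: {err_svd:.2e}")
```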