Why does singular value decomposition simultaneously diagonalize a symmetric matrix and its square?

diagonalization, eigenvalues-eigenvectors, principal component analysis, svd, symmetric matrices

So I took an online course on machine learning and in this course the instructor said that the eigenvectors of a covariance matrix (for principal components analysis) can be computed by a singular value decomposition.

Say the covariance matrix is $A$. The SVD yields $A = U \Sigma V^t$, and since $A$ is symmetric we have $A^t A = A A^t = A^2$, so both $V$ and $U$ diagonalize $A^2$, i.e. $V^t A^2 V = U^t A^2 U = \Sigma^2$. What I do not understand is why $U$ and $V$ also diagonalize $A$ directly, i.e. $V^t A V = U^t A U = \Sigma$.

Diagonalizable matrices that commute can be simultaneously diagonalized, so I understand that $A^2$ and $A$ can be simultaneously diagonalized. However, the dimension of an eigenspace of $A^2$ can in general be $>1$, which means that a set of eigenvectors that diagonalizes $A^2$ does not have to diagonalize $A$ as well. So why does the SVD algorithm automatically find eigenvectors of $A^2$ that are simultaneously eigenvectors of $A$?

I am not a mathematician, please have mercy if the answer is kind of obvious.

Best Answer

Covariance matrices are positive semidefinite (PSD), and a PSD matrix has a unique PSD square root (obtained by taking the nonnegative square root of each eigenvalue). Since $V$ is orthogonal, $V^T A V$ is again PSD, and $(V^T A V)^2 = V^T A^2 V$, so $V^T A V$ is the unique PSD square root of $V^T A^2 V$. We have $V^T A^2 V = \Sigma^2$, and the unique PSD square root of $\Sigma^2$ is $\Sigma$, so $V^T A V = \Sigma$.
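The identity $V^T A V = \Sigma$ can be checked numerically. The following is a minimal sketch using NumPy, assuming a randomly generated data matrix `X` (hypothetical, not from the question) whose sample covariance plays the role of $A$; it only illustrates the argument and is not the course's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 4 features (an assumption for illustration).
X = rng.normal(size=(200, 4))
A = np.cov(X, rowvar=False)  # sample covariance matrix: symmetric and PSD

# SVD: A = U @ diag(s) @ Vt, with singular values s >= 0 in descending order.
U, s, Vt = np.linalg.svd(A)
V = Vt.T

# Check that V diagonalizes both A^2 and A itself.
svd_diagonalizes_A2 = np.allclose(V.T @ (A @ A) @ V, np.diag(s**2))
svd_diagonalizes_A = np.allclose(V.T @ A @ V, np.diag(s))

# Cross-check: the singular values agree with the eigenvalues of A
# (eigvalsh returns eigenvalues of a symmetric matrix in ascending order).
eigvals = np.linalg.eigvalsh(A)
matches_eigenvalues = np.allclose(np.sort(s), eigvals)

print(svd_diagonalizes_A2, svd_diagonalizes_A, matches_eigenvalues)
```

For a PSD matrix the singular values coincide with the eigenvalues, which is why the last check passes; for a symmetric matrix with negative eigenvalues the singular values would be the absolute values of the eigenvalues instead.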

We can give an alternative analysis in terms of eigenspaces as follows. Consider some eigenspace $E_{\lambda}$ of $A^2$. By definition we have $A^2 v = \lambda v$ for all $v \in E_{\lambda}$. Since $A$ commutes with $A^2$, it restricts to a map $A : E_{\lambda} \to E_{\lambda}$ whose square is multiplication by $\lambda$. You are correct that in general it does not follow that $A$ acts by a scalar (and good job spotting this possibility!), but if $\lambda \ge 0$ and $A$ is PSD then $A$ must act by $\sqrt{\lambda}$. This is because any eigenvalue $\mu$ of $A$ on $E_{\lambda}$ satisfies $\mu^2 = \lambda$ and $\mu \ge 0$, so $\sqrt{\lambda}$ is the only possible eigenvalue of $A$ here (and $A$ is diagonalizable by the spectral theorem).
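To see why the PSD assumption cannot be dropped, here is a small worked example (the matrix is chosen for illustration and does not come from the question). Take the symmetric but indefinite matrix

$$A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad A^2 = I.$$

The eigenspace $E_1$ of $A^2$ is all of $\mathbb{R}^2$, and $A$ restricted to it squares to $1$ without acting by a scalar. Every orthogonal matrix diagonalizes $A^2$, but for example

$$Q = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \quad \text{gives} \quad Q^T A^2 Q = I, \qquad Q^T A Q = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$

which is not diagonal. Likewise, one valid SVD of $A$ is $U = A$, $\Sigma = I$, $V = I$, and then $V^T A V = A \ne \Sigma$. If instead $A = I$, the unique PSD square root of $I$, then $A$ acts by the scalar $\sqrt{1} = 1$ and every orthonormal basis diagonalizes it.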
