Why does singular value decomposition simultaneously diagonalize a symmetric matrix and its square?

Tags: diagonalization, eigenvalues-eigenvectors, principal component analysis, svd, symmetric matrices

So I took an online course on machine learning and in this course the instructor said that the eigenvectors of a covariance matrix (for principal components analysis) can be computed by a singular value decomposition.

Say the covariance matrix is $A$. The SVD yields $A = U \Sigma V^t$, and since $A$ is symmetric, $A^t A = A^2$, so both $V$ and $U$ diagonalize $A^2$, i.e. $V^t A^2 V = U^t A^2 U = \Sigma^2$. What I do not understand is why $U$ and $V$ also diagonalize $A$ directly, i.e. $V^t A V = U^t A U = \Sigma$.

Diagonalizable matrices that commute can be simultaneously diagonalized, so I understand that $A^2$ and $A$ can be simultaneously diagonalized. However, the dimension of an eigenspace of $A^2$ can generally be $>1$, which means that a set of eigenvectors that diagonalizes $A^2$ does not have to diagonalize $A$ as well. So why does the SVD algorithm automatically find the eigenvectors of $A^2$ that are simultaneously eigenvectors of $A$?

I am not a mathematician, please have mercy if the answer is kind of obvious.

Best Answer

Covariance matrices are positive-semidefinite, and PSD matrices have unique PSD square roots (given by taking the unique nonnegative square root of each eigenvalue). This means that $V^T A V$ is the unique PSD square root of $V^T A^2 V$. We have $V^T A^2 V = \Sigma^2$ and the unique PSD square root of $\Sigma^2$ is $\Sigma$, so $V^T A V = \Sigma$.
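The claim above can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper name `svd_diagonalizes` and the test matrices are illustrative choices, not part of the original answer. It compares a sample covariance matrix (PSD) with a symmetric but indefinite matrix, for which the PSD assumption fails.

```python
import numpy as np

def svd_diagonalizes(A, tol=1e-10):
    """Return True if V from the SVD of A satisfies V^T A V = diag(singular values)."""
    U, s, Vt = np.linalg.svd(A)
    V = Vt.T
    return np.allclose(V.T @ A @ V, np.diag(s), atol=tol)

# A sample covariance matrix is symmetric positive-semidefinite.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
cov = np.cov(X, rowvar=False)
psd_ok = svd_diagonalizes(cov)

# A symmetric matrix with a negative eigenvalue is not PSD.
# Its singular values are 2 and 1, but V^T A V keeps the eigenvalue -1.
indefinite = np.diag([2.0, -1.0])
indef_ok = svd_diagonalizes(indefinite)

print(psd_ok, indef_ok)
```

For the indefinite matrix, $V^T A V$ is still diagonal, but it carries the signed eigenvalues rather than $\Sigma$, which is why positive-semidefiniteness matters in the argument.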

We can give an alternative analysis in terms of eigenspaces as follows. Consider some eigenspace $E_{\lambda}$ of $A^2$. By definition we have $A^2 v = \lambda v$ for all $v \in E_{\lambda}$. Since $A$ commutes with $A^2$, it restricts to a map $A : E_{\lambda} \to E_{\lambda}$ which squares to $\lambda$. You are correct that in general it does not follow that $A$ acts by a scalar (and good job spotting this possibility!), but if $\lambda \ge 0$ and $A$ is PSD then $A$ must act by $\sqrt{\lambda}$. This is because $\sqrt{\lambda}$ is the only possible eigenvalue of $A$ here (and $A$ is diagonalizable by the spectral theorem).
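A small numerical illustration of the eigenspace argument, as a sketch assuming NumPy (the matrices and the helper `is_diagonal` are hypothetical examples chosen for this illustration). With $A = \operatorname{diag}(1, -1)$ we get $A^2 = I$, so every orthonormal basis diagonalizes $A^2$, yet a rotated basis does not diagonalize $A$. With a PSD matrix that has a repeated eigenvalue, any orthonormal basis of the corresponding eigenspace of $A^2$ still diagonalizes $A$.

```python
import numpy as np

def is_diagonal(M, tol=1e-12):
    """Return True if all off-diagonal entries of M are numerically zero."""
    return np.allclose(M, np.diag(np.diag(M)), atol=tol)

# Rotation by 45 degrees: an orthonormal basis that mixes the two coordinates.
theta = np.pi / 4
c, s = np.cos(theta), np.sin(theta)
Q = np.array([[c, -s], [s, c]])

# Indefinite case: A^2 = I has a 2-dimensional eigenspace on which A is not a scalar.
A_indef = np.diag([1.0, -1.0])
indef_sq_diag = is_diagonal(Q.T @ (A_indef @ A_indef) @ Q)
indef_diag = is_diagonal(Q.T @ A_indef @ Q)

# PSD case: A acts as the scalar 2 on the eigenspace of A^2 for eigenvalue 4.
A_psd = np.diag([2.0, 2.0, 5.0])
Q3 = np.eye(3)
Q3[:2, :2] = Q  # rotate only inside the repeated eigenspace
psd_sq_diag = is_diagonal(Q3.T @ (A_psd @ A_psd) @ Q3)
psd_diag = is_diagonal(Q3.T @ A_psd @ Q3)

print(indef_sq_diag, indef_diag, psd_sq_diag, psd_diag)
```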

Related Solutions

Linear Algebra – Visualization of Singular Value Decomposition of a Symmetric Matrix

Singular value decomposition

Start with a matrix with $m$ rows, $n$ columns, and rank $\rho$,
$$ \mathbf{A}\in\mathbb{C}^{m\times n}_{\rho} $$
which has the singular value decomposition
$$
\mathbf{A} = \mathbf{U} \, \Sigma \, \mathbf{V}^{*} =
\left[ \begin{array}{cc} \color{blue}{\mathbf{U}_{\mathcal{R}\left(\mathbf{A}\right)}} & \color{red}{\mathbf{U}_{\mathcal{N}\left(\mathbf{A}^{*}\right)}} \end{array} \right]
\left[ \begin{array}{cc} \mathbf{S} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{array} \right]
\left[ \begin{array}{cc} \color{blue}{\mathbf{V}_{\mathcal{R}\left(\mathbf{A}^{*}\right)}} & \color{red}{\mathbf{V}_{\mathcal{N}\left(\mathbf{A}\right)}} \end{array} \right]^{*}
$$
where the color denotes $\color{blue}{\text{range}}$ spaces and $\color{red}{\text{null}}$ spaces. The dimensions of the domain matrices are
$$
\color{blue}{\mathbf{U}_{\mathcal{R}\left(\mathbf{A}\right)}} \in \mathbb{C}^{m\times \rho}, \quad
\color{red}{\mathbf{U}_{\mathcal{N}\left(\mathbf{A}^{*}\right)}} \in \mathbb{C}^{m \times (m - \rho)}, \quad
\color{blue}{\mathbf{V}_{\mathcal{R}\left(\mathbf{A}^{*}\right)}} \in \mathbb{C}^{n\times \rho}, \quad
\color{red}{\mathbf{V}_{\mathcal{N}\left(\mathbf{A}\right)}} \in \mathbb{C}^{n\times (n - \rho)}.
$$
The domain matrices are unitary:
$$
\begin{align}
\mathbf{U}\mathbf{U}^{*} &= \mathbf{U}^{*}\mathbf{U} = \mathbf{I}_{m} \\
\mathbf{V}\mathbf{V}^{*} &= \mathbf{V}^{*}\mathbf{V} = \mathbf{I}_{n}
\end{align}
$$

The dimensions of the singular value matrices are
$$
\Sigma \in \mathbb{R}^{m\times n}, \quad
\mathbf{S} \in \mathbb{R}^{\rho\times \rho}.
$$

The Hermitian conjugate is constructed according to
$$
\mathbf{A}^{*} = \mathbf{V} \, \Sigma^{\mathrm{T}} \, \mathbf{U}^{*} =
\left[ \begin{array}{cc} \color{blue}{\mathbf{V}_{\mathcal{R}\left(\mathbf{A}^{*}\right)}} & \color{red}{\mathbf{V}_{\mathcal{N}\left(\mathbf{A}\right)}} \end{array} \right]
\left[ \begin{array}{cc} \mathbf{S} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{array} \right]
\left[ \begin{array}{cc} \color{blue}{\mathbf{U}_{\mathcal{R}\left(\mathbf{A}\right)}} & \color{red}{\mathbf{U}_{\mathcal{N}\left(\mathbf{A}^{*}\right)}} \end{array} \right]^{*}
$$
where $\Sigma^{\mathrm{T}}\in \mathbb{R}^{n\times m}$.

The Moore-Penrose pseudoinverse is constructed according to
$$
\mathbf{A}^{\dagger} = \mathbf{V} \, \Sigma^{\dagger} \, \mathbf{U}^{*} =
\left[ \begin{array}{cc} \color{blue}{\mathbf{V}_{\mathcal{R}\left(\mathbf{A}^{*}\right)}} & \color{red}{\mathbf{V}_{\mathcal{N}\left(\mathbf{A}\right)}} \end{array} \right]
\left[ \begin{array}{cc} \mathbf{S}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{array} \right]
\left[ \begin{array}{cc} \color{blue}{\mathbf{U}_{\mathcal{R}\left(\mathbf{A}\right)}} & \color{red}{\mathbf{U}_{\mathcal{N}\left(\mathbf{A}^{*}\right)}} \end{array} \right]^{*}
$$
where $\Sigma^{\dagger}\in \mathbb{R}^{n\times m}$.
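The pseudoinverse construction can be sketched in code as follows. This assumes NumPy; the function name `pinv_from_svd` and the relative threshold `rtol` used to decide the numerical rank $\rho$ are assumptions made for this illustration, not part of the original answer.

```python
import numpy as np

def pinv_from_svd(A, rtol=1e-12):
    """Build the pseudoinverse V @ Sigma^+ @ U^* from a full SVD of A."""
    m, n = A.shape
    U, s, Vh = np.linalg.svd(A, full_matrices=True)
    # Numerical rank: count singular values above a relative threshold (an assumption).
    rank = int(np.sum(s > rtol * s[0])) if s[0] > 0 else 0
    # Sigma^+ is n x m, with S^{-1} in the leading rank x rank block and zeros elsewhere.
    Sigma_pinv = np.zeros((n, m))
    Sigma_pinv[:rank, :rank] = np.diag(1.0 / s[:rank])
    return Vh.conj().T @ Sigma_pinv @ U.conj().T

# A rank-1 example with m = 3, n = 2.
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
A_pinv = pinv_from_svd(A)
print(A_pinv)
```

The result can be validated against the four Penrose conditions, for example `A @ A_pinv @ A` should reproduce `A`.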

The product matrix rules you stated always hold:
$$
\begin{align}
\mathbf{A} \mathbf{A}^{*} &=
\left( \mathbf{U} \, \Sigma \, \mathbf{V}^{*} \right)
\left( \mathbf{U} \, \Sigma \, \mathbf{V}^{*} \right)^{*}
=
\left( \mathbf{U} \, \Sigma \, \mathbf{V}^{*} \right)
\left( \mathbf{V} \, \Sigma^{\mathrm{T}} \, \mathbf{U}^{*} \right)
= \mathbf{U} \, \Sigma \, \Sigma^{\mathrm{T}} \, \mathbf{U}^{*} \\
\mathbf{A}^{*} \mathbf{A} &=
\left( \mathbf{U} \, \Sigma \, \mathbf{V}^{*} \right)^{*}
\left( \mathbf{U} \, \Sigma \, \mathbf{V}^{*} \right)
=
\left( \mathbf{V} \, \Sigma^{\mathrm{T}} \, \mathbf{U}^{*} \right)
\left( \mathbf{U} \, \Sigma \, \mathbf{V}^{*} \right)
= \mathbf{V} \, \Sigma^{\mathrm{T}} \, \Sigma \, \mathbf{V}^{*}
\end{align}
$$
Examples follow.

Square, full rank $m = n = \rho$

$$
\mathbf{A} = \mathbf{U} \, \Sigma \, \mathbf{V}^{*} =
\left[ \begin{array}{c} \color{blue}{\mathbf{U}_{\mathcal{R}\left(\mathbf{A}\right)}} \end{array} \right]
\left[ \mathbf{S} \right]
\left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}\left(\mathbf{A}^{*}\right)}} \end{array} \right]^{*}
$$
The product matrices are
$$
\begin{align}
\mathbf{A}^{*}\mathbf{A} &= \color{blue}{\mathbf{V}_{\mathcal{R}\left(\mathbf{A}^{*}\right)}} \, \mathbf{S}^{2} \, \color{blue}{\mathbf{V}_{\mathcal{R}\left(\mathbf{A}^{*}\right)}}^{*} \\
\mathbf{A}\mathbf{A}^{*} &= \color{blue}{\mathbf{U}_{\mathcal{R}\left(\mathbf{A}\right)}} \, \mathbf{S}^{2} \, \color{blue}{\mathbf{U}_{\mathcal{R}\left(\mathbf{A}\right)}}^{*}
\end{align}
$$

Tall, full column rank $n = \rho$, $m \ge n$

$$
\mathbf{A} = \mathbf{U} \, \Sigma \, \mathbf{V}^{*} =
\left[ \begin{array}{cc} \color{blue}{\mathbf{U}_{\mathcal{R}\left(\mathbf{A}\right)}} & \color{red}{\mathbf{U}_{\mathcal{N}\left(\mathbf{A}^{*}\right)}} \end{array} \right]
\left[ \begin{array}{c} \mathbf{S} \\ \mathbf{0} \end{array} \right]
\left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}\left(\mathbf{A}^{*}\right)}} \end{array} \right]^{*}
$$
Here $\mathcal{N}\left(\mathbf{A}\right)$ is trivial because $n = \rho$, so $\mathbf{V}$ has no null space block. The product matrices are
$$
\begin{align}
\mathbf{A}^{*}\mathbf{A} &= \color{blue}{\mathbf{V}_{\mathcal{R}\left(\mathbf{A}^{*}\right)}} \, \mathbf{S}^{2} \, \color{blue}{\mathbf{V}_{\mathcal{R}\left(\mathbf{A}^{*}\right)}}^{*} \\
\mathbf{A}\mathbf{A}^{*} &=
\left[ \begin{array}{cc} \color{blue}{\mathbf{U}_{\mathcal{R}\left(\mathbf{A}\right)}} & \color{red}{\mathbf{U}_{\mathcal{N}\left(\mathbf{A}^{*}\right)}} \end{array} \right]
\left[ \begin{array}{cc} \mathbf{S}^{2} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{array} \right]
\left[ \begin{array}{cc} \color{blue}{\mathbf{U}_{\mathcal{R}\left(\mathbf{A}\right)}} & \color{red}{\mathbf{U}_{\mathcal{N}\left(\mathbf{A}^{*}\right)}} \end{array} \right]^{*}
\end{align}
$$

Wide, full row rank $m = \rho$, $n \ge m$

$$
\mathbf{A} = \mathbf{U} \, \Sigma \, \mathbf{V}^{*} =
\left[ \begin{array}{c} \color{blue}{\mathbf{U}_{\mathcal{R}\left(\mathbf{A}\right)}} \end{array} \right]
\left[ \begin{array}{cc} \mathbf{S} & \mathbf{0} \end{array} \right]
\left[ \begin{array}{cc} \color{blue}{\mathbf{V}_{\mathcal{R}\left(\mathbf{A}^{*}\right)}} & \color{red}{\mathbf{V}_{\mathcal{N}\left(\mathbf{A}\right)}} \end{array} \right]^{*}
$$
The product matrices are
$$
\begin{align}
\mathbf{A}^{*}\mathbf{A} &=
\left[ \begin{array}{cc} \color{blue}{\mathbf{V}_{\mathcal{R}\left(\mathbf{A}^{*}\right)}} & \color{red}{\mathbf{V}_{\mathcal{N}\left(\mathbf{A}\right)}} \end{array} \right]
\left[ \begin{array}{cc} \mathbf{S}^{2} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{array} \right]
\left[ \begin{array}{cc} \color{blue}{\mathbf{V}_{\mathcal{R}\left(\mathbf{A}^{*}\right)}} & \color{red}{\mathbf{V}_{\mathcal{N}\left(\mathbf{A}\right)}} \end{array} \right]^{*} \\
\mathbf{A}\mathbf{A}^{*} &=
\color{blue}{\mathbf{U}_{\mathcal{R}\left(\mathbf{A}\right)}} \, \mathbf{S}^{2} \, \color{blue}{\mathbf{U}_{\mathcal{R}\left(\mathbf{A}\right)}}^{*}
\end{align}
$$
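The product formulas in the tall and wide cases can be checked numerically. The following is a sketch assuming NumPy; the function name `gram_from_svd` and the random complex test matrices are illustrative choices. It rebuilds $\mathbf{A}^{*}\mathbf{A} = \mathbf{V}\,\Sigma^{\mathrm{T}}\Sigma\,\mathbf{V}^{*}$ and $\mathbf{A}\mathbf{A}^{*} = \mathbf{U}\,\Sigma\,\Sigma^{\mathrm{T}}\,\mathbf{U}^{*}$ from the full SVD factors.

```python
import numpy as np

def gram_from_svd(A):
    """Rebuild A^* A and A A^* from the full SVD factors of A."""
    m, n = A.shape
    U, s, Vh = np.linalg.svd(A, full_matrices=True)
    # Embed the singular values in an m x n matrix Sigma.
    Sigma = np.zeros((m, n))
    k = min(m, n)
    Sigma[:k, :k] = np.diag(s)
    V = Vh.conj().T
    AhA = V @ (Sigma.T @ Sigma) @ Vh           # n x n
    AAh = U @ (Sigma @ Sigma.T) @ U.conj().T   # m x m
    return AhA, AAh

rng = np.random.default_rng(1)
tall = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
wide = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))

tall_AhA, tall_AAh = gram_from_svd(tall)
wide_AhA, wide_AAh = gram_from_svd(wide)
print(np.allclose(tall_AhA, tall.conj().T @ tall),
      np.allclose(wide_AAh, wide @ wide.conj().T))
```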

For a Hermitian matrix, which is necessarily square ($m = n$),
$$
\begin{align}
\mathbf{A} &= \mathbf{A}^{*} \\
\mathbf{U} \, \Sigma \, \mathbf{V}^{*} &= \mathbf{V} \, \Sigma \, \mathbf{U}^{*}
\end{align}
$$
because in this case $\Sigma = \Sigma^{\mathrm{T}}$.
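As an illustration of the Hermitian case, here is a sketch assuming NumPy, with a hypothetical $2\times 2$ real symmetric, indefinite matrix chosen so that its singular values are distinct. In that situation each left singular vector equals the matching right singular vector up to the sign of the corresponding eigenvalue, so $\mathbf{V}^{*}\mathbf{U}$ is a diagonal matrix of signs.

```python
import numpy as np

# Real symmetric (hence Hermitian) and indefinite; eigenvalues have distinct magnitudes.
A = np.array([[2.0, 1.0], [1.0, -3.0]])

U, s, Vh = np.linalg.svd(A)
V = Vh.T

# For distinct singular values, u_i = sign(lambda_i) * v_i, so V^T U is diagonal with +-1.
signs = np.diag(V.T @ U)

# Singular values are the absolute values of the eigenvalues.
eigvals = np.linalg.eigvalsh(A)

# Both factorizations reproduce A: U Sigma V^* and V Sigma U^*.
left = U @ np.diag(s) @ Vh
right = V @ np.diag(s) @ U.T
print(signs, np.allclose(left, right))
```

With a negative eigenvalue present, $\mathbf{V}^{*}\mathbf{A}\mathbf{V}$ equals $\Sigma$ only after multiplying by these signs, which is consistent with the PSD requirement in the best answer.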

Linear Algebra – Understanding the Singular Value Decomposition (SVD)

One geometric interpretation of the singular values of a matrix is the following. Suppose $A$ is an $m\times n$ matrix (real valued, for simplicity). Think of it as a linear transformation $\mathbb R^n \to \mathbb R^m$ in the usual way. Now take the unit sphere $S$ in $\mathbb R^n$. Being a linear transformation, $A$ maps $S$ to an ellipsoid in $\mathbb R^m$. The lengths of the semi-axes of this ellipsoid are precisely the non-zero singular values of $A$. The zero singular values tell us what the dimension of the ellipsoid is going to be: $n$ minus the number of zero singular values.
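The geometric picture can be checked for a small case. The following sketch assumes NumPy and uses a hypothetical $2\times 2$ matrix: it samples the unit circle, maps the samples through $A$, and compares the longest and shortest distances from the origin with the singular values. The sampling density is an arbitrary choice, so the agreement is only approximate.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])

# Sample points on the unit circle in R^2 (columns of `circle`).
theta = np.linspace(0.0, 2.0 * np.pi, 20001)
circle = np.vstack([np.cos(theta), np.sin(theta)])

# Distances from the origin of the image points A x trace out the ellipse.
radii = np.linalg.norm(A @ circle, axis=0)
longest, shortest = radii.max(), radii.min()

# Singular values in descending order; they should match the semi-axis lengths.
s = np.linalg.svd(A, compute_uv=False)
print(longest, shortest, s)
```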
