[Math] Best approach for numerically computing the pseudo-inverse of a covariance matrix

Tags: numerical-linear-algebra, pseudoinverse

What are the reasons to prefer eigenvalue decomposition over singular value decomposition for numerically computing the pseudo-inverse of a symmetric real matrix? In the case where you want to form the pseudo-inverse of a covariance matrix and also have access to the original data matrix, does using the SVD make more sense?

Using SciPy as an example, the documentation for scipy.linalg.pinvh states

Compute the (Moore-Penrose) pseudo-inverse of a Hermitian matrix.

Calculate a generalized inverse of a Hermitian or real symmetric matrix
using its eigenvalue decomposition and including all eigenvalues with
'large' absolute value.

Why do we prefer eigenvalue decomposition in this case?
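
For reference, here is a minimal sketch (assuming recent NumPy and SciPy) showing that the eigenvalue-based pinvh and the general SVD-based pinv agree on a symmetric matrix; the random test matrix is just illustrative:

```python
import numpy as np
from scipy.linalg import pinvh

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
C = X @ X.T                      # symmetric positive semidefinite, rank <= 3

C_pinv_evd = pinvh(C)            # eigenvalue-decomposition route
C_pinv_svd = np.linalg.pinv(C)   # SVD route

print(np.allclose(C_pinv_evd, C_pinv_svd))   # True (up to roundoff)
```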

Suppose that, instead of computing the pseudo-inverse of an arbitrary matrix, you start with an $m \times n$ matrix $X$ and want to compute the pseudo-inverse of its covariance matrix $C = XX^*$. It seems like there are two obvious and closely related approaches.

Since $C$ is real and symmetric, it can be orthogonally diagonalized: $C = Q \Lambda Q^T$, where $Q$ contains eigenvectors, $\Lambda$ contains eigenvalues, and $QQ^T = I$. We can get the pseudo-inverse of $C$ from the pseudo-inverse of $\Lambda$ as $C^+ = Q \Lambda^+ Q^T$.
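
For concreteness, a minimal NumPy sketch of this first route (the function name and the `rtol` cutoff are purely illustrative choices, not from any library):

```python
import numpy as np

def pinv_via_eigh(C, rtol=1e-12):
    """Pseudo-inverse of a real symmetric matrix C via C = Q diag(lam) Q^T."""
    lam, Q = np.linalg.eigh(C)
    cutoff = rtol * np.abs(lam).max()
    lam_pinv = np.zeros_like(lam)
    keep = np.abs(lam) > cutoff           # eigenvalues treated as nonzero
    lam_pinv[keep] = 1.0 / lam[keep]
    return (Q * lam_pinv) @ Q.T           # Q diag(lam^+) Q^T
```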

Alternatively, start from the singular value decomposition of $X$ as $X = U \Sigma V^*$. Then $C = U \Sigma \Sigma^* U^*$, and we can get the pseudo-inverse again from the pseudo-inverse of $\Sigma \Sigma^*$.
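
And a corresponding sketch of the second route, working from the SVD of $X$ and never forming $C$ explicitly (again, the function name and `rtol` are hypothetical):

```python
import numpy as np

def cov_pinv_via_svd(X, rtol=1e-12):
    """Pseudo-inverse of C = X X^* computed from the SVD of X itself."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) V^*
    cutoff = rtol * s.max()
    s2_pinv = np.zeros_like(s)
    keep = s > cutoff
    s2_pinv[keep] = 1.0 / s[keep] ** 2                 # (Sigma Sigma^*)^+
    return (U * s2_pinv) @ U.conj().T                  # U diag(s^-2) U^*
```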

I thought that computing singular values was always a well-conditioned problem, and that this would mean SVD would be a better approach for computing the pseudo-inverse of a covariance matrix. Is that correct? What other concerns might I be overlooking?

Best Answer

I wouldn't say we "prefer" the eigenvalues to the singular values in the symmetric/Hermitian case. They are effectively the same thing! Or, more accurately, the singular values are nothing more than the absolute values of the eigenvalues.

To see why, let $X$ be any Hermitian matrix. Its Schur (eigenvalue) decomposition is $X=Q\Lambda Q^H$ where $\Lambda$ is diagonal, and $Q^HQ=QQ^H=I$. Its singular value decomposition, on the other hand, is $X=U\Sigma V^H$, where $\Sigma$ is diagonal and nonnegative, and $U^HU=UU^H=I$ and $V^HV=VV^H=I$.

Alternatively, we can write these decompositions in dyadic form: $$X=\sum_{i=1}^n \lambda_i q_i q_i^H = \sum_{i=1}^n \sigma_i u_i v_i^H$$ where $q_i$, $u_i$, and $v_i$ are the $i$th columns of $Q$, $U$, and $V$, respectively, and $\lambda_i$ and $\sigma_i$ are the $i$th diagonal elements of $\Lambda$ and $\Sigma$, respectively.

In this latter form, it is simple to see that, given the Schur decomposition, $$\sigma_i = |\lambda_i|, \quad u_i = q_i, \quad v_i = \begin{cases} q_i & \lambda_i \geq 0 \\ -q_i & \lambda_i < 0 \end{cases} \qquad i=1,2,\dots, n$$ gives us a valid singular value decomposition. The choice to apply the sign flip to $v_i$ rather than $u_i$ is arbitrary.
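
If you want a quick numerical sanity check of this relationship (a NumPy sketch of my own, not part of the argument itself):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
S = (B + B.T) / 2                           # a generic real symmetric matrix

eigvals = np.linalg.eigvalsh(S)             # ascending, may include negatives
svals = np.linalg.svd(S, compute_uv=False)  # descending, nonnegative

# singular values = absolute values of the eigenvalues (after sorting)
print(np.allclose(np.sort(np.abs(eigvals))[::-1], svals))   # True
```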

Be assured that in practice, this is precisely what you should be doing to compute the singular value decomposition of a Hermitian matrix. The symmetric eigenvalue problem is well-conditioned, and there is no numerical advantage to applying a nonsymmetric SVD algorithm to a Hermitian matrix. In fact, the singular values of a non-Hermitian matrix $A$ are sometimes computed by applying a symmetric eigenvalue algorithm to the Hermitian matrix $$\begin{bmatrix} 0 & A \\ A^H & 0 \end{bmatrix}$$ I'll leave it as an exercise for you to prove that this works :-)
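
(If you'd like to check the claim numerically before proving it, here is a small NumPy sketch; the block matrix is built exactly as written above, and the test matrix is arbitrary:)

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))
m, n = A.shape

H = np.block([[np.zeros((m, m)), A],
              [A.conj().T, np.zeros((n, n))]])   # the Hermitian block matrix

eig = np.linalg.eigvalsh(H)
sv = np.linalg.svd(A, compute_uv=False)

# eigenvalues of H are +/- sigma_i, padded with |m - n| zeros
expected = np.concatenate([sv, -sv, np.zeros(abs(m - n))])
print(np.allclose(np.sort(eig), np.sort(expected)))          # True
```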

EDIT: DGrady's comment alerted me to the fact that I didn't read the second half of the question correctly. In it, he was comparing the merits of computing the SVD of $X$ versus the eigenvalue decomposition of $C=XX^H$. In this case, it is indeed going to be a bit better to compute the SVD of $X$ directly: forming $C=XX^H$ explicitly squares the condition number, so singular values of $X$ smaller than roughly $\sqrt{\epsilon_{\mathrm{mach}}}\,\|X\|_2$ are wiped out by roundoff before the eigenvalue solver ever sees them.
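
A toy NumPy illustration of that effect (a Läuchli-style example of my own choosing, not from the question): the smallest singular value of $X$ survives the direct SVD, but its square is lost once $C$ is formed in double precision.

```python
import numpy as np

eps = 1e-9
X = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]]).T               # 2 x 3; singular values ~ sqrt(2), eps

sv = np.linalg.svd(X, compute_uv=False)
eig = np.linalg.eigvalsh(X @ X.T)          # eigenvalues of the formed covariance

print(sv ** 2)   # [~2.0, 1e-18]: the tiny value survives via the SVD of X
print(eig)       # the 1e-18 eigenvalue is lost once C is formed explicitly
```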