Solved – Using shrinkage when estimating covariance matrix before doing PCA

covariance-matrix, pca, regularization

Although it is often computed differently, my intuitive understanding of PCA comes from its definition as the eigendecomposition of the sample covariance matrix. I have recently become aware of various popular methods for improving the estimation of the covariance matrix (e.g. Donoho et al. 2013, Optimal Shrinkage of Eigenvalues in the Spiked Covariance Model). Indeed, the Wikipedia page on Estimation of covariance matrices says:

Estimates of covariance matrices are required at the initial stages of principal component analysis and factor analysis, and are also involved in versions of regression analysis that treat the dependent variables in a data-set, jointly with the independent variable, as the outcome of a random sample.

I frequently find myself performing PCA on data matrices where the number of samples (n) is comparable to or smaller than the number of variables (p), which is precisely the case where the estimation of the sample covariance matrix can be improved using shrinkage or other techniques. My goal in these situations is dimensionality reduction so as to find patterns in the data (e.g. in the famous Iris dataset, PCA reveals three different kinds of flowers). If I were to improve the estimation of the covariance matrix, would PCA then give a "better" understanding of the structure in the data (e.g. would the groups separate more cleanly)? Are there any examples where shrinkage is very useful in dimensionality reduction with PCA?

Best Answer

The paper you cited (Donoho et al. 2013, Optimal Shrinkage of Eigenvalues in the Spiked Covariance Model) is an impressive piece of work which I confess I did not really study. Nevertheless, I believe it is easy to see that the answer to your question is negative: using any kind of shrinkage estimator of the covariance matrix will not improve your PCA results and, specifically, will not lead to a "better understanding of the structure in the data".

In a nutshell, this is because shrinkage estimators only affect the eigenvalues of the sample covariance matrix and not the eigenvectors.

Let me quote the beginning of the abstract of Donoho et al.:

Since the seminal work of Stein (1956) it has been understood that the empirical covariance matrix can be improved by shrinkage of the empirical eigenvalues. In this paper, we consider a proportional-growth asymptotic framework with $n$ observations and $p_n$ variables having limit $p_n/n \to \gamma \in (0,1]$. We assume the population covariance matrix $\Sigma$ follows the popular spiked covariance model, in which several eigenvalues are significantly larger than all the others, which all equal $1$. Factoring the empirical covariance matrix $S$ as $S = V \Lambda V'$ with $V$ orthogonal and $\Lambda$ diagonal, we consider shrinkers of the form $\hat{\Sigma} = \eta(S) = V \eta(\Lambda) V'$ where $\eta(\Lambda)_{ii} = \eta(\Lambda_{ii})$ is a scalar nonlinearity that operates individually on the diagonal entries of $\Lambda$.

The abstract goes on to describe the paper's contributions, but what is important for us here is that the sample covariance matrix $S$ and its shrunken version $\hat\Sigma$ have the same eigenvectors. Principal components are given by projections of the data onto these eigenvectors, so they will not be affected by the shrinkage.
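To make this concrete, here is a minimal numpy sketch. The shrinker `eta` below is a made-up, strictly increasing toy function (not the optimal nonlinearity from the paper): it builds $\hat\Sigma = V\,\eta(\Lambda)\,V'$ from the sample covariance $S$ and checks that the leading PC scores are identical up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "spiked" data: n < p, with two high-variance directions.
n, p = 50, 100
X = rng.standard_normal((n, p))
X[:, 0] *= 5.0
X[:, 1] *= 3.0
X -= X.mean(axis=0)

S = X.T @ X / (n - 1)            # sample covariance matrix
lam, V = np.linalg.eigh(S)       # S = V diag(lam) V', lam ascending

# Hypothetical strictly increasing shrinker: pull eigenvalues toward 1.
eta = lambda l: 0.5 * (l + 1.0)
Sigma_hat = V @ np.diag(eta(lam)) @ V.T

lam_hat, V_hat = np.linalg.eigh(Sigma_hat)

# The top eigenvectors of S and Sigma_hat coincide (up to sign),
# so the PC scores are the same up to sign.
k = 2
scores = X @ V[:, -k:]           # project onto top-k eigenvectors of S
scores_hat = X @ V_hat[:, -k:]   # ... and of the shrunken estimate
print(np.allclose(np.abs(scores), np.abs(scores_hat)))  # True
```

Any monotone eigenvalue shrinker gives the same picture: because $\hat\Sigma$ is diagonalized by the same $V$ as $S$, the projection directions, and hence the scatter plot of the scores, are untouched.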

The only thing that can be affected is the estimate of how much variance is explained by each PC, because these estimates are given by the eigenvalues. (And as @Aksakal wrote in the comments, this can affect the number of retained PCs.) But the PCs themselves will not change.
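Continuing the sketch above (reusing `lam` and `lam_hat`), the per-PC proportions of variance do change under shrinkage, which is what can move a retention cutoff such as "keep PCs up to 90% of variance":

```python
# Continuing the sketch above: the eigenvalues, and hence the
# "proportion of variance explained" per PC, change under shrinkage.
var_ratio = lam[::-1] / lam.sum()            # descending order, sample S
var_ratio_hat = lam_hat[::-1] / lam_hat.sum()
print(var_ratio[:2])      # e.g. share of the top 2 PCs under S
print(var_ratio_hat[:2])  # a smaller share after shrinking toward 1
```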