Solved – What are the conditions to apply the “transpose trick” in PCA

pca

The "transpose trick" in principal component analysis (PCA) involves replacing the computation of $\mathrm{eig}(A^TA)$ by $\mathrm{eig}(AA^T)$, as it is known that if $v$ is an eigenvector of $AA^T$, then $A^Tv$ is an eigenvector of $A^TA$ – compare https://stats.stackexchange.com/a/7144/77888.

However, I fail to see under which conditions that trick actually applies to PCA, as PCA "usually" involves "a normalization step of the initial data" that "consists of mean centering" (Wikipedia).

The Stats link above simply says that

for simplicity, suppose that the columns of A have already been normalized to have zero mean,

while Fundamentals of Statistics says nothing about that whatsoever – despite explicitly stating that

the PCA is based on the solution of the eigenvalue problem for the covariance matrix

and stating that in the covariance matrix,

the mean of each variable is subtracted before multiplication.

So, what conditions need to be fulfilled for the transpose trick to be directly applicable in PCA of a generally non-normalized matrix? That is, when is $\mathrm{eig}(\mathrm{cov}(A))$ related to $\mathrm{eig}(\mathrm{cov}(A^T))$?

My motivation for this question stems from the fact that I have seen someone use [v,~]=eig(cov(data')); v(:,end) (instead of [v,~]=eig(cov(data)); data * v(:, end)) in MATLAB to project the data onto the principal component associated with the largest eigenvalue. I have the strong feeling that this substitution is valid (meaning, it gives the same result up to a scalar factor) only if the data has zero-mean rows and columns, because both data and data' end up being passed to cov.
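
For illustration, here is a minimal MATLAB sketch (with made-up data and variable names of my own) that compares the quoted shortcut with the standard projection on a matrix whose rows and columns have both been centered; on such data the two results should agree up to a scalar factor:

```matlab
% Minimal check on doubly centered data (hypothetical example).
rng(0);
raw  = randn(20, 500);             % n = 20 observations, p = 500 features
data = raw  - mean(raw,  1);       % zero column means (implicit expansion, R2016b+)
data = data - mean(data, 2);       % zero row means as well

% Standard route: project onto the leading eigenvector of cov(data)
[v1, d1] = eig(cov(data));
[~, i1]  = max(diag(d1));          % index of the largest eigenvalue
proj_standard = data * v1(:, i1);

% Quoted shortcut: leading eigenvector of cov(data') taken directly
[v2, d2] = eig(cov(data'));
[~, i2]  = max(diag(d2));
proj_shortcut = v2(:, i2);

% Cosine of the angle between the two results; ~1 means equal up to a factor
disp(abs(dot(proj_standard, proj_shortcut)) / (norm(proj_standard) * norm(proj_shortcut)));
```

Indexing by the largest eigenvalue is used here instead of v(:,end) because eig does not guarantee a particular ordering of the eigenvalues.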

Best Answer

The 'transpose trick' is a general fact of linear algebra, so it always holds. But transposing the data matrix before estimating the covariance matrix (as in the MATLAB code you quoted) is not the proper way to use this fact to do PCA. This has to do with the centering issues you mentioned.

What goes wrong

Suppose the data matrix is $A$, with $n$ observations on the rows and $p$ features on the columns. Let $A_c$ denote the centered data matrix, obtained by centering the columns of $A$. The sample covariance matrix is

$$C = \frac{1}{n} A_c^T A_c$$

But, if you transpose $A$ before computing the sample covariance matrix, you end up with:

$$C_* = \frac{1}{p} A_{c*} A_{c*}^T$$

where $A_{c*}$ is the result of centering the rows of $A$. Notice that $C$ and $C_*$ don't have the relationship needed to use the transpose trick, unless $A_c = A_{c*}$ (as happens when the row and column means are both zero, for example). And, in the latter case, the eigenvalues of $C_*$ will still differ from those of $C$ by a constant factor of $\frac{n}{p}$.
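
As a quick numerical illustration (a sketch with arbitrary dimensions chosen by me), one can confirm the $\frac{n}{p}$ factor in the doubly centered case:

```matlab
% Sketch: eigenvalues of C_* are (n/p) times those of C when A is doubly centered.
rng(1);
n = 10; p = 40;
Ac = randn(n, p);
Ac = Ac - mean(Ac, 1);              % zero column means (implicit expansion, R2016b+)
Ac = Ac - mean(Ac, 2);              % zero row means, so A_c and A_{c*} coincide

C     = (1/n) * (Ac' * Ac);         % p x p, as in the text
Cstar = (1/p) * (Ac  * Ac');        % n x n

eC     = sort(eig(C),     'descend');
eCstar = sort(eig(Cstar), 'descend');

disp([eCstar(1:5), (n/p) * eC(1:5)]);   % the two columns should match
```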

The following procedure should be used instead.

How to do PCA using the transpose trick

Given an $n \times p$ data matrix $A$ (with $p > n$; otherwise the trick just wastes computation):

  1. Center the columns of $A$ to obtain the centered data matrix $A_c$.

  2. Let $G = \frac{1}{n} A_c A_c^T$. Note that $G$ and $C$ (above) do have the proper relationship to use the transpose trick.

  3. Let $V \Lambda V^T$ be the eigendecomposition of $G$ (with eigenvectors on the columns of $V$ and eigenvalues on the diagonal of $\Lambda$).

  4. Let $\tilde{V} = A_c^T V$.

The diagonal of $\Lambda$ and the columns of $\tilde{V}$ are equal to the top $n$ eigenvalues and eigenvectors of the sample covariance matrix $C$. Any remaining eigenvalues of $C$ are zero. It's a good idea to scale the columns of $\tilde{V}$ to have unit norm, as they won't generally be normalized to begin with.
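
A minimal MATLAB sketch of steps 1–4 (my own variable names; the $\frac{1}{n}$ normalization follows the convention used above, not MATLAB's cov, which divides by $n-1$), together with a check against the direct eigendecomposition of $C$:

```matlab
% Transpose-trick PCA as in steps 1-4 above (hypothetical data).
rng(2);
n = 15; p = 300;
A = randn(n, p);

% 1. Center the columns
Ac = A - mean(A, 1);                          % implicit expansion, R2016b+

% 2. Small n x n matrix instead of the p x p covariance
G = (1/n) * (Ac * Ac');

% 3. Eigendecomposition of G, sorted so the largest eigenvalue comes first
[V, L] = eig(G);
[lam, order] = sort(diag(L), 'descend');
V = V(:, order);

% 4. Map back to feature space and rescale to unit norm
Vt = Ac' * V;
Vt = Vt ./ sqrt(sum(Vt.^2, 1));               % note: the column matching the ~zero
                                              % eigenvalue caused by centering is degenerate

% Check against the direct (p x p) computation
C = (1/n) * (Ac' * Ac);
[W, D] = eig(C);
[lamC, orderC] = sort(diag(D), 'descend');
W = W(:, orderC);

disp(max(abs(lam(1:n-1) - lamC(1:n-1))));     % nonzero eigenvalues agree
disp(abs(dot(Vt(:,1), W(:,1))));              % leading eigenvectors agree up to sign
```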