Solved – PCA on a rank-deficient matrix using SVD of the covariance matrix


I have a high-dimensional data matrix $X$ whose sample size is smaller than its number of variables. I want to use PCA for dimensionality reduction, but I cannot apply it directly in R because $X$ is rank-deficient. I saw the following technique in some R code for obtaining the principal component representation of a rank-deficient matrix:

1) Get $U$ from the SVD of $XX^T$.

2) Get the principal component representation $C$ by solving $X = UC$.

I am new to PCA, and I have no idea why this makes sense. Could someone clarify how these two steps are equivalent to applying PCA to $X$?
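For concreteness, the recipe amounts to the following R sketch (toy data and variable names are mine; since $U$ is square and orthogonal, solving $X = UC$ reduces to $C = U^T X$):

```r
set.seed(1)
n <- 5; p <- 20                  # fewer samples than variables
X <- matrix(rnorm(n * p), n, p)  # n x p data matrix, rank at most n < p

# Step 1: get U from the SVD of X X^T (an n x n matrix)
U <- svd(X %*% t(X))$u

# Step 2: solve X = U C; orthogonality of U gives C = U^T X
C <- t(U) %*% X

max(abs(U %*% C - X))            # ~ 0: the factorization recovers X
```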

Best Answer

PCA amounts to finding and interpreting a singular value decomposition (SVD) of $X$ (or any closely-related matrix obtained by various forms of "standardizing" the rows and/or columns of $X$). There are direct connections between any SVD of $X$ and any SVD of $XX^\prime$. I will exhibit the connections in both directions, by obtaining an SVD of one from an SVD of the other.
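As a quick numerical illustration of that equivalence in R (my own toy example; `prcomp` centers the columns by default, so I center explicitly before calling `svd`):

```r
set.seed(1)
X  <- matrix(rnorm(30), 6, 5)
Xc <- scale(X, center = TRUE, scale = FALSE)  # column-centered copy of X

s  <- svd(Xc)                                 # Xc = U D V'
pc <- prcomp(X)                               # PCA (centering is the default)

# The PC scores equal U D from the SVD, up to the sign of each column
max(abs(abs(pc$x) - abs(s$u %*% diag(s$d))))  # ~ 0
```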


1. Any SVD of $X$ determines a unique SVD of $XX^\prime$.

By definition, an SVD of $X$ is a representation in the form

$$X = U\Sigma V^\prime$$

where $\Sigma$ is diagonal with non-negative entries and $U$ and $V$ are orthogonal matrices. Among other things, this implies $V^\prime V = \mathbb{I}$, the identity matrix.

Compute

$$XX^\prime = (U\Sigma V^\prime)(U\Sigma V^\prime)^\prime= (U\Sigma V^\prime)(V\Sigma U^\prime) = U\Sigma V^\prime V \Sigma U^\prime = U\Sigma \mathbb{I} \Sigma U^\prime = U (\Sigma^2) U^\prime.$$

Since $\Sigma^2$ is diagonal with non-negative entries and $U$ is orthogonal, this is an SVD of $XX^\prime$.
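This direction is easy to verify numerically in R (toy data mine; for the symmetric matrix $XX^\prime$, `eigen` plays the role of the SVD):

```r
set.seed(1)
X <- matrix(rnorm(15), 3, 5)
s <- svd(X)                           # X = U Sigma V'
e <- eigen(X %*% t(X), symmetric = TRUE)

max(abs(e$values - s$d^2))            # eigenvalues of XX' = squared singular values
max(abs(abs(e$vectors) - abs(s$u)))   # same U, up to the sign of each column
```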


2. Any SVD of $XX^\prime$ gives (at least one) SVD of $X$.

Conversely, because $XX^\prime$ is symmetric and positive semi-definite, by means of an SVD or the Spectral Theorem it can be diagonalized via an orthogonal matrix $U$ (for which both $UU^\prime$ and $U^\prime U$ are identity matrices):

$$XX^\prime = U\Lambda^2 U^\prime.$$

In this decomposition, $\Lambda$ is an $n\times n$ diagonal matrix ($n$ is the number of rows of $X$) with $r = \text{rank}(X)$ non-zero entries which--without any loss of generality--we may assume are the first $r$ entries $\Lambda_{ii}, i=1,2,\ldots, r$. Define $\Lambda^{-}$ to be the diagonal matrix with entries $1/\Lambda_{ii}, i=1,2,\ldots, r$ (and zeros otherwise). It is a generalized inverse of $\Lambda$. Let

$$Y = \Lambda^{-}U^\prime X$$

and compute

$$YY^\prime = \Lambda^{-} U^\prime XX^\prime U \Lambda^{-} = \Lambda^{-} \Lambda^2 \Lambda^{-} = \mathbb{I}_{r;n}.$$

(I have employed the notation $\mathbb{I}_{r;n}$ for an identity-like $n\times n$ matrix of rank $r$: it has $r$ ones along the diagonal and zeros everywhere else.) Because the $(i,j)$ entry of $YY^\prime$ is the dot product of rows $i$ and $j$ of $Y$, this result says that the first $r$ rows of $Y$ are orthonormal and the remaining $n-r$ rows are zero. That is, $Y$ is a block matrix of the form

$$Y = \pmatrix{W \\ 0}$$

where $W$ is an $r\times p$ matrix with orthonormal rows ($WW^\prime = \mathbb{I}_r$), $p$ is the number of columns of $X$, and the zero block fills out the remaining $n-r$ rows (so that $Y$ has the same dimensions as $X$). Certainly $r \le p$. If $r \lt p$, we may extend the rows of $W$ to an orthonormal basis of $\mathbb{R}^p$ (by Gram-Schmidt, for instance), producing a $p\times p$ orthogonal matrix $V$ whose first $r$ columns are $W^\prime$.
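One concrete way to perform that extension in R is via a QR decomposition, which completes the rows of $W$ to an orthonormal basis of $\mathbb{R}^p$ (a sketch under the assumption that $W$ has exactly orthonormal rows; the helper name is mine):

```r
# Given W, an r x p matrix with orthonormal rows (W %*% t(W) = I_r),
# return a p x p orthogonal matrix V whose first r columns are t(W).
extend_to_orthogonal <- function(W) {
  r <- nrow(W)
  Q <- qr.Q(qr(t(W)), complete = TRUE)  # p x p orthogonal; its first r columns
                                        # span the same space as the rows of W
  Q[, 1:r] <- t(W)                      # undo any sign flips introduced by qr()
  Q
}
```

The remaining $p-r$ columns returned by `qr.Q` are automatically orthogonal to the rows of $W$, which is exactly what the construction of $V$ requires.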

Two observations finish the argument. First, $U^\prime X = \Lambda Y$: indeed $\Lambda Y = \Lambda\Lambda^{-}U^\prime X = \mathbb{I}_{r;n}\,U^\prime X$, and the rows of $U^\prime X$ beyond the $r$th are zero, because their squared lengths are the diagonal entries $(U^\prime XX^\prime U)_{ii} = \Lambda_{ii}^2 = 0$ for $i \gt r$. Second, $YV$ is the $n\times p$ matrix with ones in its first $r$ diagonal positions and zeros elsewhere, because the first $r$ columns of $V$ are $W^\prime$ and its remaining columns are orthogonal to the rows of $W$. Since $\Lambda$ has no nonzero entries beyond its first $r$ diagonal positions (reading $\Lambda$, as is customary for the SVD, as an $n\times p$ diagonal matrix), it follows that

$$U^\prime X V = \Lambda Y V = \Lambda.$$

Upon left-multiplying by $U$ and right-multiplying by $V^\prime$ we obtain

$$U\Lambda V^\prime = (UU^\prime) X (VV^\prime) = X,$$

which is an SVD of $X$.
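To see the whole construction work end to end, here is a short R check (toy data and names mine; `eigen` stands in for an SVD of the symmetric matrix $XX^\prime$) that builds an SVD of $X$ from $XX^\prime$ alone and confirms $U\Lambda V^\prime = X$:

```r
set.seed(1)
n <- 4; p <- 7
X <- matrix(rnorm(n * p), n, p)       # more variables than samples

e <- eigen(X %*% t(X), symmetric = TRUE)
U <- e$vectors                        # the orthogonal U in XX' = U Lambda^2 U'
Lambda <- sqrt(pmax(e$values, 0))     # singular values; pmax guards rounding error
r <- sum(Lambda > 1e-8)               # numerical rank of X

Lminus <- diag(c(1 / Lambda[1:r], rep(0, n - r)))  # generalized inverse of Lambda
Y <- Lminus %*% t(U) %*% X            # first r rows orthonormal, the rest zero
W <- Y[1:r, , drop = FALSE]

V <- qr.Q(qr(t(W)), complete = TRUE)  # extend the rows of W to a basis of R^p
V[, 1:r] <- t(W)                      # undo any sign flips from qr()

L_np <- diag(Lambda, n, p)            # Lambda read as an n x p diagonal matrix
max(abs(U %*% L_np %*% t(V) - X))     # ~ 0: an SVD of X recovered from XX' alone
```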
