Solved – Relationship between eigenvectors of $\frac{1}{N}XX^\top$ and $\frac{1}{N}X^\top X$ in the context of PCA

Tags: linear-algebra, pca

In Christopher Bishop's book Pattern Recognition and Machine Learning, the section on PCA contains the following:

Given a centred data matrix $\mathbf{X}$ with covariance matrix $N^{-1}\mathbf{X}^T\mathbf{X}$, the eigenvector equation is:

$$N^{-1}\mathbf{X}^T\mathbf{X} \mathbf{u}_i = \lambda_i \mathbf{u}_i.$$

Defining $\mathbf{v}_i = \mathbf{X} \mathbf{u}_i$, Bishop claims that if $\mathbf{u}_i$ and $\mathbf{v}_i$ have unit length, then:

$$\mathbf{u}_i = \frac{1}{(N\lambda_i)^{\frac{1}{2}}}\mathbf{X}^T\mathbf{v}_i.$$

Where does the square root come from?


EDIT:

In particular, why is the following invalid:

$\frac{1}{N}\mathbf{X}^T\mathbf{X}\mathbf{u}_i = \lambda_i \mathbf{u}_i$

$\Rightarrow \frac{1}{N}\mathbf{X}^T \mathbf{v}_i = \lambda_i \mathbf{u}_i$ using $\mathbf{v}_i = \mathbf{X}\mathbf{u}_i$

$\Rightarrow \frac{1}{N\lambda_i}\mathbf{X}^T \mathbf{v}_i = \mathbf{u}_i$

The same result, but without the square root.

Best Answer

This refers to the short section 12.1.4, PCA for high-dimensional data, in Bishop's book. I can see how this section can be a bit confusing, because Bishop goes back and forth between $\newcommand{\X}{\mathbf X}\newcommand{\v}{\mathbf v}\newcommand{\u}{\mathbf u}\v_i$ and $\u_i$ using slightly inconsistent notation.

The section is about the relationship between the eigenvectors of the covariance matrix $\frac{1}{N}\X^\top \X$ and the eigenvectors of the Gram matrix $\frac{1}{N}\X \X^\top$ (in the context of PCA). Let $\v_i$ be a unit-length eigenvector of $\frac{1}{N}\X \X^\top$:

$$\frac{1}{N}\X \X^\top \v_i = \lambda_i \v_i.$$

If we multiply this equation by $\X^\top$ from the left:

$$\frac{1}{N}\X^\top \X (\X^\top \v_i) = \lambda_i (\X^\top \v_i),$$

we see that $\X^\top \v_i$ is an eigenvector of $\frac{1}{N}\X^\top \X$.

However, it will not have unit length! Indeed, let us compute its squared length: $$\|\X^\top \v_i\|^2=(\X^\top \v_i)^\top \X^\top \v_i = \v_i^\top \X\X^\top \v_i=\v_i^\top(N\lambda_i \v_i)=N\lambda_i\|\v_i\|^2=N\lambda_i.$$ So the squared length of $\X^\top \v_i$ equals $N\lambda_i$. Therefore, if we want to turn $\v_i$ into a unit-length eigenvector $\u_i$ of the covariance matrix, we need to normalize $\X^\top \v_i$ to unit length: $$\u_i = \frac{1}{(N\lambda_i)^{1/2}}\X^\top \v_i.$$
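Here is a quick numerical sanity check of the above (a minimal NumPy sketch; the random data, the sizes, and the variable names are my own illustration, not from Bishop's book):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 10, 5                          # N data points, D features
X = rng.standard_normal((N, D))
X -= X.mean(axis=0)                   # centre the data

S = X.T @ X / N                       # D x D covariance matrix
G = X @ X.T / N                       # N x N Gram matrix

lam, V = np.linalg.eigh(G)            # columns of V are unit-length eigenvectors of G
v, l = V[:, -1], lam[-1]              # eigh sorts ascending, so take the largest

u = X.T @ v / np.sqrt(N * l)          # the claimed unit-length eigenvector of S

print(np.linalg.norm(u))              # ~1.0, so the (N*lambda)^(1/2) factor is right
print(np.allclose(S @ u, l * u))      # True: u is an eigenvector of S, same eigenvalue
```

Dropping the square root (dividing by $N\lambda_i$ instead) still gives an eigenvector, but its norm comes out as $(N\lambda_i)^{-1/2}$ rather than $1$.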

(Please note that the above was not using the $\v_i=\X\u_i$ definition that you quoted. Instead, we started directly with a unit-length $\v_i$. I believe this might have been the source of your confusion. Bishop uses the $\v_i=\X\u_i$ definition earlier in the section, but it is no longer relevant for this particular argument.)
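To reconcile the two definitions: if you do set $\v_i = \X\u_i$ with a unit-length $\u_i$, then $\v_i$ itself does not have unit length, because $$\|\X\u_i\|^2 = \u_i^\top \X^\top\X\u_i = \u_i^\top(N\lambda_i\u_i) = N\lambda_i.$$ Your derivation $\u_i = \frac{1}{N\lambda_i}\X^\top\v_i$ is therefore correct for this non-unit $\v_i$; substituting its unit-length version $\v_i/(N\lambda_i)^{1/2}$ turns the factor $\frac{1}{N\lambda_i}$ into Bishop's $\frac{1}{(N\lambda_i)^{1/2}}$.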
