Kernel PCA Projection – How to Project a New Vector onto the PC Space

dimensionality reduction, kernel trick, PCA

Let $X_{N \times d}$ be the data matrix, where $N$ is the number of samples and $d$ the dimensionality of the feature space.

Using kernel PCA (kPCA), one first computes a kernel matrix $K_{N \times N}$; then, after its eigenvectors $E_{N \times N}$ have been computed, the data can be projected onto the first $c \leq N$ components as $$X_\mathrm{projected} = KE_c,$$ where $E_c$ denotes the first $c$ columns of $E$. Equivalently, in Matlab notation:

Projected_data = K*E(:,1:c);   % N-by-c matrix of projected data

The projected data now have size $N \times c$.

I would like to know whether it is possible to project an unseen data vector $x_{1 \times d}$ onto the previously computed principal components $E$, and if so, what the correct procedure is.

Best Answer

Let's consider the training dataset first. Principal components (sometimes called PC "scores") are the centered data projected onto the principal axes. In kPCA, the eigenvectors of the kernel matrix directly give you the principal components, but scaled to have unit sum-of-squares. To get the correct scaling, one needs to multiply them by the square roots of the respective eigenvalues: $$\mathbf X_\mathrm{projected\:on\: axis \:\#i} = \sqrt{\lambda_i} \mathbf E_i,$$ where $\mathbf E_i$ and $\lambda_i$ are the $i$-th eigenvector and eigenvalue of the kernel matrix: $\mathbf K \mathbf E_i = \lambda_i \mathbf E_i$. The formula in your question is missing this scaling.
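
Concretely, in Matlab notation, a minimal sketch of the correctly scaled training projections (assuming $K$ is the kernel matrix from your question; the sorting step and the variable names lambda and idx are mine):

[E, L] = eig(K);                          % eigenvectors and eigenvalues of K
[lambda, idx] = sort(diag(L), 'descend'); % sort eigenvalues in decreasing order
E = E(:, idx);                            % reorder the eigenvectors to match
Projected_data = E(:, 1:c) * diag(sqrt(lambda(1:c)));  % scale by sqrt of the eigenvalues

Note that your K*E(:,1:c) equals E(:,1:c)*diag(lambda(1:c)), because $\mathbf K \mathbf E_i = \lambda_i \mathbf E_i$; that is, it scales the eigenvectors by the eigenvalues instead of their square roots.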

This is easy to see by considering standard, non-kernel, PCA. Let $\mathbf X$ be the $n\times p$ centered data matrix. PCA amounts to a singular value decomposition (SVD) of the (centered) data matrix: $$\mathbf X = \mathbf U \mathbf S \mathbf V^\top,$$ where $\mathbf U \mathbf S$ are the PCs (PC "scores") and $\mathbf V$ are the principal axes. Usually PCA is introduced via the eigen-decomposition of the covariance matrix $\mathbf C = \mathbf X^\top\mathbf X/n$, which has $\mathbf V$ as eigenvectors. Alternatively, one can consider the so-called Gram matrix $$\mathbf K = \mathbf X \mathbf X^\top = \mathbf U \mathbf S^2 \mathbf U^\top,$$ which has $\mathbf U$ as eigenvectors and $\mathbf S^2$ as eigenvalues. To get the PCs $\mathbf{US}$, one needs to multiply the eigenvectors $\mathbf U$ by the square roots of the eigenvalues.
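
If you want to convince yourself numerically, here is a sketch of this equivalence for plain PCA (assuming the $n \times p$ data matrix is held in a variable X; the two sets of scores agree only up to column signs, and up to trailing near-zero columns when $p < n$):

X = bsxfun(@minus, X, mean(X, 1));        % center the columns
[U, S, ~] = svd(X, 'econ');               % X = U*S*V'
scores_svd = U * S;                       % PC scores from the SVD
K = X * X';                               % Gram matrix
[E, L] = eig(K);
[lambda, idx] = sort(diag(L), 'descend'); % sort eigenvalues in decreasing order
scores_gram = E(:, idx) * diag(sqrt(max(lambda, 0)));  % eigenvectors times sqrt of eigenvalues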

The kernel matrix in kPCA is what I called the Gram matrix above. So the bottom line is: multiply its eigenvectors by the square roots of its eigenvalues.


Turning now to your main question, you have a new (test) data point $\mathbf x$ (a row vector) that needs to be projected onto the principal axes. When faced with a question about kPCA, always think about how you would do it in standard PCA. You need to compute $\mathbf x \mathbf V$. But say you don't know $\mathbf V$ (that's the case in kPCA). Well, you can compute $\mathbf k = \mathbf x \mathbf X^\top$, which is a (row) vector of kernels between the new data point and all the old ones. And now $$\mathbf x \mathbf V = \mathbf x \mathbf V \mathbf S \mathbf U^\top \mathbf U \mathbf S^{-1} = \mathbf x \mathbf X^\top \mathbf U \mathbf S^{-1} = \mathbf k \mathbf U \mathbf S^{-1}.$$ Rewriting this in your kPCA notation, $$\mathbf x_\mathrm{projected} = \mathbf k \mathbf E \boldsymbol \Lambda^{-1/2},$$ where $\boldsymbol \Lambda$ is the diagonal matrix with the eigenvalues $\lambda_i$ on its diagonal.
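
In Matlab notation this becomes something like the following sketch (reusing E and lambda from the sketch above, with N, X, x as in your question; the function handle kernelfun is hypothetical and stands for whatever kernel was used to build $K$; consistent centering of $K$ and $\mathbf k$ is assumed to be handled elsewhere):

k = zeros(1, N);
for i = 1:N
    k(i) = kernelfun(x, X(i, :));         % kernel between the new point and each training point
end
x_projected = k * E(:, 1:c) * diag(1 ./ sqrt(lambda(1:c)));  % k * E * Lambda^(-1/2), first c components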
