Solved – What exactly is the procedure to compute principal components in kernel PCA

dimensionality-reduction, eigenvalues, kernel-trick, manifold-learning, pca

In kernel PCA (principal component analysis), you first choose a kernel, use it to compute the kernel matrix $K$, center the feature space via $K$, find the eigenvalues and eigenvectors of the centered matrix, and then multiply the centered kernel matrix by the eigenvectors corresponding to the largest eigenvalues.

The result should be the projection of the feature space data onto a low dimensional subspace.

To the best of my knowledge, you divide the eigenvalues by the number $n$ of original data points to scale them. So the question is: do the eigenvectors you multiply the centered kernel matrix by need scaling as well, and if so, how is it done?

Best Answer

To find PCs in classical PCA, one can perform singular value decomposition of the centered data matrix (with variables in columns) $\mathbf X = \mathbf U \mathbf S \mathbf V^\top$; columns of $\mathbf U \mathbf S$ are called principal components (i.e. projections of the original data onto the eigenvectors of the covariance matrix). Observe that the so-called Gram matrix $\mathbf G = \mathbf X \mathbf X^\top = \mathbf U \mathbf S^2 \mathbf U^\top$ has eigenvectors $\mathbf U$ and eigenvalues $\mathbf S^2$, so another way to compute principal components is to scale the eigenvectors of the Gram matrix by the square roots of the respective eigenvalues.
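
For illustration, here is a small NumPy check of this equivalence; the array sizes and variable names are arbitrary placeholders, not anything from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
X = X - X.mean(axis=0)                      # centered data matrix, variables in columns

# Route 1: SVD of the centered data matrix; columns of U @ diag(S) are the PCs
U, S, Vt = np.linalg.svd(X, full_matrices=False)
pcs_svd = U * S

# Route 2: eigendecomposition of the Gram matrix G = X X^T;
# scale its eigenvectors by the square roots of the eigenvalues
G = X @ X.T
eigvals, eigvecs = np.linalg.eigh(G)
top = np.argsort(eigvals)[::-1][:X.shape[1]]
pcs_gram = eigvecs[:, top] * np.sqrt(eigvals[top])

# The two routes agree up to the sign of each component
print(np.allclose(np.abs(pcs_svd), np.abs(pcs_gram)))   # True
```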

In full analogy, here is a complete algorithm to compute kernel principal components (a NumPy sketch of these steps follows the list):

  1. Choose a kernel function $k(\mathbf x, \mathbf y)$ that conceptually is a scalar product in the target space.

  2. Compute a Gram/kernel matrix $\mathbf K$ with $K_{ij} = k(\mathbf x_{(i)}, \mathbf x_{(j)})$.

  3. Center the kernel matrix via the following trick: $$\mathbf K_\mathrm{centered} = \mathbf K - \mathbf 1_n \mathbf K - \mathbf K \mathbf 1_n + \mathbf 1_n \mathbf K \mathbf 1_n=(\mathbf I - \mathbf 1_n)\mathbf K(\mathbf I - \mathbf 1_n) ,$$ where $\mathbf 1_n$ is an $n \times n$ matrix with all elements equal to $\frac{1}{n}$, and $n$ is the number of data points.

  4. Find eigenvectors $\mathbf U$ and eigenvalues $\mathbf S^2$ of the centered kernel matrix. Multiply each eigenvector by the square root of the respective eigenvalue.

  5. Done. These are the kernel principal components.
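
Putting these steps together, here is a minimal NumPy sketch; the function name `kernel_pca`, the RBF kernel, and `n_components` are my own illustrative choices:

```python
import numpy as np

def kernel_pca(X, kernel, n_components=2):
    n = X.shape[0]
    # Step 2: Gram/kernel matrix K_ij = k(x_i, x_j)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    # Step 3: center the kernel matrix
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Step 4: eigendecomposition (eigh returns eigenvalues in ascending order)
    eigvals, eigvecs = np.linalg.eigh(K_c)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Steps 4-5: scale each eigenvector by the square root of its eigenvalue
    return eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])

# Example with a Gaussian (RBF) kernel -- just one possible choice of k(x, y)
rbf = lambda x, y, gamma=1.0: np.exp(-gamma * np.sum((x - y) ** 2))
X = np.random.default_rng(0).normal(size=(100, 5))
pcs = kernel_pca(X, rbf, n_components=2)
print(pcs.shape)                            # (100, 2)
```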

Answering your question specifically, I don't see any need to scale either the eigenvectors or the eigenvalues by $n$ in steps 4-5.

A good reference is the original paper: Schölkopf B, Smola A, and Müller KR, Kernel principal component analysis, 1999. Note that it presents the same algorithm in a somewhat more complicated way: you are supposed to find eigenvectors of $\mathbf K$ and then multiply them by $\mathbf K$ (as you wrote in your question). But multiplying a matrix by one of its eigenvectors simply yields the same eigenvector scaled by the corresponding eigenvalue (by definition).
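
To see numerically that the two presentations coincide, here is a small sanity check; the RBF kernel and the data sizes are again arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

# Any PSD kernel will do; an RBF kernel is used here purely for illustration
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T))

n = K.shape[0]
one_n = np.full((n, n), 1.0 / n)
K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n

eigvals, eigvecs = np.linalg.eigh(K_c)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]       # descending order

# Formulation in this answer: unit eigenvectors scaled by sqrt(eigenvalue)
pcs_here = eigvecs[:, :2] * np.sqrt(eigvals[:2])

# Paper's formulation: normalise alpha so that lambda * (alpha . alpha) = 1,
# then project the training data via K_c @ alpha
alphas = eigvecs[:, :2] / np.sqrt(eigvals[:2])
pcs_paper = K_c @ alphas

print(np.allclose(pcs_here, pcs_paper))                  # True
```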
