Solved – How to apply a Gaussian radial basis function kernel PCA to nonlinear data

Tags: kernel-trick, pca, python, scikit-learn

I have an assignment to implement a Gaussian radial basis function-kernel principal component analysis (RBF-kernel PCA) and have run into some challenges. It would be great if someone could point me in the right direction, because I am obviously doing something wrong here.

So, if I understand correctly, the RBF kernel is implemented like this:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(- \gamma \|\mathbf{x}_i - \mathbf{x}_j\|^{2}_{2} \right)=\exp\left(- \frac{\|\mathbf{x}_i - \mathbf{x}_j\|^{2}_{2}}{2\sigma^2} \right),$$

where $\|\mathbf{x}_i - \mathbf{x}_j\|^{2}_{2} = \sum_k(x_{ik} - x_{jk})^2$ is the squared Euclidean distance between two data points $\mathbf{x}_i$ and $\mathbf{x}_j$, and $\gamma = \frac{1}{2\sigma^2}$ is a free parameter. The $\sigma^2$ can be chosen as the variance of the Euclidean distances between all pairs of data points.
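For example, the heuristic from the last sentence could be coded up roughly like this (just a sketch; here X stands for the (n_samples, n_features) data matrix):

import numpy as np
from scipy.spatial.distance import pdist

# Heuristic described above: choose sigma^2 as the variance of the
# pairwise Euclidean distances and derive gamma from it.
dists = pdist(X, 'euclidean')     # all pairwise Euclidean distances
sigma_sq = np.var(dists)          # sigma^2 heuristic
gamma = 1.0 / (2.0 * sigma_sq)    # gamma = 1 / (2 * sigma^2)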

To compare my approach to scikit-learn's implementation, I created a simple nonlinear dataset:

Example dataset

import matplotlib.pyplot as plt

from sklearn.datasets import make_moons
X, y = make_moons(n_samples=100, random_state=123)

plt.figure(figsize=(8,6))

plt.scatter(X[y==0, 0], X[y==0, 1], color='red')
plt.scatter(X[y==1, 0], X[y==1, 1], color='blue')

plt.title('A nonlinear 2D dataset')
plt.ylabel('y coordinate')
plt.xlabel('x coordinate')
plt.show()


scikit-learn RBF Kernel PCA

When I used the scikit-learn implementation for dimensionality reduction onto a single component axis, the classes separate quite nicely.

from sklearn.decomposition import KernelPCA
import numpy as np

scikit_kpca = KernelPCA(n_components=1, kernel='rbf', gamma=15)
X_skernpca = scikit_kpca.fit_transform(X)

plt.figure(figsize=(8,6))
plt.scatter(X_skernpca[y==0, 0], np.zeros((50,1)), color='red', alpha=0.5)
plt.scatter(X_skernpca[y==1, 0], np.zeros((50,1)), color='blue', alpha=0.5)

plt.title('First component after RBF Kernel PCA')
plt.show()


My approach

Somehow, I am not able to reproduce those results. From what I understand, I have to compute all pairwise distances in order to compute the kernel matrix, then center it and extract the eigenvector that corresponds to the largest eigenvalue. This is what I have done so far:

import numpy as np
from sklearn.preprocessing import KernelCenterer
from scipy.spatial.distance import pdist, squareform
from scipy import exp


# pdist to calculate the squared Euclidean distances for every pair of points
# in the 100x2 dimensional dataset.
sq_dists = pdist(X, 'sqeuclidean')

# Variance of the squared Euclidean distances between all pairs of data
# points (not used below, since gamma is set manually).
variance = np.var(sq_dists)

# squareform converts the pairwise distances into a symmetric 100x100 matrix
mat_sq_dists = squareform(sq_dists)

# set the gamma parameter equal to the one I used in scikit-learn KernelPCA
gamma = 15

# Compute the 100x100 kernel matrix
K = exp(gamma * mat_sq_dists)

# Center the kernel matrix
kern_cent = KernelCenterer()
K = kern_cent.fit_transform(K)

# Get the eigenvector with largest eigenvalue
eigvals, eigvecs = np.linalg.eig(K)
eigvals, eigvecs = zip(*sorted(zip(eigvals, eigvecs), reverse=True))
X_pc1 = eigvecs[0]
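For reference, my understanding is that KernelCenterer performs the usual double-centering of the kernel matrix; written out by hand, that step would look roughly like this (a sketch, operating on the 100x100 matrix K from above):

import numpy as np

# Double-centering: K' = K - 1n K - K 1n + 1n K 1n,
# where 1n is the n x n matrix with every entry equal to 1/n.
n = K.shape[0]
one_n = np.ones((n, n)) / n
K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
# K_centered should agree with KernelCenterer().fit_transform(K)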


Edit

Thanks a lot to @Kirill! He found my mistakes, and the problem is solved now. Here is the correct version for future reference:

import numpy as np
from sklearn.preprocessing import KernelCenterer
from scipy.spatial.distance import pdist, squareform
from scipy import exp
from scipy.linalg import eigh


# pdist to calculate the squared Euclidean distances for every pair of points
# in the 100x2 dimensional dataset.
sq_dists = pdist(X, 'sqeuclidean')

# Variance of the squared Euclidean distances between all pairs of data
# points (not used below, since gamma is set manually).
variance = np.var(sq_dists)

# squareform converts the pairwise distances into a symmetric 100x100 matrix
mat_sq_dists = squareform(sq_dists)

# set the gamma parameter equal to the one I used in scikit-learn KernelPCA
gamma = 15

# Compute the 100x100 kernel matrix
K = exp(-gamma * mat_sq_dists)

# Center the kernel matrix
kern_cent = KernelCenterer()
K = kern_cent.fit_transform(K)

# Get eigenvalues in ascending order with corresponding 
# eigenvectors from the symmetric matrix
eigvals, eigvecs = eigh(K)

# Get the eigenvector that corresponds to the largest eigenvalue
X_pc1 = eigvecs[:,-1]
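To convince myself that this matches scikit-learn, I also compared the two results. As far as I can tell, KernelPCA returns the eigenvectors scaled by the square root of the corresponding eigenvalues, so the projections should agree up to sign and that scale factor (a sketch, assuming X_skernpca from the scikit-learn example above is still in scope):

import numpy as np

# The manual result should match the scikit-learn projection up to sign
# and a factor of sqrt(eigenvalue), if my understanding of the scaling
# inside KernelPCA is right.
proj_manual = X_pc1 * np.sqrt(eigvals[-1])
proj_sklearn = X_skernpca[:, 0]

# Eigenvectors are only defined up to sign, so align the signs first.
idx = np.argmax(np.abs(proj_manual))
if np.sign(proj_manual[idx]) != np.sign(proj_sklearn[idx]):
    proj_manual = -proj_manual

print(np.allclose(proj_manual, proj_sklearn, atol=1e-6))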


Best Answer

The first problem is that the sign of gamma is wrong: the kernel is defined with $\exp(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^{2}_{2})$, but your code computes exp(gamma * mat_sq_dists) with a positive gamma. Either set gamma to $-15$, or keep gamma = 15 and use exp(-gamma * mat_sq_dists).

The second problem is that sorting with zip clobbers the eigenvectors: zip(eigvals, eigvecs) pairs each eigenvalue with a row of eigvecs, but according to scipy.linalg.eigh the $i$-th eigenvector is the column eigvecs[:,i], not the row eigvecs[i,:]. (Also, you should prefer eigh to eig because you have a symmetric real matrix; see the sanity check after the replacements below.)

Replace

< gamma = 15
> gamma = -15

and (to get ordered, real eigenvalues)

< eigvals, eigvecs = np.linalg.eig(K)
> eigvals, eigvecs = scipy.linalg.eigh(K)

and

< eigvals, eigvecs = zip(*sorted(zip(eigvals, eigvecs), reverse=True))
< X_pc1 = eigvecs[0]
> X_pc1 = eigvecs[:,99]
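To see the eigh conventions in action, here is a tiny sanity check (a sketch): the eigenvalues come back in ascending order, and the eigenvectors are the columns of the second return value.

import numpy as np
from scipy.linalg import eigh

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # symmetric, eigenvalues 1 and 3

vals, vecs = eigh(A)
print(vals)                                 # [1. 3.] -- ascending order
v = vecs[:, -1]                             # column for the largest eigenvalue
print(np.allclose(A @ v, vals[-1] * v))     # True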

Finally, you can examine scikit-learn's own KernelPCA implementation for comparison.
