Solved – How to select landmarks of Kernel to run SVM

kernel trick, svm

Objective

Clarify how to choose the kernel reference points (landmarks) to identify the non-linear boundary.

Background

I am going through the SVM material in the Coursera ML course (Support Vector Machine) and trying to understand how to choose the landmarks whose distances are fed into the Gaussian kernel.

It says "put the landmarks in the exact same locations as all the training examples".

Question

It is not clear to me why the landmarks should be at "the exact same locations as all the training examples".

Why use all the data?
The number of features M and the number of data points N are different, and I suppose M << N. Should we then choose M data points to use as landmarks?
Why not take into account whether a data point used as a landmark is classified as positive or negative?
I believe we would like to distinguish the positive data (higher Gaussian probability), so why use the negative data as landmarks as well?

In the YouTube SVM with polynomial kernel visualization example (although it does not use a Gaussian kernel), should the landmarks be the ones that represent the red points?

Best Answer

M is the number of data points, not the number of features. So we take all our (training) data, and for each $(x_i, y_i)$ we get a landmark.
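As a minimal sketch of what "one landmark per training example" means in practice: each point is mapped to a vector of M Gaussian similarities, one per landmark, so the feature matrix is M by M. The toy data, the names `gaussian_kernel` and `kernel_features`, and the choice `sigma=1.0` are assumptions made for illustration, not part of the course material.

```python
import math

def gaussian_kernel(x, landmark, sigma=1.0):
    """Similarity exp(-||x - l||^2 / (2 sigma^2)) between a point and a landmark."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_features(x, landmarks, sigma=1.0):
    """Map x to one similarity feature per landmark."""
    return [gaussian_kernel(x, l, sigma) for l in landmarks]

# Toy training set (made-up values): M = 4 points with 2 raw features each.
X_train = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y_train = [1, 1, 0, 0]

# Landmarks sit at every training point, positive and negative alike.
landmarks = list(X_train)

# F[i] is the feature vector of training point i; F is M x M.
F = [kernel_features(x, landmarks) for x in X_train]
for row, label in zip(F, y_train):
    print(label, [round(v, 3) for v in row])
```

Note that the labels play no part in building the features. They only enter later, when the weights on these features are fitted.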

Notice that in the combined minimisation term, each $f_i$ is combined with its matching $y_i$, so the minimisation takes account of which landmarks should be positive and which should be negative.
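For reference, the minimisation term in question can be written as follows, using the course's formulation with labels $y_i \in \{0, 1\}$ and the notation above ($M$ training points, $f_i$ the vector of kernel similarities of $x_i$ to all landmarks). The bias term is left out for brevity, which is a simplification on my part:

$$\min_\theta \; C \sum_{i=1}^{M} \left[ y_i \, \mathrm{cost}_1(\theta^T f_i) + (1 - y_i) \, \mathrm{cost}_0(\theta^T f_i) \right] + \frac{1}{2} \sum_{j=1}^{M} \theta_j^2$$

A positive example ($y_i = 1$) only activates $\mathrm{cost}_1$, and a negative example only activates $\mathrm{cost}_0$, so the fitted weight $\theta_j$ on each landmark's similarity reflects which side of the boundary that landmark belongs to.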

In the video each red dot AND each blue dot should be a landmark.

Related Solutions

Solved – Kernel logistic regression vs SVM

KLRs and SVMs

Classification performance is almost identical in both cases.
KLR can provide class probabilities whereas SVM is a deterministic classifier.
KLR has a natural extension to multi-class classification whereas in SVM, there are multiple ways to extend it to multi-class classification (and it is still an area of research whether there is a version which has provably superior qualities over the others).
Surprisingly or unsurprisingly, KLR also has optimal margin properties that the SVMs enjoy (well in the limit at least)!

Looking at the above, it almost feels like kernel logistic regression is what you should be using. However, there are certain advantages that SVMs enjoy:

KLR is computationally more expensive than SVM: $O(N^3)$ vs $O(N^2k)$, where $k$ is the number of support vectors.
The classifier in SVM is designed such that it is defined only in terms of the support vectors, whereas in KLR the classifier is defined over all the points and not just the support vectors. This allows SVMs to enjoy some natural speed-ups (in terms of efficient code-writing) that are hard to achieve for KLR.
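The sparsity point can be illustrated with a small sketch. Both classifiers predict with a kernel expansion $\sum_i c_i k(x, x_i) + b$, but an SVM has nonzero $c_i$ only at its support vectors, while KLR generally has a nonzero coefficient at every training point. The coefficients below are hypothetical, hand-picked numbers (not the output of any solver), and the function names are assumptions for illustration.

```python
import math

def rbf(x, z, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def decision_value(x, points, coeffs, bias, gamma=1.0):
    """Kernel expansion sum_i c_i k(x, x_i) + b, skipping zero coefficients."""
    return bias + sum(c * rbf(x, p, gamma)
                      for p, c in zip(points, coeffs) if c != 0.0)

def kernel_evaluations(coeffs):
    """Number of kernel calls needed for one prediction."""
    return sum(1 for c in coeffs if c != 0.0)

# Hypothetical training points and coefficients, chosen by hand.
points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0), (0.5, 0.5), (2.0, 2.0)]
svm_coeffs = [0.8, 0.0, -0.8, 0.0, 0.0, 0.0]    # sparse: nonzero only at support vectors
klr_coeffs = [0.5, 0.2, -0.6, -0.3, 0.1, 0.05]  # dense: every point contributes

x_new = (0.1, 0.2)
svm_score = decision_value(x_new, points, svm_coeffs, bias=0.0)
klr_score = decision_value(x_new, points, klr_coeffs, bias=0.0)

# KLR passes its score through the logistic function to get a class probability.
klr_prob = 1.0 / (1.0 + math.exp(-klr_score))

print(kernel_evaluations(svm_coeffs), kernel_evaluations(klr_coeffs))
```

With these made-up numbers the SVM-style expansion needs 2 kernel evaluations per prediction and the KLR-style expansion needs 6. On real data the gap depends on how many training points end up as support vectors.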

Solved – Plotting the decision boundary of a kernel SVM (RBF)

I figured out what needed to be done. Actually, it's something simple, but it seems I had a matlaboid bug... Here is the code and the resulting figure for the "XOR" binary classification problem.

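gamma     = getGamma();
b         = getB();           % bias term, assuming f(x) = sum_i alpha_i*y_i*k(x, sv_i) + b
points_x1 = linspace(xLimits(1), xLimits(2), 100);
points_x2 = linspace(yLimits(1), yLimits(2), 100);
[X1, X2]  = meshgrid(points_x1, points_x2);

% Initialize f with the bias. meshgrid puts x2 along rows and x1 along
% columns, so f is indexed as f(k,j) to line up with X1 and X2.
f = ones(length(points_x2), length(points_x1))*b;

% Iterate over all support vectors
for i=1:N_sv
    alpha_i = getAlpha(i);
    y_i     = getY(i);        % label (+1/-1) of the i-th SV; getY is an assumed helper, like getAlpha/getSV
    sv_i    = getSV(i);
    for j=1:length(points_x1)
        for k=1:length(points_x2)
            x = [points_x1(j);points_x2(k)];
            f(k,j) = f(k,j) + alpha_i*y_i*kernel_func(gamma, x, sv_i);
        end
    end
end

% Decision surface
figure;
surf(X1,X2,f);
shading interp;
lighting phong;
alpha(.6)

% Decision boundary (the f = 0 level set), drawn in a separate figure
figure;
contourf(X1, X2, f, [0 0]);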

where the function

function k = kernel_func(gamma, x, x_i)
    k = exp(-gamma*norm(x - x_i)^2);
end

simply evaluates the RBF kernel, $k(\mathbf{x},\mathbf{x}')=\exp\left(-\gamma\lVert\mathbf{x}-\mathbf{x}'\rVert^2\right)$.

Here is the result for the XOR problem, with $\gamma=4$.

[Figure: result for the XOR problem]
