Solved – How are SVMs = Template Matching

Tags: deep learning, kernel trick, machine learning, neural networks, svm

I read about SVMs and learnt that they solve an optimization problem; the max-margin idea seemed very reasonable.

With kernels, they can even find non-linear separation boundaries, which is great.

What I still do not understand is how SVMs (a special kind of kernel machine), and kernel machines in general, are related to neural networks.

Consider the comments by Yann LeCun here:

kernel methods were a form of glorified template matching

and here too:

For example, some people were dazzled by kernel methods because of the
cute math that goes with it. But, as I’ve said in the past, in the
end, kernel machines are shallow networks that perform “glorified
template matching”. There is nothing wrong with that (SVM is a great
method), but it has dire limitations that we should all be aware of.

So my questions are:

  1. How is SVM related to neural network? How is it a shallow network?
  2. SVM solves an optimization problem with a well defined objective function, how is it doing template matching? What is the template here to which an input is matched?

I guess these comments need a thorough understanding of high-dimensional spaces, neural nets, and kernel machines, but so far I have not been able to grasp the logic behind them. Still, it is interesting to see connections between two very different ML techniques.

EDIT: I think understanding SVMs from a neural-network perspective would be great. I am looking for a thorough, mathematically backed answer to the two questions above, so as to really understand the link between SVMs and neural nets, both for the linear SVM and for SVMs with the kernel trick.

Best Answer

  1. How is SVM related to neural network? How is it a shallow network?

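A linear SVM is equivalent to a single-layer neural network: one linear output unit with no nonlinearity, f(x) = w.x + b, trained with the hinge loss plus an L2 penalty on the weights. With a kernel, the SVM can be viewed as a network with one hidden layer whose units compute the kernel similarity between the input and each support vector, followed by a linear output unit. In both cases the network is shallow. The idea has been alluded to in previous threads, such as this one: Single layer NeuralNetwork with RelU activation equal to SVM?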
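To make the linear case concrete, here is a minimal sketch, not a production solver: a single linear unit trained by stochastic subgradient descent on the L2-regularised hinge loss. The function names (`train_linear_svm`, `predict`), the hyperparameters (`lam`, `lr`, `epochs`) and the toy data are all assumptions made up for illustration. A real SVM library would normally solve the same objective with a dedicated optimiser.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Train one linear unit f(x) = w.x + b on the per-sample objective
    lam/2 * ||w||^2 + max(0, 1 - y * f(x)) by stochastic subgradient descent.

    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # Forward pass of the "network": a single linear unit, no activation.
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Gradient of the L2 penalty: shrink the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Subgradient of the hinge loss: non-zero only inside the margin.
            if y[i] * score < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, x):
    """Classify by the sign of the linear unit's output."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (made up for this example); labels are +1 / -1.
X = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
     [-2.0, -2.0], [-3.0, -1.5], [-2.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print([predict(w, b, xi) for xi in X])
```

Nothing in the training loop is specific to SVMs except the choice of loss: replacing the hinge term with a logistic loss would give logistic regression on the same one-unit architecture.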

  2. SVM solves an optimization problem with a well defined objective function, how is it doing template matching? What is the template here to which an input is matched?

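The Gram matrix (kernel matrix, if you prefer) is a measure of similarity: its entry k(x_i, x_j) scores how alike two training samples are. A trained SVM predicts with f(x) = sign(sum_i alpha_i y_i k(x_i, x) + b), and because the SVM allows sparse solutions, alpha_i is non-zero only for the support vectors. Prediction therefore becomes a matter of comparing your sample with the templates, i.e. the support vectors, and taking a weighted vote of the resulting similarities.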
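The template-matching reading can be sketched in a few lines. The example below is only an illustration under stated assumptions: the support vectors, labels, dual coefficients (`alphas`) and `bias` are hypothetical values standing in for a trained model rather than the output of a real solver, and the RBF kernel with `gamma=0.5` is just one common choice of similarity.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Similarity in (0, 1]: equals 1 when a == b, decays with squared distance."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def template_similarities(x, support_vectors, kernel=rbf_kernel):
    """'Hidden layer': one unit per template (support vector)."""
    return [kernel(sv, x) for sv in support_vectors]


def decision_function(x, support_vectors, labels, alphas, bias, kernel=rbf_kernel):
    """'Output layer': weighted vote f(x) = sum_i alpha_i * y_i * k(x_i, x) + b."""
    sims = template_similarities(x, support_vectors, kernel)
    return sum(a * yi * s for a, yi, s in zip(alphas, labels, sims)) + bias


# Hypothetical "trained" model: these numbers are made up for illustration.
support_vectors = [[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-0.5, -2.0]]
labels = [1, 1, -1, -1]
alphas = [0.8, 0.4, 0.7, 0.5]  # chosen so that sum_i alpha_i * y_i = 0
bias = 0.0


def classify(x):
    """Predict +1 or -1 from the sign of the weighted similarity vote."""
    score = decision_function(x, support_vectors, labels, alphas, bias)
    return 1 if score >= 0 else -1


query = [1.2, 0.9]
print(template_similarities(query, support_vectors))  # highest for the nearest template
print(classify(query))
```

Read this way, the kernel SVM is a two-stage shallow network: a fixed layer of similarity units whose "weights" are the stored templates, followed by a learned linear combination. Training only chooses which templates to keep and how much each one votes; it does not learn the similarity function itself.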