Solved – Difference between an SVM and a perceptron

kernel-trick, machine-learning, svm

I am a bit confused with the difference between an SVM and a perceptron. Let me try to summarize my understanding here, and please feel free to correct where I am wrong and fill in what I have missed.

  1. The perceptron does not try to optimize the separation "distance". As long as it finds a hyperplane that separates the two classes, it is done. The SVM, on the other hand, tries to maximize the "margin", i.e., the distance between the separating hyperplane and the closest sample points on either side (those closest points are called the support vectors).

  2. The SVM typically uses a "kernel function" to implicitly map the sample points into a high-dimensional space where they become linearly separable, while the basic perceptron assumes the sample points are already linearly separable in the input space.
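To make point 1 concrete, here is a minimal sketch (not from the original post) of the classic perceptron training rule: it updates the weights only on mistakes and stops at the *first* separating hyperplane it finds, with no attempt to maximize any margin.

```python
import numpy as np

def perceptron_train(X, y, epochs=100):
    """Classic perceptron: update on mistakes only; stops at *any*
    separating hyperplane, with no margin maximization."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified (or on the boundary)
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:                # converged: all points separated
            break
    return w, b

# Two linearly separable clusters with labels +1 / -1 (toy data)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y)
```

An SVM trained on the same data would instead pick, among all separating hyperplanes, the one with the largest margin; the perceptron happily returns whichever one its update sequence happens to reach first.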

Best Answer

That sounds right to me. Note that people sometimes also use the word "perceptron" to refer to the training algorithm together with the classifier; for example, someone explained this to me in the answer to this question. Also, nothing stops you from using a kernel with the perceptron, and the resulting kernel perceptron is often a better classifier than the plain one. See here for some slides (PDF) on how to implement the kernel perceptron.
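As a rough illustration of how the kernel perceptron mentioned above works (a sketch under my own naming, not taken from the linked slides), the trick is to keep the perceptron in dual form: instead of a weight vector, store a mistake count per training point and classify via kernel evaluations.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF (Gaussian) kernel between two points."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_perceptron_train(X, y, kernel=rbf, epochs=20):
    """Dual-form perceptron: the weight vector is never formed
    explicitly; alpha[i] counts mistakes made on training point i."""
    n = len(X)
    alpha = np.zeros(n)
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            f = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(n))
            if y[i] * f <= 0:            # mistake: bump this point's weight
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def kernel_perceptron_predict(x, X, y, alpha, kernel=rbf):
    f = sum(alpha[j] * y[j] * kernel(X[j], x) for j in range(len(X)))
    return 1 if f > 0 else -1

# XOR-style data: not linearly separable in the input space,
# but separable in the RBF kernel's feature space
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])
alpha = kernel_perceptron_train(X, y)
```

This is the same mechanism an SVM exploits: the classifier only ever touches the data through kernel evaluations, so the mapping to the high-dimensional space stays implicit.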

The major practical difference between a (kernel) perceptron and an SVM is that perceptrons can be trained online (i.e., their weights can be updated as new examples arrive one at a time) whereas standard SVMs cannot be. See this question for information on whether SVMs can be trained online. So, even though an SVM is usually a better classifier, perceptrons can still be useful because they are cheap and easy to re-train in a situation in which fresh training data is constantly arriving.
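The online-training point can be sketched as follows (my own minimal illustration, with a hypothetical `OnlinePerceptron` class): each incoming example costs at most one O(d) weight update, so nothing needs to be re-solved from scratch when new data arrives.

```python
import numpy as np

class OnlinePerceptron:
    """Perceptron kept warm between arrivals: each new example triggers
    at most one cheap O(d) weight update, so there is no optimization
    problem to re-solve as fresh training data streams in."""
    def __init__(self, dim):
        self.w = np.zeros(dim)
        self.b = 0.0

    def predict(self, x):
        return 1 if self.w @ x + self.b > 0 else -1

    def update(self, x, y):
        # Standard mistake-driven rule: adjust only when wrong.
        if y * (self.w @ x + self.b) <= 0:
            self.w += y * x
            self.b += y

# Feed examples one at a time, as they would arrive in production
stream = [([1.0, 1.0], 1), ([2.0, 0.5], 1),
          ([-1.0, -1.0], -1), ([-2.0, 0.0], -1)]
model = OnlinePerceptron(dim=2)
for x, y in stream:
    model.update(np.array(x), y)
```

A batch SVM, by contrast, solves a global optimization problem over the whole training set, which is why retraining it on a growing dataset is comparatively expensive.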