Where does the definition of the hyperplane in a simple SVM come from?

discriminant-analysis, geometry, machine-learning, svm

I'm trying to figure out support vector machines using this resource. On page 2 it is stated that for linearly separable data the SVM problem is to select a hyperplane such that $\vec{x}_i \cdot \vec{w} + b \geq 1$ for $y_i = +1$ and $\vec{x}_i \cdot \vec{w} + b \leq -1$ for $y_i = -1$. I'm having trouble understanding where the right-hand sides of these constraints come from.

P.S. The next question would be how to show that the SVM's margin is equal to $\frac{1}{\|\vec{w}\|}$.

Best Answer

These two constraints require the training data to be correctly classified, and to lie at least a certain distance from the decision threshold $\vec{x} \cdot \vec{w} + b = 0$. Among all hyperplanes that fulfil these constraints, the one with the smallest norm of the weights has the maximal margin. The value $\pm 1$ is essentially arbitrary: you could replace it with $\pm$ any positive value and it would merely rescale the coefficients of the hyperplane without changing the decision boundary. A value of 1 is used just to keep the maths neat.
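
Here is a minimal numeric sketch of that rescaling argument; the toy data and the particular $\vec{w}, b$ are made up for illustration, not taken from the tutorial:

```python
import numpy as np

# Toy linearly separable data in 2D (hypothetical example).
X = np.array([[2.0, 2.0], [3.0, 3.0],    # class +1
              [0.0, 0.0], [-1.0, 0.0]])  # class -1
y = np.array([1, 1, -1, -1])

# A separating hyperplane w.x + b = 0, chosen by hand so the closest
# point on each side satisfies w.x + b = +1 or w.x + b = -1 exactly.
w = np.array([0.5, 0.5])
b = -1.0

scores = X @ w + b
print(scores)                    # [ 1.   2.  -1.  -1.5]
print(np.all(y * scores >= 1))  # True: both constraints are satisfied

# Geometric margin: distance from the hyperplane to the closest point.
margin = np.min(np.abs(scores)) / np.linalg.norm(w)
print(margin, 1 / np.linalg.norm(w))  # both ~1.414, i.e. 1/||w||

# Rescaling (w, b) by any c > 0 rescales the scores but not their signs,
# so the decision boundary is unchanged; only the "+/- 1" bookkeeping moves.
c = 10.0
print(np.all(np.sign(X @ (c * w) + c * b) == np.sign(scores)))  # True
```

This also suggests the answer to your P.S.: once the closest points are pinned to $|\vec{x} \cdot \vec{w} + b| = 1$, their distance to the hyperplane is $\frac{|\vec{x} \cdot \vec{w} + b|}{\|\vec{w}\|} = \frac{1}{\|\vec{w}\|}$, which is why minimizing $\|\vec{w}\|$ maximizes the margin.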