Solved – Optimizing a Support Vector Machine with Quadratic Programming

optimization, r, svm

I'm trying to understand the process for training a linear support vector machine. I realize that the properties of SVMs allow them to be optimized much faster than with a general-purpose quadratic programming solver, but for learning purposes I'd like to see how this works.

Training Data

set.seed(2015)
df <- data.frame(X1=c(rnorm(5), rnorm(5)+5), X2=c(rnorm(5), rnorm(5)+3), Y=c(rep(1,5), rep(-1, 5)))
df
           X1       X2  Y
1  -1.5454484  0.50127  1
2  -0.5283932 -0.80316  1
3  -1.0867588  0.63644  1
4  -0.0001115  1.14290  1
5   0.3889538  0.06119  1
6   5.5326313  3.68034 -1
7   3.1624283  2.71982 -1
8   5.6505985  3.18633 -1
9   4.3757546  1.78240 -1
10  5.8915550  1.66511 -1

library(ggplot2)
ggplot(df, aes(x=X1, y=X2, color=as.factor(Y))) + geom_point()

[Figure: scatter plot of the training data, X1 versus X2, points colored by class Y]

Finding the Maximum Margin Hyperplane

According to the Wikipedia article on support vector machines, to find the maximum-margin hyperplane I need to solve

$$
\arg\min_{(\mathbf{w},b)}\frac{1}{2}\|\mathbf{w}\|^2
$$
subject to (for any i = 1, …, n)
$$
y_i(\mathbf{w}\cdot\mathbf{x_i} - b) \ge 1.
$$
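
Written out for the two features in my data (with $\mathbf{w} = (w_1, w_2)$), this is

$$
\min_{w_1, w_2, b}\ \frac{1}{2}\left(w_1^2 + w_2^2\right)
\quad\text{subject to}\quad
y_i\left(w_1 X_{i1} + w_2 X_{i2} - b\right) \ge 1, \qquad i = 1, \dots, 10.
$$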

How do I 'plug' my sample data into a QP solver in R (for instance quadprog) to determine $\mathbf{w}$ and $b$?

Best Answer

HINT:

quadprog (via its solve.QP() routine) solves the following:

$$ \begin{align*} \min_x -d^T x + \frac{1}{2} x^T D x\\ \text{such that } A^T x \geq x_0 \end{align*} $$

Consider $$ x = \begin{pmatrix} w\\ b \end{pmatrix} \text{and } D=\begin{pmatrix} I & 0\\ 0 & 0 \end{pmatrix} $$

where $I$ is the identity matrix.
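
As a minimal sketch of that block structure in R (using p = 2 for the data above): note that solve.QP() requires Dmat to be strictly positive definite, so the zero entry for $b$ is typically replaced by a tiny ridge, which barely perturbs the solution.

p    <- 2
Dmat <- diag(c(rep(1, p), 1e-8))   # ~ block-diag(I_p, 0), tiny ridge on the b entry
dvec <- rep(0, p + 1)              # the objective has no linear term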

If $w$ is $p \times 1$ and $y$ is $n \times 1$, then with $x = (w, b)$:

$$ \begin{align*} x &: (p+1) \times 1 \\ D &: (p+1) \times (p+1) \end{align*} $$

On similar lines: $$ x_0 = \begin{pmatrix} 1\\ \vdots\\ 1 \end{pmatrix}_{n \times 1} $$

Formulate $A$ using the hints above to represent your inequality constraint.
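
For completeness, here is a minimal sketch of the whole call on the df from the question (assuming the quadprog package; the small ridge on the b entry of Dmat is a choice made here, not part of the hint above):

library(quadprog)

X <- as.matrix(df[, c("X1", "X2")])   # n x p feature matrix
y <- df$Y                             # labels in {-1, +1}
n <- nrow(X); p <- ncol(X)

# Objective 1/2 ||w||^2: D = block-diag(I_p, 0), d = 0.
# solve.QP() needs a positive definite Dmat, hence the tiny ridge on the b entry.
Dmat <- diag(c(rep(1, p), 1e-8))
dvec <- rep(0, p + 1)

# Constraints y_i * (w . x_i - b) >= 1, written as t(Amat) %*% c(w, b) >= bvec,
# with one column of Amat per observation: (y_i * x_i, -y_i).
Amat <- t(cbind(y * X, -y))
bvec <- rep(1, n)

sol <- solve.QP(Dmat, dvec, Amat, bvec)
w <- sol$solution[1:p]
b <- sol$solution[p + 1]
w
b

The separating line $w_1 X_1 + w_2 X_2 - b = 0$ can then be overlaid on the scatter plot with geom_abline(intercept = b / w[2], slope = -w[1] / w[2]).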
