Solved – Large Margin Classifier with CVXOPT

machine learning, python, svm

I need to build a large margin classifier using the Python library CVXOPT, which lets me solve the underlying quadratic program.

I am trying to write a Python function that takes the training data and some test data and returns the support vectors and the distance of each test point from the optimal hyperplane.
I am struggling, however, to understand the output of CVXOPT and how to use it to find the support vectors and then the distances.

My function so far looks like this:

from numpy import outer, inner, ones, zeros, diag
from cvxopt import matrix, solvers

def lmc(X, t, Xtest):

    numSamples = len(X)

    # Quadratic and linear terms of the dual objective:
    # P[i,j] = t_i * t_j * <x_i, x_j>, q = -1
    P = matrix(outer(t, t) * inner(X, X))
    q = matrix(ones(numSamples) * -1)

    # Inequality constraints: -alpha_i <= 0
    G = matrix(diag(ones(numSamples) * -1))
    h = matrix(zeros(numSamples))

    # Equality constraint: sum_i alpha_i * t_i = 0
    # (t must contain floats, e.g. +1.0 / -1.0, so that A has type 'd')
    A = matrix(t, (1, numSamples))
    b = matrix(0.0)

    solution = solvers.qp(P, q, G, h, A, b)

    return solution

The output of this function includes large vectors s, x, and z, and a single value for y. My knowledge of support vector machines is pretty sketchy, so any help would be greatly appreciated.

Best Answer

Using the notation and steps provided by Tristan Fletcher, the general steps to solve the SVM problem are the following (the full article and Python code are linked in the Source section below):

  • Create $\mathbf{H}$ where $H_{i,j} = y^{(i)}y^{(j)} <x^{(i)} x^{(j)}> $. Note this is matrix $P$ in the CVXOPT API
  • Calculate $\mathbf{w} = \sum_i^m y^{(i)} \alpha_i x^{(i)}$
  • Determine the set of support vectors $S$ by finding the indices such that $\alpha_i > 0$
  • Calculate the intercept term using $b = y^{(s)} - \sum_{m \in S} \alpha_m y^{(m)} <x^{(m)} x^{(s)} >$
  • For each new point $x'$ classify according to $y' = sign(w^T x' + b)$

Re-writing the problem in an appropriate format

Since we will solve this optimization problem using the CVXOPT library in Python, we will need to match the solver's API, which, according to the documentation, is of the form:

$$ \begin{aligned} & \min \frac{1}{2} x^T P x + q^T x \\ \text{s.t.} \ & \ Gx \leq h \\ & \ Ax = b \end{aligned} $$

with the corresponding API call:

cvxopt.solvers.qp(P, q[, G, h[, A, b[, solver[, initvals]]]])
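The call returns a dictionary; the only entry needed for the SVM dual is sol['x'], which contains the optimal $\alpha$ (the vectors s, z and the value y mentioned in the question are the slacks and dual variables of the QP itself). As a minimal sketch with made-up numbers (not part of the original answer), a one-variable QP shows which key holds what:

from cvxopt import matrix, solvers

# Toy problem: min (1/2)*2*x^2 - 2*x  subject to  -x <= 0  (optimum at x = 1)
P = matrix([[2.0]])
q = matrix([-2.0])
G = matrix([[-1.0]])
h = matrix([0.0])

sol = solvers.qp(P, q, G, h)

print(sol['x'])       # primal solution (the alphas in the SVM dual)
print(sol['z'])       # dual variables for the inequalities Gx <= h
print(sol['s'])       # slacks of the inequalities
print(sol['y'])       # dual variables for the equalities Ax = b (empty here)
print(sol['status'])  # 'optimal' if the solver converged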

Recall that the dual problem is expressed as:

$$ \max_{\alpha} \sum_i^m \alpha_i - \frac{1}{2} \sum_{i,j}^m y^{(i)}y^{(j)} \alpha_i \alpha_j <x^{(i)} x^{(j)}> $$

Let $\mathbf{H}$ be a matrix such that $H_{i,j} = y^{(i)}y^{(j)} <x^{(i)} x^{(j)}> $, then the optimization becomes:

$$ \begin{aligned} & \max_{\alpha} \sum_i^m \alpha_i - \frac{1}{2} \alpha^T \mathbf{H} \alpha \\ \text{s.t.} & \ \alpha_i \geq 0 \\ & \ \sum_i^m \alpha_i y^{(i)} = 0 \end{aligned} $$

We convert the sums into vector form and multiply both the objective and the constraint by $-1$, which turns this into a minimization problem and reverses the inequality:

$$ \begin{aligned} & \min_{\alpha} \frac{1}{2} \alpha^T \mathbf{H} \alpha - 1^T \alpha \\ \text{s.t.} & \ - \alpha_i \leq 0 \\ & \ y^T \alpha = 0 \end{aligned} $$

We are now ready to convert our numpy arrays into the cvxopt format. Using the same notation as in the documentation, this gives:

  • $P := H$, a matrix of size $m \times m$
  • $q := - \vec 1$, a vector of size $m \times 1$
  • $G := -\mathrm{diag}[1]$, a diagonal matrix of $-1$s of size $m \times m$
  • $h := \vec 0$, a vector of zeros of size $m \times 1$
  • $A := y$, the label vector, reshaped to a $1 \times m$ row as required by the equality constraint $Ax = b$
  • $b := 0$, a scalar

Note that in the simple case of $m = 2$, the matrix $G$ and vector $h$ which define the inequality constraint are

$G = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} $ and $h = \begin{bmatrix} 0 \\ 0 \end{bmatrix} $

Computing the matrix $\mathbf{H}$ in vectorized form

Consider the simple example with two input samples $\{x^{(1)}, x^{(2)}\} \in \mathbb{R}^2$, which are two-dimensional vectors, i.e. $x^{(1)} = (x_1^{(1)}, x_2^{(1)})^T$:

$$ X = \begin{bmatrix} x_1^{(1)} & x_2^{(1)} \\ x_1^{(2)} & x_2^{(2)} \end{bmatrix} \ \ \ y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \end{bmatrix} $$

We now create a new matrix $X'$ in which each input sample $x^{(i)}$ is multiplied by its corresponding label $y^{(i)}$. This can be done easily in NumPy using broadcasting.

$$X' = \begin{bmatrix} x^{(1)}_1 y^{(1)} & x^{(1)}_2y^{(1)} \\ x^{(2)}_1y^{(2)} & x^{(2)}_2y^{(2)} \end{bmatrix} $$

Finally, we take the matrix product of $X'$ and its transpose, giving $H = X' X'^T$:

$$H = X' X'^T = \begin{bmatrix} x^{(1)}_1 y^{(1)} & x^{(1)}_2y^{(1)} \\ x^{(2)}_1y^{(2)} & x^{(2)}_2y^{(2)} \end{bmatrix} \begin{bmatrix} x^{(1)}_1 y^{(1)} & x^{(2)}_1 y^{(2)} \\ x^{(1)}_2y^{(1)} & x^{(2)}_2y^{(2)} \end{bmatrix} $$

$$ H = \begin{bmatrix} x^{(1)}_1 x^{(1)}_1y^{(1)}y^{(1)} + x^{(1)}_2x^{(1)}_2y^{(1)}y^{(1)} & x^{(1)}_1 x^{(2)}_1y^{(1)}y^{(2)} + x^{(1)}_2x^{(2)}_2y^{(1)}y^{(2)} \\ x^{(2)}_1 x^{(1)}_1y^{(2)}y^{(1)} + x^{(2)}_2x^{(1)}_2y^{(2)}y^{(1)} & x^{(2)}_1 x^{(2)}_1y^{(2)}y^{(2)} + x^{(2)}_2x^{(2)}_2y^{(2)}y^{(2)} \end{bmatrix}$$
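As a quick sanity check (toy values of my own, not from the post), the vectorized product can be compared against the element-wise definition $H_{i,j} = y^{(i)}y^{(j)} <x^{(i)} x^{(j)}>$:

import numpy as np

# Toy values, assumed only for this check
X = np.array([[3., 4.],
              [6., -1.]])
y = np.array([-1., 1.]).reshape(-1, 1)

X_dash = y * X              # each row x^(i) scaled by its label y^(i) via broadcasting
H_vec = X_dash @ X_dash.T   # vectorized H

# Element-wise definition H[i, j] = y_i * y_j * <x_i, x_j>
H_loop = np.array([[y[i, 0] * y[j, 0] * (X[i] @ X[j]) for j in range(len(X))]
                   for i in range(len(X))])

assert np.allclose(H_vec, H_loop)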

Implementation in Python

Toy data set

import numpy as np

#Data set
x_neg = np.array([[3,4],[1,4],[2,3]])
y_neg = np.array([-1,-1,-1])
x_pos = np.array([[6,-1],[7,-1],[5,-3]])
y_pos = np.array([1,1,1])
x1 = np.linspace(-10,10)
x = np.vstack((np.linspace(-10,10),np.linspace(-10,10)))

#Data for the next section
X = np.vstack((x_pos, x_neg))
y = np.concatenate((y_pos,y_neg))

CVXOPT solver and resulting $\alpha$

#Importing with custom names to avoid issues with numpy / sympy matrix
from cvxopt import matrix as cvxopt_matrix
from cvxopt import solvers as cvxopt_solvers

#Initializing values and computing H. Note the 1. to force to float type
m,n = X.shape
y = y.reshape(-1,1) * 1.
X_dash = y * X
H = np.dot(X_dash , X_dash.T) * 1.

#Converting into cvxopt format
P = cvxopt_matrix(H)
q = cvxopt_matrix(-np.ones((m, 1)))
G = cvxopt_matrix(-np.eye(m))
h = cvxopt_matrix(np.zeros(m))
A = cvxopt_matrix(y.reshape(1, -1))
b = cvxopt_matrix(np.zeros(1))

#Setting solver parameters (change default to decrease tolerance) 
cvxopt_solvers.options['show_progress'] = False
cvxopt_solvers.options['abstol'] = 1e-10
cvxopt_solvers.options['reltol'] = 1e-10
cvxopt_solvers.options['feastol'] = 1e-10

#Run solver
sol = cvxopt_solvers.qp(P, q, G, h, A, b)
alphas = np.array(sol['x'])

Compute $w$ and $b$ parameters

#w parameter in vectorized form
w = ((y * alphas).T @ X).reshape(-1,1)

#Selecting the set of indices S corresponding to non-zero alphas
S = (alphas > 1e-4).flatten()

#Computing b
b = y[S] - np.dot(X[S], w)

#Display results
print('Alphas = ',alphas[alphas > 1e-4])
print('w = ', w.flatten())
print('b = ', b[0])
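
Coming back to the original question, the class of a new point $x'$ is the sign of $w^T x' + b$, and its signed distance to the hyperplane is $(w^T x' + b)/\lVert w \rVert$. A sketch of this final step, with a hypothetical test array Xtest (my own, not from the original answer), might look like:

# Hypothetical test points (not part of the original answer)
Xtest = np.array([[4., 2.], [2., 5.]])

# Decision values w^T x + b for each test point
decision = Xtest @ w + b[0]

# Predicted labels and signed distances to the separating hyperplane
predictions = np.sign(decision)
distances = decision / np.linalg.norm(w)

# The support vectors themselves are the samples with non-zero alpha
print('support vectors = \n', X[S])
print('predictions = ', predictions.flatten())
print('distances = ', distances.flatten())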

Result

[Figure: "SVM simple" – the resulting classifier on the toy data set]
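
For reference, a minimal matplotlib sketch (my own, reusing the variables defined above) that reproduces a plot of this kind:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 6))

# Training data
ax.scatter(x_neg[:, 0], x_neg[:, 1], marker='x', color='r', label='negative (-1)')
ax.scatter(x_pos[:, 0], x_pos[:, 1], marker='o', color='b', label='positive (+1)')

# Decision boundary w1*x1 + w2*x2 + b = 0, i.e. x2 = (-b - w1*x1) / w2
w_ = w.flatten()
ax.plot(x1, (-b[0] - w_[0] * x1) / w_[1], color='k', label='decision boundary')

ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
ax.legend()
plt.show()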

Source:

Full code here
