# SVM Optimization – Importance of Dual Problem in SVM Fitting


Given the data points $x_1, \ldots, x_n \in \mathbb{R}^d$ and labels $y_1, \ldots, y_n \in \left \{-1, 1 \right\}$, the hard margin SVM primal problem is

$$\text{minimize}_{w, w_0} \quad \frac{1}{2} w^T w$$
$$\text{s.t.} \quad \forall i: y_i (w^T x_i + w_0) \ge 1$$

which is a quadratic program with $d+1$ variables to be optimized for and $n$ constraints. The dual

$$\text{maximize}_{\alpha} \quad \sum_{i=1}^{n}{\alpha_i} - \frac{1}{2}\sum_{i=1}^{n}{\sum_{j=1}^{n}{y_i y_j \alpha_i \alpha_j x_i^T x_j}}$$
$$\text{s.t.} \quad \forall i: \alpha_i \ge 0 \land \sum_{i=1}^{n}{y_i \alpha_i} = 0$$
is a quadratic program with $n$ variables to be optimized for, $n$ inequality constraints, and one equality constraint.
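Both programs can be handed to a generic constrained solver. Here is a minimal primal-side sketch in Python; the toy data and the choice of SciPy's SLSQP solver are my own assumptions, not part of the question:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: two linearly separable classes in R^2.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, d = X.shape

# Stack the d+1 primal variables as theta = [w, w0].
def objective(theta):
    w = theta[:d]
    return 0.5 * w @ w  # (1/2) w^T w

# One inequality constraint per point: y_i (w^T x_i + w0) - 1 >= 0.
constraints = [
    {"type": "ineq", "fun": lambda t, i=i: y[i] * (X[i] @ t[:d] + t[d]) - 1.0}
    for i in range(n)
]

res = minimize(objective, x0=np.zeros(d + 1), method="SLSQP",
               constraints=constraints)
w, w0 = res.x[:d], res.x[d]
print(w, w0)  # maximum-margin hyperplane parameters
```

A general-purpose solver like this is only for illustration; dedicated QP solvers exploit the problem's structure far better.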

When implementing a hard margin SVM, why would I solve the dual problem instead of the primal problem? The primal problem looks more 'intuitive' to me, and I don't need to concern myself with the duality gap, the Karush-Kuhn-Tucker conditions, etc.

It would make sense to me to solve the dual problem if $d \gg n$, but I suspect there are better reasons. Is this the case?

Solving the primal problem, we obtain the optimal $w$, but know nothing about the $\alpha_i$. In order to classify a query point $x$ we need to explicitly compute the scalar product $w^T x$, which may be expensive if $d$ is large.
Solving the dual problem, we obtain the $\alpha_i$ (where $\alpha_i = 0$ for all but a few points, the support vectors). In order to classify a query point $x$, we calculate
$$w^T x + w_0 = \left(\sum_{i=1}^{n}{\alpha_i y_i x_i} \right)^T x + w_0 = \sum_{i=1}^{n}{\alpha_i y_i \langle x_i, x \rangle} + w_0,$$
where only the support vectors (those with $\alpha_i > 0$) contribute to the sum. Moreover, both the dual objective and this decision function involve the data only through inner products $\langle x_i, x \rangle$, which is exactly the form required to apply the kernel trick.
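A matching dual-side sketch, again a toy illustration rather than a definitive implementation (the data, the SLSQP solver, and the $10^{-4}$ support-vector cutoff are my own assumptions):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: two linearly separable classes in R^2.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# The dual sees the data only through the inner products x_i^T x_j.
Q = (y[:, None] * y[None, :]) * (X @ X.T)

# Maximizing the dual is equivalent to minimizing its negation.
def neg_dual(alpha):
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(neg_dual, x0=np.zeros(n), method="SLSQP",
               bounds=[(0.0, None)] * n,               # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
alpha = res.x

sv = alpha > 1e-4                   # arbitrary numerical cutoff for alpha_i > 0
w = (alpha * y) @ X                 # w = sum_i alpha_i y_i x_i
w0 = np.mean(y[sv] - X[sv] @ w)     # from y_i (w^T x_i + w0) = 1 on SVs

def classify(x):
    # Decision via the support-vector expansion; no explicit w is needed,
    # and X[sv] @ x could be replaced by any kernel evaluation.
    return np.sign((alpha[sv] * y[sv]) @ (X[sv] @ x) + w0)
```

Note how `classify` touches only the support vectors and only through inner products, which is the practical payoff of the dual formulation.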