Solved – In soft-margin SVM, is it guaranteed that some points will lie on the margin?


In soft-margin SVM, once we solve the dual-form optimization problem (using quadratic programming, which is guaranteed to converge to a global optimum because the problem is convex), we get a vector $\alpha$ of coefficients from which we can recover the hyperplane parameters $\beta$ and $\beta_0$. The additional constraint in the soft-margin case (compared to hard-margin) is the upper bound in $0 \leq \alpha_i \leq C$ for some positive constant $C$. We say that (a small MATLAB sketch of these cases follows the list below):

  1. If $\alpha_i = 0$, then the corresponding $x_i$ is outside the margin
  2. If $\alpha_i = C$, then the corresponding $x_i$ is on the wrong side of the margin
  3. If $0 \lt \alpha_i \lt C$, then the corresponding $x_i$ is on the margin
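
For concreteness, here is a minimal MATLAB sketch of this bookkeeping. It assumes alpha, C, and the data are already in the workspace; the tolerance tol (my own choice, not part of any library) is needed because a numerical QP solver only returns values approximately at the bounds:

tol = 1e-6;                                        % numerical tolerance for "at the bound"
on_margin = find(alpha > tol & alpha < C - tol);   % case 3: points on the margin
violators = find(alpha >= C - tol);                % case 2: points on the wrong side of the margin
outside   = find(alpha <= tol);                    % case 1: points outside the margin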

We recover $\beta$ through the KKT condition $\beta = \sum_{i}\alpha_i y_i x_i$, but we recover $\beta_0$ by selecting any $x_i$ whose corresponding $\alpha_i$ satisfies the third condition above and solving $y_i(\beta^T x_i + \beta_0)=1$ (any such point should give the same value of $\beta_0$).
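
Here is a small MATLAB sketch of this recovery step, assuming X is 2-by-n with points as columns, y is n-by-1, and on_margin indexes the alphas satisfying the third condition (as in the snippet above; the names are illustrative):

beta  = X * (alpha .* y);              % beta = sum_i alpha_i y_i x_i
i     = on_margin(1);                  % any point with 0 < alpha_i < C
beta0 = y(i) - X(:, i)' * beta;        % from y_i (beta' x_i + beta0) = 1, using y_i^2 = 1
% In practice it is common to average over all such points for numerical stability:
beta0 = mean(y(on_margin) - (X(:, on_margin)' * beta));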

Here's my question: given an optimal $\alpha$, is it possible that the third condition is not satisfied for any $\alpha_i$? For example, I have a linearly separable dataset on which I'm training a soft-margin SVM with $C=0.5$, but every value in $\alpha$ comes out as either $0$ or $0.5$, with nothing in between, so I cannot recover $\beta_0$ this way. Am I doing something wrong if my $\alpha$ doesn't satisfy the third condition for any $i$? My intuition says there should always be at least two points on the margin, because the primal optimization problem maximizes the size of the margin; but the $\alpha$ for my dataset contradicts that intuition.


If it helps, I'm doing this in MATLAB with the following (pseudo)code:

% X : 2-by-n matrix of linearly separable data points (one point per column)
% y : n-by-1 vector of class labels (-1 or +1)
n = size(X, 2);
C = 0.5;
X_hat = X .* y';                 % scale each column of X by its label (R2016b+ implicit expansion)
H   = X_hat' * X_hat;            % H(i,j) = y_i * y_j * x_i' * x_j
f   = -ones(n, 1);               % quadprog minimizes 0.5*a'*H*a + f'*a
Aeq = y';                        % equality constraint: sum_i alpha_i * y_i = 0
beq = 0;
lb  = zeros(n, 1);               % box constraints 0 <= alpha_i <= C
ub  = C * ones(n, 1);
alpha = quadprog(H, f, [], [], Aeq, beq, lb, ub); % alpha is always 0 or 0.5

Best Answer

No, it's not guaranteed. However, if the optimal solution has no $\alpha_i$ strictly between $0$ and $C$, then the training problem is "degenerate": the optimal weight vector is $w=0$, and the classifier always predicts the majority class. This is shown in Rifkin et al.'s "A Note on Support Vector Machine Degeneracy", an MIT AI Lab Technical Report. Does your problem have these characteristics? It seems unlikely that this would happen with naturally occurring data.
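
If you want to test for this situation numerically, here is a rough MATLAB sketch reusing the variable names from the question's code; the tolerance and the check itself are just a numerical restatement of the characterization above, not something taken from the report:

tol = 1e-6;
w   = X * (alpha .* y);                                % primal weight vector recovered from the dual
all_at_bounds = all(alpha <= tol | alpha >= C - tol);  % no alpha strictly inside (0, C)
if all_at_bounds && norm(w) <= tol
    % Degenerate case: the classifier reduces to always predicting the majority class.
    warning('Degenerate SVM solution: the optimal w is numerically zero.');
end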
