I'm using an SVM with a linear kernel for binary sentence classification. My dataset contains many duplicate instances, i.e., many sentences in the training set have identical feature vectors. In the trained model, I see a lot of duplicate support vectors, each with a different Lagrange multiplier (alpha*y) term. For example:
-0.064019597363882741 5660:1 # fcda5cf6-e00b-11e2-ddad-001517e11bd0
-0.00028270965981203044 5660:1 # 67d4d4d2-855c-11e3-f88e-001517e11900
-0.00084391434273911603 5660:1 # 38d53d84-e435-11e2-ddad-001517e11bd0
-0.031641060077848393 5660:1 # 069c2c6e-e8be-11e2-edb1-001517e116e4
-0.0028105012661461251 5660:1 # 623c52e4-e038-11e2-edb1-001517e116e4
-0.00098570725949942428 5660:1 # f67aae78-e020-11e2-5884-001517e11900
For those unfamiliar with the SVMLight model format, each of the above lines represents a support vector, where:
- the 1st term is alpha*y
- key-value pairs ('5660:1') are feature_number:feature_value. So all of the above training instances contain just one feature, with value 1.
- everything following '#' is a comment (instance IDs in my case)
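To make the format concrete, here is a minimal sketch of a parser for one such support-vector line (the function name `parse_sv_line` is my own; it is not part of SVMLight):

```python
def parse_sv_line(line):
    """Parse one support-vector line from an SVMLight model file.

    Returns (alpha_y, features, comment), where `features` maps
    feature_number -> feature_value.
    """
    body, _, comment = line.partition("#")
    tokens = body.split()
    alpha_y = float(tokens[0])            # 1st term: alpha * y
    features = {}
    for tok in tokens[1:]:                # remaining tokens: num:value pairs
        key, _, val = tok.partition(":")
        features[int(key)] = float(val)
    return alpha_y, features, comment.strip()

line = "-0.064019597363882741 5660:1 # fcda5cf6-e00b-11e2-ddad-001517e11bd0"
alpha_y, features, comment = parse_sv_line(line)
# alpha_y is -0.064019597363882741, features is {5660: 1.0}
```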
Why do the above support vectors have different alpha*y values even though they have identical feature vectors? And why does the model contain multiple SVs with the same feature vector in the first place? I had expected all SVs to have unique feature vectors, with the duplicated ones simply carrying larger alpha values. I understand that duplicating training instances can shift the margin if the duplicates fall within the (soft) margin, but I can't see why each duplicate support vector would end up with a different alpha*y value.
Best Answer
Just to expand on Justin's answer: in this case, the Lagrange multipliers are not unique. There are infinitely many combinations of Lagrange multipliers that are optimal and produce exactly the same SVM decision function. Because the constraints are linear, the Karush-Kuhn-Tucker (KKT) conditions apply whether or not any of the constraints are redundant (i.e., not full rank). If you used a different optimization solver, you could get other, still optimal, values of the Lagrange multipliers.
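You can see the non-uniqueness directly from the dual objective, W(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K(x_i, x_j): when two instances have identical feature vectors and labels, the objective (and the resulting weight vector) depends only on the *sum* of their alphas, not on how that sum is split. A minimal numerical sketch with a toy dataset I made up (points 0 and 1 are exact duplicates):

```python
import numpy as np

# Toy data: rows 0 and 1 are exact duplicates with the same label.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, 1.0, 1.0, -1.0])
K = X @ X.T  # linear kernel

def dual_objective(alpha):
    # Standard soft-margin SVM dual objective.
    return alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y)

# Two alpha vectors that split the duplicates' total weight differently
# (0.3 + 0.1 vs. 0.05 + 0.35); both satisfy sum(alpha_i * y_i) == 0.
a1 = np.array([0.30, 0.10, 0.2, 0.6])
a2 = np.array([0.05, 0.35, 0.2, 0.6])

# Same dual objective value...
assert np.isclose(dual_objective(a1), dual_objective(a2))
# ...and the same weight vector w = sum_i alpha_i y_i x_i:
w1 = (a1 * y) @ X
w2 = (a2 * y) @ X
assert np.allclose(w1, w2)
```

So a solver is free to spread the total alpha of a duplicated instance across its copies in any proportions, which is exactly what you see in the model file: several identical support vectors, each with its own (arbitrary) share of the combined alpha*y.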