Solved – Sample weights scaling in sklearn.svm.SVC

machine-learning, python, scikit-learn, svm

It seems that for sklearn.svm.SVC, different scalings of sample_weight make the classifier behave differently. Is that correct? If so, how does sample_weight work?

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

# Uniform weights that sum to 1
sample_weights = np.ones(X.shape[0]) / X.shape[0]

clf0 = SVC()
clf0.fit(X, y, sample_weight=sample_weights * 10)
plt.subplot(121)
visualize_result(clf0, X, y)

clf1 = SVC()
clf1.fit(X, y, sample_weight=sample_weights * 5)
plt.subplot(122)
visualize_result(clf1, X, y)

And the result is

[figure: the decision boundaries of clf0 (left) and clf1 (right), which differ]
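For reference, a self-contained sketch of the same experiment without the plotting helper. Since X and y are not shown in the question, a make_blobs toy dataset is assumed here:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical toy data standing in for the unspecified X, y
X, y = make_blobs(n_samples=100, centers=2, cluster_std=3.0, random_state=0)
w = np.ones(X.shape[0]) / X.shape[0]

clf0 = SVC().fit(X, y, sample_weight=w * 10)
clf1 = SVC().fit(X, y, sample_weight=w * 5)

# Rescaling the (uniform) weights changes the fitted model:
# the two decision functions are not the same.
same = np.allclose(clf0.decision_function(X), clf1.decision_function(X))
print(same)
```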

Best Answer

In the SVC optimization problem, the single C parameter is replaced by a per-sample C[i], where i indexes the samples: C[i] = C * sample_weight[i]. AFAIK, when you use sample_weight and class_weight simultaneously, C[i] = C * sample_weight[i] * class_weight[class[i]].

  1. Official scikit-learn documentation on unbalanced problems (SVM)
  2. LIBSVM manual, "Unbalanced Data" and "Solving the Two-variable Sub-problem" (p. 26)
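A small sketch of this relationship on a hypothetical make_blobs dataset: since C[i] = C * sample_weight[i], fitting with C=1 and a uniform sample weight of 3 should solve the same dual problem as fitting with C=3 and no weights.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=80, centers=2, cluster_std=2.0, random_state=0)

# C[i] = C * sample_weight[i]: uniform weight 3 with C=1 gives the
# same per-sample bound (3.0) as C=3 with no weights.
clf_weighted = SVC(C=1.0, tol=1e-8).fit(X, y, sample_weight=np.full(len(X), 3.0))
clf_scaled_c = SVC(C=3.0, tol=1e-8).fit(X, y)

# The two fits are (numerically) the same model.
print(np.allclose(clf_weighted.decision_function(X),
                  clf_scaled_c.decision_function(X), atol=1e-5))
```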

Thus, when you provide smaller sample_weight values, every effective C[i] shrinks and your classifier becomes more regularized.
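A quick sketch of that effect (again on an assumed make_blobs toy dataset): shrinking all the weights lowers every C[i], and the more regularized fit typically ends up with more support vectors, since more dual coefficients hit their (now small) bound.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=3.0, random_state=0)

big = SVC().fit(X, y, sample_weight=np.full(len(X), 10.0))    # effective C[i] = 10
small = SVC().fit(X, y, sample_weight=np.full(len(X), 0.01))  # effective C[i] = 0.01

# With tiny weights almost every alpha reaches its small bound, so the
# down-weighted (more regularized) model keeps at least as many support vectors.
print(small.n_support_.sum(), big.n_support_.sum())
```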