Solved – Linear SVM feature weights interpretation. Binary classification, only positive feature values

feature-selection, machine-learning, regression-coefficients, scikit-learn, svm

I'm using clf = svm.SVC(kernel='linear') on a data set with only two classes $y \in \{-1, +1\}$, and the feature values of all samples are normalized between 0 and 1. We know from various studies (Guyon et al. 2003, Chang et al. 2008) that the magnitude of a feature weight in clf.coef_ is indicative of the importance of the feature for the classification.
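For example, after fitting, the features can be ranked by the absolute value of their weights (a small sketch, assuming clf has already been fitted with a linear kernel):

import numpy as np
# indices of the features, sorted by |weight|, most important first
ranking = np.argsort(np.abs(clf.coef_[0]))[::-1]
print(ranking)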

My question is: how can the sign (positive or negative) of a feature weight be interpreted when all the feature values are positive?

Given that all my feature values are normalized between 0 and 1, I think that a negative feature weight contributes to a negative classification ($y=-1$), while a positive feature weight contributes to a positive classification ($y=+1$). clf.coef_ holds the coefficients of the primal problem. Looking at the formulation of the hard-margin primal optimization problem with a linear kernel:

$$
\text{Minimize } \frac{1}{2} \mathbf{w}^T \mathbf{w} \\
\text{Subject to } y_i (\mathbf{w}^T\mathbf{x}_i - b) \geq 1, \quad i = 1, \ldots, N
$$

where $y_i \in \{-1, +1\}$ is the class of sample $i$, $\mathbf{x}_i$ is the feature representation of sample $i$, and $\mathbf{w}$ is the vector of feature weights, which I presume is what clf.coef_ returns. $b$ is just the offset of the max-margin hyperplane. From the constraint $y_i (\mathbf{w}^T\mathbf{x}_i - b) \geq 1$ we can thus see that

  • If $y_i = -1$
    • Then the constraint becomes $\mathbf{w}^T\mathbf{x}_i - b \leq -1$, i.e. the decision value has to be negative. Since all feature values in $\mathbf{x}_i$ are positive, a negative feature weight pushes the decision value downwards and thus contributes to a negative classification.
  • If $y_i = +1$
    • Then the constraint becomes $\mathbf{w}^T\mathbf{x}_i - b \geq 1$, i.e. the decision value has to be positive. Similarly, a positive feature weight then contributes to a positive classification.

A similar analysis can be made for the soft-margin case.
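For completeness, the soft-margin primal problem only relaxes the constraints with slack variables $\xi_i$ penalized by $C$, so the same sign argument applies up to the slack:

$$
\text{Minimize } \frac{1}{2} \mathbf{w}^T \mathbf{w} + C \sum_{i=1}^{N} \xi_i \\
\text{Subject to } y_i (\mathbf{w}^T\mathbf{x}_i - b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, N
$$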

Unfortunately, I haven't been able to find any academic sources for this interpretation of the sign of the feature weights. Perhaps the case of a binary classification with only positive features isn't general enough to deserve a study. My questions are thus:

  • Are my analysis and subsequent conclusion correct, namely that a negative feature weight contributes to a negative classification and a positive feature weight contributes to a positive classification?
  • Does anyone know of any academic sources that I can cite to strengthen my claim?

A comment in this post seems to support my claim, but no motivation or references are provided.


Update

I implemented some code to test my hypothesis, using 9 features such that

  • features 1, 2, 3 occur more frequently in the negative class
  • features 4, 5, 6 occur as frequently in both classes
  • features 7, 8, 9 occur more frequently in the positive class

Code:

from __future__ import division
import numpy as np
from sklearn import svm

N = 1000  # number of samples
n_features = 9
n_iter = 100

# features 1, 2, 3 occur more in the negative class
# features 4, 5, 6 occur as frequently in both classes
# features 7, 8, 9 occur more frequently in the positive class
mean_neg = [0.90, 0.80, 0.70, 0.20, 0.50, 0.70, 0.20, 0.10, 0.05]
mean_pos = [0.05, 0.10, 0.20, 0.20, 0.50, 0.70, 0.70, 0.80, 0.90]
cov = np.eye(n_features)*0.04  # std = 0.2 <=> var = std**2 = 0.04

weights_avg = np.zeros(n_features)
accuracy_avg = 0

for _ in range(n_iter):
    # generate data
    X_neg = np.random.multivariate_normal(mean_neg, cov, N//2)
    X_pos = np.random.multivariate_normal(mean_pos, cov, N//2)
    # only values between 0 and 1
    X_neg[X_neg < 0] = 0
    X_neg[X_neg > 1] = 1
    X_pos[X_pos < 0] = 0
    X_pos[X_pos > 1] = 1

    X = np.concatenate((X_neg, X_pos), axis=0)  # training data
    y = np.ones(N)  # labels
    y[:N//2] = -1  # first N/2 samples are from the negative class

    clf = svm.SVC(kernel='linear')
    clf.fit(X, y)
    # testing on training data, just to see if the SVM learns
    y_pred = clf.predict(X)

    accuracy_avg += np.sum(y_pred==y) / N
    weights_avg += clf.coef_[0]  # coef_ has shape [1, n_features] for a binary problem

accuracy_avg /= n_iter
weights_avg /= n_iter

print("Average accuracy on training data: {}".format(accuracy_avg))
print("Average linear SVM weights:")
print(weights_avg)

Running this gave the following output

Average accuracy on training data: 0.99999
Average linear SVM weights:
[-1.3517 -1.0827 -0.7347  0.0058  0.0237 -0.0354  0.7371  1.0735  1.3423]

And we see that the sign of the feature weight is indeed indicative of which class a feature contributes to, and that the magnitude also tells us how informative the feature is for the classification!
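To make this explicit, the decision value of a single sample can be decomposed into per-feature contributions $w_j x_j$ (a quick sketch, reusing clf and X from the last iteration of the loop above; note that scikit-learn stores the offset in clf.intercept_, so the $b$ in the primal formulation corresponds to -clf.intercept_):

x = X[0]  # a sample from the negative class
contributions = clf.coef_[0] * x  # w_j * x_j for each feature j
decision_value = contributions.sum() + clf.intercept_[0]
print(contributions)   # features typical of the negative class give negative terms
print(decision_value)  # should be negative for this negative-class sample;
                       # equals clf.decision_function([x])[0]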

Best Answer

My answer may be trivial - I thought the parameter sign indicated whether the point was above or below the hyperplane. Slide 4 of these lecture notes shows this (I think). Here is the relevant slide:

[Slide 4 of the cited lecture notes, illustrating the linear SVM classifier and which side of the separating hyperplane a point falls on]

Citation: Zisserman, A., "Lecture 2: The SVM classifier", C19 Machine Learning lectures Hilary 2015, Oxford University, Oxford, (accessed 15/8/2017)
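This view is easy to check in scikit-learn: the predicted class is exactly the side of the hyperplane a point falls on, i.e. the sign of the decision function (a quick check, reusing clf and X from the code in the question):

import numpy as np
side = np.sign(clf.decision_function(X))      # +1 above the hyperplane, -1 below
print(np.array_equal(side, clf.predict(X)))   # True: the sign of f(x) is the predicted class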