I'm using a Support Vector Machine from scikit-learn.
Example data:
[[1, 1],
[2, 2],
[3, 3], ...]
My code:
from sklearn import svm

clf = svm.SVC(kernel='linear')
clf.fit(x, y)
This gives me:
n = clf.coef_[0]
d = clf.intercept_[0]
which I took to be the $\vec{n}_0$ and $d$ of the Hessian normal form:
$$
\vec{x}\cdot \vec{n}_0 - d = 0
$$
But when I plot the separating hyperplane of my SVM, something is wrong: the line has the right slope but the wrong intercept.
It seems like the formula has to be:
$$
\vec{x}\cdot \vec{n}_0 + d = 0
$$
What am I missing here?
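For reference, here is a minimal, self-contained sketch of what I'm doing (the toy data and variable names are illustrative, not my real data):

```python
import numpy as np
from sklearn import svm

# Illustrative toy data: two linearly separable classes
x = np.array([[1., 1.], [2., 2.], [6., 6.], [7., 7.]])
y = np.array([-1, -1, 1, 1])

clf = svm.SVC(kernel='linear')
clf.fit(x, y)
n = clf.coef_[0]
d = clf.intercept_[0]

xx = np.linspace(0., 8., 50)
# Hessian normal form, x . n_0 - d = 0, solved for the second coordinate:
yy = (d - n[0] * xx) / n[1]
# This line has the right slope but the wrong offset;
# using (-d - n[0] * xx) / n[1], i.e. x . n + d = 0, puts it in the right place.
```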
Best Answer
I believe this is just a convention regarding what scikit-learn means by the attribute `intercept_`. The scikit-learn implementation is based on LIBSVM, and I found this handy guide on the LIBSVM webpage:
https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
In equation 1 of that guide they write down the optimization problem for SVMs, which I'll translate to the linear case since that's what you're working with; I'll also use your $\vec{n}_0, d$ notation. Supposing you have $l$ instance-label pairs $(\vec{x}_i,y_i)_{i=1}^l$, where $\vec{x}_i \in \mathbb{R}^n$ and $y_i \in \{-1,1\}$, you want to solve
$$\begin{align} \min_{\vec{n}, d, \vec{\xi}} &\frac{1}{2} \vec{n}\cdot \vec{n} + C \sum_{i=1}^l \xi_i,\\ \text{subject to } & y_i(\vec{n}\cdot \vec{x}_i + d) \geq 1-\xi_i,\\ & \xi_i \geq 0 \end{align}$$
The first constraint is the relevant one for your question. Allowing yourself slack $\xi_i$ for each example, you want the sign of $\vec{n}\cdot \vec{x}_i + d$ to agree with $y_i$: every example of class $1$ should satisfy $\vec{n}\cdot \vec{x}_i + d > 0$, and every example of class $-1$ should satisfy $\vec{n}\cdot \vec{x}_i + d < 0$.
So the decision plane is $\vec{n}\cdot \vec{x} + d = 0$, with a plus sign in front of the intercept, exactly as you guessed.
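You can check this convention directly: for a linear `SVC`, `decision_function` returns exactly $\vec{x}\cdot \vec{n} + d$ built from `coef_` and `intercept_`. A quick sketch on toy data (the data values here are illustrative):

```python
import numpy as np
from sklearn import svm

# Illustrative toy data: two linearly separable classes
x = np.array([[1., 1.], [2., 2.], [6., 6.], [7., 7.]])
y = np.array([-1, -1, 1, 1])

clf = svm.SVC(kernel='linear')
clf.fit(x, y)
n = clf.coef_[0]
d = clf.intercept_[0]

# decision_function computes x . n + d (plus sign), so the two agree:
print(np.allclose(x @ n + d, clf.decision_function(x)))  # True
# ...and its sign reproduces the class labels on this separable data:
print(np.sign(x @ n + d))
```

So if you want the Hessian normal form $\vec{x}\cdot \vec{n}_0 - d = 0$ for plotting, negate `intercept_` (and normalize `coef_` to unit length) first.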