VC Dimension of Gaussian Kernel

Tags: machine-learning, svm

The VC dimension of an SVM with the Gaussian kernel $K(x,x') = e^{-\gamma \|x-x'\|_2^2}$ is infinite, but I have never seen a proof or a detailed explanation of this fact.

How do we explicitly choose points $x_1,\cdots, x_n$ in, say, $\mathbb R^2$ that can be shattered by SVM hypotheses using the Gaussian kernel?

Intuitively, just from looking at pictures, the SVM with a Gaussian kernel lets us place any number of bounded patches on the plane. Then we can choose points sufficiently far apart (depending on the parameter $\gamma$, which also determines the size of the patches) and place a patch around precisely those points we wish to classify positively. How do I make that rigorous?

Best Answer

The decision function of an SVM is $$ f(x) = \mbox{sgn} \left ( \sum_i y_i\alpha_i K(x,x_i) +b \right ) $$

For a data point in your training set, $x_j$, this will look as follows: $$ f(x_j) = \mbox{sgn} \left ( \sum_{i,i\neq j} y_i\alpha_i K(x_j,x_i) + y_j\alpha_j K(x_j,x_j) +b \right ) $$
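For concreteness, here is a minimal sketch of this decision function in Python. The coefficients `alpha` and the bias `b` are assumed to be given, e.g. from an already-solved dual problem; they are placeholders, not something the answer specifies.

```python
import numpy as np

def rbf_kernel(x, x_prime, gamma):
    # Gaussian kernel K(x, x') = exp(-gamma * ||x - x'||_2^2)
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

def decision(x, X, y, alpha, b, gamma):
    # f(x) = sgn( sum_i y_i * alpha_i * K(x, x_i) + b )
    s = sum(y_i * a_i * rbf_kernel(x, x_i, gamma)
            for x_i, y_i, a_i in zip(X, y, alpha))
    return np.sign(s + b)
```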

Now, observe that

  • $K(x_j,x_j)=1$ for every $\gamma$, since $\|x_j-x_j\|_2=0$
  • $\lim_{\gamma \rightarrow \infty} K(x_i,x_j)=0$ for all $x_i\neq x_j$

Therefore, for sufficiently large $\gamma$, the decision for a point in the data set is determined by its own label alone, as the summation over the rest of the points becomes negligible: $f(x_j) \approx \mbox{sgn}\left( y_j\alpha_j + b \right) = y_j$ whenever $\alpha_j > |b|$. Every labeling of the training points is thus realized by some hypothesis, which is exactly what shattering requires.
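To see this in action, here is a small experiment using scikit-learn (the grid layout and the parameter values $\gamma = 10$, $C = 10^6$ are my own arbitrary choices, not from the answer): for large $\gamma$ the kernel matrix is essentially the identity, and the RBF SVM reproduces any labeling of the training points.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# 16 points on a unit-spaced grid in R^2, so ||x_i - x_j||_2^2 >= 1 for i != j
X = np.array([[i, j] for i in range(4) for j in range(4)], dtype=float)
gamma = 10.0  # exp(-10 * 1) ~ 4.5e-5, so off-diagonal kernel entries ~ 0

# Kernel matrix is essentially the identity
print(np.allclose(rbf_kernel(X, gamma=gamma), np.eye(len(X)), atol=1e-4))

rng = np.random.default_rng(0)
for _ in range(5):  # a few random labelings; each one is fit perfectly
    y = rng.choice([-1, 1], size=len(X))
    clf = SVC(kernel="rbf", gamma=gamma, C=1e6).fit(X, y)
    assert (clf.predict(X) == y).all()
```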

The layout of the data does not really matter, as long as the points are distinct; points that lie very close together simply require a larger value of $\gamma$.
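As a quick check of that last point (again a sketch using scikit-learn's `rbf_kernel`, which follows the same $e^{-\gamma \|x-x'\|_2^2}$ convention; the distances and $\gamma$ values are illustrative):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X = np.array([[0.0, 0.0], [0.01, 0.0]])  # two points only 0.01 apart
for gamma in (10.0, 1e5):
    K = rbf_kernel(X, gamma=gamma)
    print(gamma, K[0, 1])  # ~0.999 at gamma=10, ~4.5e-5 at gamma=1e5
```

At $\gamma = 10$ the two nearby points are indistinguishable to the kernel; only at much larger $\gamma$ does the off-diagonal entry vanish and the argument above apply.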
