The linear kernel is what you would expect: a linear model. The polynomial kernel is similar, but its decision boundary is a polynomial of some chosen degree
(e.g. degree 3: $y = b_1 + b_2 X + b_3 X^2 + b_4 X^3$).
The RBF (radial basis function) kernel places Gaussian ("normal") bumps around the data points and sums them, so the decision boundary can be defined as a level set of that sum, e.g. the contour where the sum exceeds a value of 0.5.
I am less certain about the sigmoid kernel, but it may be analogous to logistic regression, where a logistic function is used and the boundary is defined by where the modeled probability exceeds some threshold, typically 0.5 as in the usual case.
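For concreteness, here is a minimal sketch of these four kernels in Python. The parameter names follow scikit-learn's conventions (`gamma`, `coef0`, `degree`); the specific values are only illustrative:

```python
import numpy as np

def linear_kernel(x, y):
    return np.dot(x, y)

def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    # Polynomial of a chosen degree in the dot product.
    return (gamma * np.dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    # Gaussian bump centered on the difference between the points.
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # tanh of the (scaled, shifted) dot product.
    return np.tanh(gamma * np.dot(x, y) + coef0)

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear_kernel(x, y), polynomial_kernel(x, y),
      rbf_kernel(x, y), sigmoid_kernel(x, y))
```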
Does using a kernel function make the data linearly separable?
In some cases, but not others.

For example, the linear kernel induces a feature space that's equivalent to the original input space (up to dot-product-preserving transformations like rotation and reflection). If the data aren't linearly separable in input space, they won't be in feature space either.

Polynomial kernels with degree > 1 map the data nonlinearly into a higher-dimensional feature space. Data that aren't linearly separable in input space may be linearly separable in feature space (depending on the particular data and kernel), but may not be in other cases.

RBF kernels map the data nonlinearly into an infinite-dimensional feature space. If the kernel bandwidth is chosen small enough, the data are always linearly separable in feature space.
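As a quick empirical check of that last claim, here is a minimal sketch using scikit-learn (the dataset and the `gamma` value are arbitrary illustrative choices; note that in scikit-learn's parametrization, a small bandwidth corresponds to a *large* `gamma`):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaved half-moons: not linearly separable in input space.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# A linear kernel cannot perfectly separate the training data...
print(SVC(kernel="linear", C=1e6).fit(X, y).score(X, y))  # typically < 1.0

# ...but an RBF kernel with small bandwidth (large gamma) can.
print(SVC(kernel="rbf", gamma=100, C=1e6).fit(X, y).score(X, y))  # 1.0
```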
When linear separability is possible, why use a soft-margin SVM?
The input features may not contain enough information about class labels to perfectly predict them. In these cases, perfectly separating the training data would be overfitting, and would hurt generalization performance. Consider the following example, where points from one class are drawn from an isotropic Gaussian distribution, and points from the other are drawn from a surrounding, ring-shaped distribution. The optimal decision boundary is a circle through the low density region between these distributions. The data aren't truly separable because the distributions overlap, and points from each class end up on the wrong side of the optimal decision boundary.
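The original example's generating code isn't shown, but here is a minimal sketch of how such a dataset might be constructed (all distribution parameters below are my own assumptions, chosen so the two distributions overlap):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Inner class: isotropic Gaussian around the origin.
inner = rng.normal(scale=1.5, size=(n, 2))

# Outer class: ring-shaped distribution (radius ~ N(3, 0.8), angle uniform).
radius = rng.normal(loc=3.0, scale=0.8, size=n)
angle = rng.uniform(0, 2 * np.pi, size=n)
outer = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])

X = np.vstack([inner, outer])
y = np.array([0] * n + [1] * n)
```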
As mentioned above, an RBF kernel with small bandwidth allows linear separability of the training data in feature space. A hard-margin SVM using this kernel achieves perfect accuracy on the training set (background color indicates predicted class, point color indicates actual class):
The hard margin SVM maximizes the margin, subject to the constraint that no training point is misclassified. The RBF kernel ensures that it's possible to meet this constraint. However, the resulting decision boundary is completely overfit, and will not generalize well to future data.
Instead, we can use a soft margin SVM, which allows some margin violations and misclassifications in exchange for a bigger margin (the tradeoff is controlled by a hyperparameter). The hope is that a bigger margin will increase generalization performance. Here's the output for a soft margin SVM with the same RBF kernel:
Despite more errors on the training set, the decision boundary is closer to the true boundary, and the soft margin SVM will generalize better. Further improvements could be made by tweaking the kernel.
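The figures aren't reproduced here, but the hard/soft margin contrast can be sketched numerically. In scikit-learn, the `C` parameter controls the tradeoff mentioned above, and a very large `C` approximates a hard margin; the dataset (a noisy two-ring stand-in for the example above) and the particular values are my own illustrative choices:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Noisy concentric rings: the classes overlap, so they aren't truly separable.
X, y = make_circles(n_samples=400, noise=0.15, factor=0.3, random_state=0)

# Approximate hard margin: huge C, with a small-bandwidth (large gamma) RBF.
hard = SVC(kernel="rbf", gamma=50, C=1e9)
# Soft margin: moderate C allows some training errors for a wider margin.
soft = SVC(kernel="rbf", gamma=50, C=1.0)

for name, clf in [("hard", hard), ("soft", soft)]:
    train_acc = clf.fit(X, y).score(X, y)
    cv_acc = cross_val_score(clf, X, y, cv=5).mean()
    # Expect the hard margin to win on training accuracy but the soft
    # margin to generalize better (higher cross-validated accuracy).
    print(f"{name}: train={train_acc:.2f}, cv={cv_acc:.2f}")
```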
Best Answer
RUser4512 gave the correct answer: the RBF kernel works well in practice and is relatively easy to tune. It's the SVM equivalent of "no one's ever been fired for estimating an OLS regression": it's accepted as a reasonable default method. Clearly OLS isn't perfect in every (or even many) scenario, but it's a well-studied and widely understood method. Likewise, the RBF kernel is well-studied and widely understood, and many SVM packages include it as the default.
But the RBF kernel has a number of other properties. In these types of questions, when someone asks "why do we do things this way?", I think it's important to draw contrasts with other methods to develop context.
It is a stationary kernel, which means that it is invariant to translation. Suppose you are computing $K(x,y)$. A stationary kernel will yield the same value $K(x,y)$ for $K(x+c,y+c)$, where $c$ is a vector of the same dimension as the inputs. For the RBF, this is accomplished by working on the difference of the two vectors. By contrast, note that the linear kernel does not have the stationarity property.
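A quick numerical check of the stationarity claim (the vectors and the translation $c$ below are arbitrary):

```python
import numpy as np

def rbf(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def linear(x, y):
    return np.dot(x, y)

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
c = np.array([5.0, 5.0])  # arbitrary translation

print(np.isclose(rbf(x, y), rbf(x + c, y + c)))       # True: stationary
print(np.isclose(linear(x, y), linear(x + c, y + c)))  # False: not stationary
```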
The single-parameter version of the RBF kernel, $K(x,y)=\exp\left(-\gamma\|x-y\|^2\right)$, has the property that it is isotropic, i.e. the scaling by $\gamma$ is the same in all directions. This can be easily generalized, though, by slightly tweaking the RBF kernel to $K(x,y)=\exp\left(-(x-y)'\Gamma(x-y)\right)$, where $\Gamma$ is a p.s.d. matrix.
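A minimal sketch of that generalization (the matrix $\Gamma$ below is an arbitrary p.s.d. choice, built as $A'A$, which is always p.s.d.):

```python
import numpy as np

def anisotropic_rbf(x, y, Gamma):
    """K(x, y) = exp(-(x - y)' Gamma (x - y)) for a p.s.d. matrix Gamma."""
    d = x - y
    return np.exp(-d @ Gamma @ d)

# Gamma = gamma * I recovers the standard isotropic RBF kernel.
gamma = 0.5
Gamma_iso = gamma * np.eye(2)

# An arbitrary p.s.d. matrix scales (and rotates) directions differently.
A = np.array([[1.0, 0.3], [0.0, 0.8]])
Gamma_aniso = A.T @ A

x, y = np.array([1.0, 2.0]), np.array([0.0, 1.0])
print(anisotropic_rbf(x, y, Gamma_iso), anisotropic_rbf(x, y, Gamma_aniso))
```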
Another property of the RBF kernel is that it is infinitely smooth (infinitely differentiable). This is aesthetically pleasing, and somewhat satisfying visually, but perhaps it is not the most important property. Compare the RBF kernel to the Matern kernel and you'll see that some kernels are quite a bit more jagged!
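One way to see the contrast is to draw sample paths from Gaussian processes with each kernel; here is a sketch using scikit-learn's kernel classes (the length scales and $\nu=0.5$, the roughest Matern, are illustrative choices):

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, Matern

X = np.linspace(0, 10, 200)[:, None]
rng = np.random.default_rng(0)

for name, kernel in [("RBF", RBF(length_scale=1.0)),
                     ("Matern(nu=0.5)", Matern(length_scale=1.0, nu=0.5))]:
    K = kernel(X) + 1e-8 * np.eye(len(X))  # jitter for numerical stability
    sample = rng.multivariate_normal(np.zeros(len(X)), K)
    # RBF samples are infinitely differentiable; Matern nu=0.5 samples are
    # continuous but nowhere differentiable, hence noticeably more jagged.
    # Mean absolute increment is a crude proxy for that jaggedness.
    print(name, "mean abs increment:", np.mean(np.abs(np.diff(sample))))
```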
The moral of the story is that kernel-based methods are very rich, and with a little bit of work, it's very practical to develop a kernel suited to your particular needs. But if you use the RBF kernel as a default, you'll have a reasonable benchmark for comparison.