Solved – How does gamma in SVM RBF kernel influence the accuracy

cross-validation, kernel trick, machine learning, svm

I am working on a classification program using an SVM with an RBF kernel. To find the best parameters C and gamma, I ran a grid search and got the image below. What confuses me is that when gamma varies from 0.3 to 3, the accuracy changes very rapidly. I wonder what happens in this region.

[Image: grid-search accuracy heatmap over C and gamma]
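For reference, a minimal sketch of this kind of grid search, assuming scikit-learn's SVC and GridSearchCV (the toy data and parameter ranges here are illustrative, not my actual setup):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data standing in for the real classification task
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Log-spaced grids over C and gamma; the ranges are illustrative
param_grid = {
    "C": np.logspace(-2, 3, 6),
    "gamma": np.logspace(-2, 1, 7),  # includes the 0.3 .. 3 region in question
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))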

I think good models should lie along the diagonal: a higher C with a lower gamma, or a lower C with a higher gamma.

Could anyone help me explain the performance variation when gamma is between 0.3 and 3?

I didn't do feature scaling for the first image, so the features are unscaled. I suspect that could be the reason, because when I normalized the data the image turned out like this:

[Image: grid-search accuracy heatmap over C and gamma, with scaled features]

The yellow region between gamma = 1 and gamma = 10 is gone, and the accuracy seems to decrease slightly.
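For reference, a minimal sketch of how the scaling was applied, assuming scikit-learn's StandardScaler inside a pipeline (again with illustrative toy data):

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# StandardScaler runs inside the pipeline, so each CV fold is scaled
# using only its own training split
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma=1.0))
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("CV accuracy with scaling:", scores.mean())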

Best Answer

For simplicity, first scale your data $X$ so that $\operatorname{median}\,\|X_i - X_j\| \approx 1$: on average, half the neighbors are less than 1 away and half are more than 1 away.
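A minimal sketch of that scaling step, assuming NumPy and SciPy's pdist (the random data stands in for your $X$):

import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # stand-in for your data matrix

med = np.median(pdist(X))            # median of all pairwise Euclidean distances
X_scaled = X / med                   # now median ||X_i - X_j|| is approximately 1
print(np.median(pdist(X_scaled)))    # ~1.0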
What $e^{-\gamma\,\mathrm{dist}^2}$ does is down-weight, i.e. attenuate, more distant neighbors. By how much? Make a little table:

dist:                  [0    .5  1   2    3]
                       ---------------------
exp( - 0.3 * dist^2 ): [100  93  74  30   7] %
exp( -   1 * dist^2 ): [100  78  37   2   0] %
exp( -   3 * dist^2 ): [100  47   5   0   0] %

So $\gamma = 3$ down-weights the farther half of the points to between 5 % and 0 of full weight,
$\gamma = 1$ to between 37 % and 0,
and $\gamma = 0.3$ down-weights them even less. (The range 0.3 .. 3 is far too big.)

A simple rule of thumb: start with $\gamma = 3$ for distances scaled to median 1.

Could you try $\gamma = 2, 3, 4$ on your scaled data?
Also, plotting the sample distributions of $\mathrm{dist} = \|X_i - X_j\|$ and $e^{-\gamma\,\mathrm{dist}^2}$ might be useful.
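A minimal sketch of those plots, assuming NumPy, SciPy, and Matplotlib (the random data stands in for your scaled $X$):

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # stand-in for your data
X = X / np.median(pdist(X))          # scale so the median pairwise distance is ~1

dist = pdist(X)
gamma = 3.0

fig, axes = plt.subplots(1, 2, figsize=(9, 3))
axes[0].hist(dist, bins=50)
axes[0].set_title("dist = ||X_i - X_j||")
axes[1].hist(np.exp(-gamma * dist ** 2), bins=50)
axes[1].set_title("exp(-gamma * dist^2), gamma = 3")
plt.tight_layout()
plt.show()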