Solved – Gamma as the inverse of the data variance for the RBF kernel

machine learning, python, rbf kernel, scikit learn, svm

I would like to fix the parameter gamma by using the following heuristic and then select C using GridSearch:

take the inverse of twice the variance of the data:

gamma = 1 / (2 * sigma^2)

I wrote the following function:

import numpy as np

def get_gamma(data):
    # np.var on a 2-D array pools all entries, i.e. it returns the
    # variance of the flattened data about its grand mean.
    variance = np.var(data)
    return 1.0 / (2.0 * variance)
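Concretely, the stated plan (fix gamma with the heuristic, then grid-search C) would be something like the following sketch; the random matrix and labels are placeholders standing in for my real data:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1436))   # placeholder features
y = rng.integers(0, 2, size=200)       # placeholder binary labels

gamma = get_gamma(X)                   # fix gamma via the heuristic

# Tune only C while keeping gamma fixed.
search = GridSearchCV(SVC(kernel="rbf", gamma=gamma),
                      param_grid={"C": [0.01, 0.1, 1, 10, 100]},
                      cv=5)
search.fit(X, y)
print(search.best_params_)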

My data has shape (200, 1436): 200 examples, each with 1436 features.

Am I calculating gamma in the right way?

Correct me if I'm wrong:

For each of the 1436 dimensions, we compute the mean of that dimension, subtract it from each of the 200 data points, square the deviations, and average them (divide by 200). This gives an array of 1436 values: the variance within each dimension. We then sum these variances and divide by 1436 to get a scalar, multiply that scalar by 2, and take the inverse, hence:
$$\gamma = \frac{1}{2\sigma^2}$$
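To check these steps numerically (a sketch, with random data standing in for mine), the per-dimension computation can be spelled out. One caveat: np.var(data) pools all 200 × 1436 entries, which matches the average of the per-dimension variances exactly only when the per-dimension means are equal (e.g. after centering):

import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((200, 1436))  # stand-in for the real data

per_dim_var = np.var(data, axis=0)   # 1436 within-dimension variances
sigma_sq = per_dim_var.mean()        # average them into one scalar
gamma = 1.0 / (2.0 * sigma_sq)

print(gamma, get_gamma(data))        # nearly identical here (means ~ 0)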
Am I correct?

Best Answer

Setting $\gamma = \frac{1}{2 \sigma^2}$ merely rewrites the RBF kernel $k(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^2}{2\sigma^2}\right)$ as $\exp\left(-\gamma \lVert x - x' \rVert^2\right)$, getting rid of the fraction in the expression.

Generally, $\sigma$ is a free parameter of the kernel function. In some cases your approach may yield a reasonable choice for gamma, but that is not always the case.

Consider the following scenario:

Here we have a bivariate data set with the features $A$ and $B$.

With your computation of $\gamma$ as

$$\gamma = \frac{1}{2\cdot \frac{\sum_{i=1}^{D} Var(X_i)}{D}}$$ and the feature variances $$Var(A) = 1.62, \qquad Var(B) = 1.67,$$ we get $$\gamma = \frac{1}{2\cdot \frac{1.62+1.67}{2}} = \frac{1}{3.29} \approx 0.304$$

[Figure: scatter plot of the two-class data set in the $A$–$B$ feature plane]

Using the default C value (which is 1 in MATLAB), we end up with a decent accuracy score. The x-markers indicate wrongly classified instances.

[Figure: SVM classification of the $A$–$B$ data with $\gamma = 0.304$; x-markers show the misclassified instances]
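The plots were apparently produced in MATLAB; a rough scikit-learn equivalent of this experiment (a sketch, with make_blobs generating a synthetic stand-in for the pictured $A$/$B$ data) could look like:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping classes with feature variances of order 1
# (an assumption standing in for the data in the figures).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.3,
                  random_state=0)

gamma = 1.0 / (2.0 * np.var(X, axis=0).mean())
clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)
print(f"gamma = {gamma:.3f}, accuracy = {clf.score(X, y):.2f}")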

However, here the variances of the features were somewhat similar. If there are features with significantly higher or lower variance, this might become problematic.


Extending this example...

Let us add two additional features, $C$ and $D$, and again compute $\gamma$:

$$Var(C) = 50.05, \qquad Var(D) = 4.14$$ $$\gamma = \frac{1}{2\cdot \frac{1.62+1.67+50.05+4.14}{4}} = \frac{1}{28.74} \approx 0.034$$
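The same arithmetic in a couple of lines makes the problem explicit: the single high-variance feature $C$ dominates the average and drags $\gamma$ down by an order of magnitude.

import numpy as np

variances = np.array([1.62, 1.67, 50.05, 4.14])  # Var(A) .. Var(D)
gamma = 1.0 / (2.0 * variances.mean())
print(gamma)  # ~0.034, versus ~0.304 with only A and B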

[Figure: the extended data set including the high-variance features $C$ and $D$]

The result is pretty terrible accuracy, as expected (~50%).

[Figure: classification result with $\gamma = 0.034$; roughly half of the instances are misclassified]

Standardization is required for this to work

Since the features in our data can vary in their scale and variance, your described approach can leave us with bad results. To account for this variation, we can standardize our features to have $\mu=0$ and $\sigma=1$ by computing

$$X_{new} = \frac{X - \mu}{\sigma}$$

This way we end up with $\gamma=0.5$, since all features have standard deviation 1. This results in a very good accuracy on our multivariate example.
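In scikit-learn terms, StandardScaler performs exactly this transformation; a short sketch (with synthetic data whose feature variances mimic those above, purely for illustration):

import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
stds = np.sqrt([1.62, 1.67, 50.05, 4.14])    # sqrt of Var(A) .. Var(D)
X = rng.standard_normal((200, 4)) * stds     # unequal feature scales

X_std = StandardScaler().fit_transform(X)    # mu = 0, sigma = 1 per feature
gamma = 1.0 / (2.0 * np.var(X_std, axis=0).mean())
print(gamma)  # 0.5, since every standardized feature has variance 1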

[Figure: classification result on the standardized features with $\gamma = 0.5$]


Summary

Yes, it is possible to fix $\gamma$ as you described, but this can cause complications if the scales and variances of the features vary strongly. Standardizing the features mitigates this and makes the described approach a valid way to select $\gamma$.
