Solved – Gamma as the inverse of the data variance for the RBF kernel

machine learning, python, rbf kernel, scikit learn, svm

I would like to fix the parameter gamma by using the following heuristic and then select C using GridSearch:

take the inverse of twice the variance of the data:

gamma = 1 / (2 * sigma^2)

I wrote the following function:

import numpy as np

def get_gamma(data):
    # np.var on a 2-D array pools all entries, i.e. it returns the
    # variance of the flattened data about its grand mean.
    variance = np.var(data)
    return 1.0 / (2.0 * variance)
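Concretely, the stated plan (fix gamma with the heuristic, then grid-search C) would be something like the following sketch; the random matrix and labels are placeholders standing in for my real data:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1436))   # placeholder features
y = rng.integers(0, 2, size=200)       # placeholder binary labels

gamma = get_gamma(X)                   # fix gamma via the heuristic

# Tune only C while keeping gamma fixed.
search = GridSearchCV(SVC(kernel="rbf", gamma=gamma),
                      param_grid={"C": [0.01, 0.1, 1, 10, 100]},
                      cv=5)
search.fit(X, y)
print(search.best_params_)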

My data has shape (200, 1436): 200 examples, each with 1436 features.

Am I calculating gamma in the right way?

Correct me if I'm wrong:

For each of the 1436 dimensions, we compute the mean of that dimension, subtract it from each of the 200 data points, square the deviations, and average them (divide by 200). This gives an array of 1436 values: the variance within each dimension. We then sum these variances and divide by 1436 to get a scalar, multiply that scalar by 2, and take the inverse, hence:
$$\gamma = \frac{1}{2\sigma^2}$$
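To check these steps numerically (a sketch, with random data standing in for mine), the per-dimension computation can be spelled out. One caveat: np.var(data) pools all 200 × 1436 entries, which matches the average of the per-dimension variances exactly only when the per-dimension means are equal (e.g. after centering):

import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((200, 1436))  # stand-in for the real data

per_dim_var = np.var(data, axis=0)   # 1436 within-dimension variances
sigma_sq = per_dim_var.mean()        # average them into one scalar
gamma = 1.0 / (2.0 * sigma_sq)

print(gamma, get_gamma(data))        # nearly identical here (means ~ 0)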
Am I correct?

Best Answer

Setting $\gamma = \frac{1}{2 \sigma^2}$ merely rewrites the RBF kernel $k(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^2}{2\sigma^2}\right)$ as $\exp\left(-\gamma \lVert x - x' \rVert^2\right)$, getting rid of the fraction in the expression.

Generally, $\sigma$ is a free parameter of the kernel function. In some cases your approach may yield a reasonable choice for gamma, but that is not always the case.

Consider the following scenario:

Here we have a bivariate data set with the features $A$ and $B$.

With your computation of $\gamma$ as

$$\gamma = \frac{1}{2\cdot \frac{\sum_{i=1}^{D} Var(X_i)}{D}}$$ and the feature variances $$Var(A) = 1.62, \qquad Var(B) = 1.67,$$ we get $$\gamma = \frac{1}{2\cdot \frac{1.62+1.67}{2}} = \frac{1}{3.29} \approx 0.304$$

[Figure: scatter plot of the two-class data set in the $A$–$B$ feature plane]

Using the default C value (which is 1 in MATLAB), we end up with a decent accuracy score. The x-markers indicate wrongly classified instances.

[Figure: SVM classification of the $A$–$B$ data with $\gamma = 0.304$; x-markers show the misclassified instances]
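The plots were apparently produced in MATLAB; a rough scikit-learn equivalent of this experiment (a sketch, with make_blobs generating a synthetic stand-in for the pictured $A$/$B$ data) could look like:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping classes with feature variances of order 1
# (an assumption standing in for the data in the figures).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.3,
                  random_state=0)

gamma = 1.0 / (2.0 * np.var(X, axis=0).mean())
clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)
print(f"gamma = {gamma:.3f}, accuracy = {clf.score(X, y):.2f}")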

However, here the variances of the features were somewhat similar. If there are features with significantly higher or lower variance, this might become problematic.


Extending this example...

Let us add two additional features, $C$ and $D$, and again compute $\gamma$:

$$Var(C) = 50.05, \qquad Var(D) = 4.14$$ $$\gamma = \frac{1}{2\cdot \frac{1.62+1.67+50.05+4.14}{4}} = \frac{1}{28.74} \approx 0.034$$
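The same arithmetic in a couple of lines makes the problem explicit: the single high-variance feature $C$ dominates the average and drags $\gamma$ down by an order of magnitude.

import numpy as np

variances = np.array([1.62, 1.67, 50.05, 4.14])  # Var(A) .. Var(D)
gamma = 1.0 / (2.0 * variances.mean())
print(gamma)  # ~0.034, versus ~0.304 with only A and B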

[Figure: the extended data set including the high-variance features $C$ and $D$]

The result is pretty terrible accuracy, as expected (~50%).

[Figure: classification result with $\gamma = 0.034$; roughly half of the instances are misclassified]

Standardization is required for this to work

Since the features in our data can vary in their scale and variance, your described approach can leave us with bad results. To account for this variation, we can standardize our features to have $\mu=0$ and $\sigma=1$ by computing

$$X_{new} = \frac{X - \mu}{\sigma}$$

This way we end up with $\gamma=0.5$, since all features have standard deviation 1. This results in a very good accuracy on our multivariate example.
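In scikit-learn terms, StandardScaler performs exactly this transformation; a short sketch (with synthetic data whose feature variances mimic those above, purely for illustration):

import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
stds = np.sqrt([1.62, 1.67, 50.05, 4.14])    # sqrt of Var(A) .. Var(D)
X = rng.standard_normal((200, 4)) * stds     # unequal feature scales

X_std = StandardScaler().fit_transform(X)    # mu = 0, sigma = 1 per feature
gamma = 1.0 / (2.0 * np.var(X_std, axis=0).mean())
print(gamma)  # 0.5, since every standardized feature has variance 1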

[Figure: classification result on the standardized features with $\gamma = 0.5$]


Summary

Yes, it is possible to fix $\gamma$ as you described, but this can cause complications if the scales and variances of the features vary strongly. Standardizing the features mitigates this and makes the described approach a valid way to select $\gamma$.
