Solved – Strange predictions from kernlab's relevance vector machine

cross-validation, machine learning, r, regression, svm

I am using a relevance vector machine as implemented in the kernlab package in R, trained on a dataset with 360 continuous variables (features) and 60 examples. The outcome is also continuous, so it's a relevance vector regression.
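For reference, a minimal sketch of the kind of call I'm making (X and y are placeholders for my 60 x 360 feature matrix and the continuous outcome):

library(kernlab)

# X: 60 x 360 numeric matrix of features; y: numeric outcome of length 60
fit <- rvm(X, y, type = "regression", kernel = "vanilladot")
fit   # prints the number of relevance vectors, variance, and training error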

I have several datasets with equivalent dimensions from different subjects. The model works fine for most of the subjects, but with one particular dataset I get these strange results:

When using leave-one-out cross-validation (I train the RVM and then predict the one observation that was left out of training), most of the predicted values lie close to the mean of the outcome values.
So I don't get good predictions, just values that differ slightly from the mean.

It seems like the RVM is not working at all:
When I plot the fitted values against the actual values, I see the same pattern, predictions clustered around the mean. So the RVM cannot even reproduce the values it was trained on (for the other datasets I get correlations of around .9 between fitted and actual values).
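For concreteness, my leave-one-out loop and the fitted-vs-actual check look roughly like this (X and y are again placeholders):

library(kernlab)

n <- nrow(X)
preds <- numeric(n)
for (i in 1:n) {
  # refit on all observations except i, then predict the held-out row
  fit <- rvm(X[-i, , drop = FALSE], y[-i],
             type = "regression", kernel = "vanilladot")
  preds[i] <- predict(fit, X[i, , drop = FALSE])
}
cor(preds, y)   # near zero for the problematic subject

fit.all <- rvm(X, y, type = "regression", kernel = "vanilladot")
plot(y, predict(fit.all, X))   # fitted values cluster around mean(y)
abline(h = mean(y), lty = 2)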

It seems that I can at least improve the fit (so that the RVM is able to predict the values it was trained on) by transforming the dependent variable (the outcome values), for example by taking its square root.

This is the output for the untransformed dependent variable:

Relevance Vector Machine object of class "rvm"
Problem type: regression

Linear (vanilla) kernel function. 

Number of Relevance Vectors : 5 
Variance :  1407.006
Training error : 1383.534902093 

And this is the output if I first transform the dependent variable by taking the square root:

Relevance Vector Machine object of class "rvm"
Problem type: regression

Linear (vanilla) kernel function. 

Number of Relevance Vectors : 55 
Variance :  1.711355
Training error : 0.89601609 
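For reference, the transformed fit above was produced roughly like this (placeholder names; this assumes y is non-negative, and predictions are squared to return to the original scale):

library(kernlab)

fit.sqrt <- rvm(X, sqrt(y), type = "regression", kernel = "vanilladot")
pred.orig <- predict(fit.sqrt, X)^2   # back-transform to the original units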

How is it that the RVM results change so dramatically just by transforming the dependent variable? And what is going wrong when an RVM predicts only values around the mean of the dependent variable, even for the observations it was trained on?

Best Answer

First, there are RVM models for classification (see section 3 of Tipping, M. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1, 211–244). There are not as many implementations of it as of the regression model (in R, at least).

So, are you disturbed by the error rates, the variance, or the number of relevance vectors? This seems pretty straightforward to me, since the outcomes are on different scales. I'll assume that it is the error estimate that concerns you.

Suppose your outcome ranges from 1 to 144. The RMSE for such a model is likely to be very different from that of a model whose outcome ranges from 1 to 12 (its square root). Unitless metrics, such as R^2, are comparable across scales, but the error rate is not.
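A quick numerical illustration with made-up data: the same hypothetical predictions give a much smaller RMSE after a square-root transform, while the squared correlation barely moves.

set.seed(1)
y  <- runif(60, 1, 144)               # outcome on the original scale
fx <- pmax(y + rnorm(60, sd = 10), 0) # hypothetical predictions

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
rmse(y, fx)               # around 10 on the original scale
rmse(sqrt(y), sqrt(fx))   # much smaller number, same underlying accuracy
cor(y, fx)^2              # R^2-type measure...
cor(sqrt(y), sqrt(fx))^2  # ...roughly the same on either scale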

Plus, don't use LOO. The bootstrap or repeated k-fold CV will do a better job of estimating the error rate: their variance properties are much better, and the bias in the bootstrap can be corrected.
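A sketch of repeated k-fold CV under the same placeholder names as above (5 repeats of 10-fold, pooled RMSE):

library(kernlab)

set.seed(42)
n <- nrow(X); k <- 10; reps <- 5
errs <- numeric(0)
for (r in 1:reps) {
  folds <- sample(rep(1:k, length.out = n))   # random fold assignment
  for (j in 1:k) {
    train <- folds != j
    fit <- rvm(X[train, , drop = FALSE], y[train],
               type = "regression", kernel = "vanilladot")
    pred <- predict(fit, X[!train, , drop = FALSE])
    errs <- c(errs, sqrt(mean((y[!train] - pred)^2)))
  }
}
mean(errs); sd(errs)   # resampled estimate of the error and its spread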
