Solved – SVM parameter selection and cross validation

cross-validation, machine-learning, predictive-models, svm

I have a quick question about parameter selection for an SVM. I'm using an RBF kernel, so I'm trying to optimize C and gamma. I have a set of around 4,500 examples with about 700 features each, and I'm holding out 700 examples from the set for testing. My data consists of time series. I've been using 5-fold cross-validation with a grid search to find the optimal parameters for the test set, and I keep noticing fairly large differences between the accuracy on my training set and the accuracy on my test set. Note that when I say accuracy, I have imposed a cost matrix when evaluating the fit of the model, so that certain classes incur much higher costs when misclassified (I also ran the SVM with unequal class weights).

Because my data is a time series, I'm wondering if I should use a different approach from standard cross-validation, e.g. a moving-window evaluation or something similar. Is cross-validation the best approach here? Are there other ways to search for the optimal parameters? And are there ways to speed up the parameter search? (I've heard of using minimum-finding algorithms as an alternative to a grid search, which I'm considering implementing.)
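For concreteness, the kind of moving-window evaluation I'm considering might look roughly like the sketch below, using scikit-learn's TimeSeriesSplit so each fold trains only on the past and validates on the following block. The data here is a random placeholder, the grid values are just examples, and class_weight="balanced" stands in for my actual unequal class weights:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVC

# Placeholder time-ordered data just so the sketch runs end to end;
# in practice X and y would be my real feature matrix and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = rng.integers(0, 2, size=500)

# Forward-chaining splits: each fold trains on earlier samples and
# validates on the next block, so no future information leaks in.
tscv = TimeSeriesSplit(n_splits=5)

param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1e-3, 1e-2, 1e-1, 1],
}

search = GridSearchCV(
    SVC(kernel="rbf", class_weight="balanced"),
    param_grid,
    cv=tscv,
    scoring="balanced_accuracy",  # a custom cost-sensitive scorer could go here
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```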

Any thoughts would be most welcome. Thanks.

Best Answer

There is a recently proposed method to speed up grid search: "Fast Cross-Validation via Sequential Analysis"

http://www.scribd.com/doc/76134034/Fast-Cross-Validation-Via-Sequential-Analysis-Talk

Basically, they run a normal grid search but try to eliminate bad parameter configurations early in the process, so that not too much computation is wasted on them. It's fairly new and I don't know of any independent evaluations of the method, but I'm currently implementing it and want to give it a try.
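To illustrate the general idea (not the paper's actual procedure, which uses a sequential statistical test as its stopping rule), here is a toy sketch: evaluate every configuration on a small prefix of the data, discard the worse-scoring half, double the subset size, and repeat until one configuration survives. All names and values here are my own for illustration:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def fast_grid_search(X, y, configs, start_size=200, cv=3):
    """Race the configs: score all of them on a small data prefix,
    keep the better half, then give the survivors more data."""
    n = start_size
    while len(configs) > 1 and n < len(X):
        scores = [
            cross_val_score(SVC(kernel="rbf", **params), X[:n], y[:n], cv=cv).mean()
            for params in configs
        ]
        order = np.argsort(scores)[::-1]  # best first
        configs = [configs[i] for i in order[: max(1, len(configs) // 2)]]
        n *= 2  # survivors are re-evaluated on twice as much data
    return configs[0]

configs = [{"C": C, "gamma": g}
           for C in (0.1, 1, 10, 100)
           for g in (1e-3, 1e-2, 1e-1)]

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = rng.integers(0, 2, size=2000)
print(fast_grid_search(X, y, configs))
```

The savings come from the fact that most of the grid only ever sees the cheap, small-subset evaluations; only the promising configurations are fitted on the full dataset.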