Solved – SVM regression with LIBSVM and python – Expected runtime

libsvm, regression, svm

I'm new to regression, but quite experienced in classification and machine learning. In classification, SVM is the state-of-the-art technique, so an SVM is the first solution I would like to try for my regression problem.

The regression problem I'm trying to solve can be described as follows:
I have one dependent variable that I would like to estimate from 100, 1,000, and 4,000 independent variables respectively, measuring the accuracy at each size. I have around 100,000 observations, which I'm planning to use in a 90%-10% cross-validation scheme.
The generating process is probably not linear, which is why I want to use SVM regression (I will test a linear kernel, though).

The question is: before I even attempt this, what is the expected run time for one regression iteration with a linear or RBF kernel? Can anybody give a rough estimate? Is this problem feasible in the regression world?

Best Answer

The complexity of SVM regression is similar to the complexity of SVM classification. If problems of that size are feasible for you in a classification context, they are also feasible in regression.

When using a nonlinear kernel, training complexity is at least quadratic in the number of training instances. 100k training instances is quite a lot, so I definitely recommend trying a linear kernel first. For the linear kernel, you should consider using LIBLINEAR instead of LIBSVM (same authors; LIBLINEAR is designed specifically for large-scale linear problems).
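
To make the quadratic scaling concrete, here is a minimal sketch that extrapolates from a small pilot run to the full training set. The pilot timing (2.0 s on 5,000 instances) is a made-up number for illustration, and the exponent of 2 is the rule of thumb from above, not a guarantee, so treat the result as a rough lower bound.

```python
def extrapolate_training_time(measured_seconds, measured_n, target_n, exponent=2.0):
    """Scale a measured training time to a larger training set.

    Assumes time grows as n**exponent. exponent=2.0 is the quadratic
    rule of thumb for nonlinear kernels; it is an assumption.
    """
    if measured_n <= 0 or target_n <= 0:
        raise ValueError("sample counts must be positive")
    return measured_seconds * (target_n / measured_n) ** exponent


# Hypothetical pilot run: 2.0 s to train on 5,000 instances.
pilot_seconds = 2.0
pilot_n = 5_000
# A 90%-10% split of 100,000 observations trains on 90,000 per fold.
train_n = 90_000

estimate = extrapolate_training_time(pilot_seconds, pilot_n, train_n)
print(f"estimated time per model: {estimate:.0f} s (~{estimate / 60:.1f} min)")
```

Going from 5,000 to 90,000 instances is an 18x increase in data but a 324x increase in estimated time, which is why a pilot run on a subsample is worth doing first.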

The number of dimensions has only a modest impact on training time, which is one of the advantages of kernel methods. Knowing that, you may as well go for 4,000 dimensions straight away. With 4,000 dimensions, linear models are likely to perform quite well.
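
One way to see why the number of dimensions matters less than the number of instances: a kernel solver works with pairwise kernel values, and there is one such value per pair of instances regardless of how many features each instance has. The sketch below is plain Python for illustration only (it is not how LIBSVM computes or caches kernels); the sample count and the choice `gamma = 1/d` are arbitrary assumptions.

```python
import math
import random


def rbf_kernel(x, z, gamma):
    """k(x, z) = exp(-gamma * ||x - z||^2). Cost is linear in len(x)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def gram_matrix(rows, gamma):
    """Kernel value for every pair of rows: always len(rows) x len(rows)."""
    return [[rbf_kernel(x, z, gamma) for z in rows] for x in rows]


random.seed(0)
n = 5  # tiny hypothetical sample count, for illustration only
for d in (100, 1000, 4000):
    rows = [[random.random() for _ in range(d)] for _ in range(n)]
    K = gram_matrix(rows, gamma=1.0 / d)  # gamma = 1/d is an arbitrary choice
    # The matrix stays n x n no matter how large d is.
    print(d, len(K), len(K[0]))
```

The dimension only enters through the cost of each individual kernel evaluation, which grows linearly with `d`, while the number of pairs grows quadratically with the number of instances.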

It is very hard to give a good estimate of the actual run time, as it depends on many things related to the data and your hardware. That said, you can expect training time to be on the order of tens of minutes per model for LIBSVM. If you use LIBLINEAR, it will be a couple of seconds.
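
Since the real figure depends on your data and hardware, a practical approach is to time training on a few increasing subsample sizes and fit a power law to extrapolate. The following is a rough sketch using only the standard library; the timings in the example are invented, and `train_on_subsample` in the comment is a hypothetical function you would write around whichever trainer you use.

```python
import math
import time


def time_call(fit, n):
    """Return wall-clock seconds taken by fit(n)."""
    start = time.perf_counter()
    fit(n)
    return time.perf_counter() - start


def fit_power_law(sizes, seconds):
    """Least-squares fit of seconds ~ c * n**p in log-log space; returns (c, p)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx
    c = math.exp(mean_y - p * mean_x)
    return c, p


# In practice you would measure these, for example:
#   seconds = [time_call(train_on_subsample, n) for n in sizes]
# where train_on_subsample is a hypothetical wrapper around your trainer.
sizes = [1000, 2000, 4000, 8000]
seconds = [0.1, 0.4, 1.6, 6.4]  # invented timings, for illustration only

c, p = fit_power_law(sizes, seconds)
predicted = c * 90_000 ** p  # 90,000 training instances per fold
print(f"fitted exponent: {p:.2f}, predicted time: {predicted / 60:.1f} min")
```

If the fitted exponent comes out well above 2 on your own data, the extrapolated time for the full set grows accordingly, so check it before committing to a full cross-validation run.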