Solved – Support vector regression for multivariate time series prediction

machine learning · svm · time series

Has anyone attempted time series prediction using support vector regression?

I understand support vector machines and partially understand support vector regression, but I don't understand how they can be used to model time series, especially multivariate time series.

I've tried to read a few papers, but they are too high level. Can anyone explain in lay terms how they would work, especially in relation to multivariate time series?

EDIT: To elaborate a bit, let me try to explain with a stock price example.

Say we have stock prices for N days. Then, for each day we could construct a feature vector, which, in a simple case, could be the previous day's price and the current day's price. The response for each feature vector would be the next day's price. Thus, given yesterday's price and today's price, the objective would be to predict the next day's price. What I don't understand is: say we have six months of training data, how would you give greater emphasis to the more recent feature vectors?
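One common answer to the recency question is per-sample weighting: most SVR implementations accept a weight for each training example, so you can decay the weight exponentially with age. A minimal sketch, assuming scikit-learn's `SVR` and a synthetic price series (the half-life of 20 days is an illustrative choice, not a recommendation):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 130)) + 100  # synthetic price series

# Feature vector: (yesterday's price, today's price); response: tomorrow's price.
X = np.column_stack([prices[:-2], prices[1:-1]])
y = prices[2:]

# Exponentially decaying sample weights: the most recent sample gets weight 1.
half_life = 20  # decay half-life in trading days (illustrative assumption)
ages = np.arange(len(y))[::-1]  # age of each sample, 0 = most recent
weights = 0.5 ** (ages / half_life)

model = SVR(kernel="rbf", C=10.0, gamma=0.1)
model.fit(X, y, sample_weight=weights)  # older samples count for less
next_price = model.predict(X[-1:])
```

The same idea works with any learner that exposes sample weights; the decay schedule itself is a modeling choice you would tune like any other hyperparameter.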

Best Answer

In the context of support vector regression, the fact that your data is a time series is mainly relevant from a methodological standpoint -- for example, you can't do a standard shuffled k-fold cross-validation (it would leak future information into the training folds), and you need to take precautions when running backtests/simulations.
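To make the cross-validation point concrete, here is a sketch of a walk-forward split, assuming scikit-learn (the answer doesn't name a library; `TimeSeriesSplit` is one standard way to do this). Each fold trains only on data that precedes its test window:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))  # placeholder features, ordered in time
y = rng.normal(size=100)       # placeholder responses

# Walk-forward splits: each fold trains on the past, tests on the future.
tscv = TimeSeriesSplit(n_splits=5)
scores = []
for train_idx, test_idx in tscv.split(X):
    model = SVR(kernel="rbf").fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
```

Unlike shuffled k-fold, every training index here is strictly earlier than every test index, which is what a backtest requires.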

Basically, support vector regression is a discriminative regression technique, much like any other: you give it a set of input vectors and associated responses, and it fits a model to try to predict the response for a new input vector. Kernel SVR additionally applies one of many possible transformations to your data set prior to the learning step, which allows it to pick up nonlinear trends in the data, unlike e.g. linear regression. A good kernel to start with is probably the Gaussian RBF; it has a hyperparameter (the bandwidth) you can tune, so try out a couple of values. Once you get a feeling for what's going on, you can try out other kernels.
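Trying out a couple of bandwidth values can be automated as a small grid search. A sketch, again assuming scikit-learn (the specific `gamma` and `C` grids below are arbitrary starting points, and scaling the features first matters a lot for the RBF kernel):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)  # a nonlinear target

# Scale features, then fit RBF-kernel SVR; search over bandwidth and C.
pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
grid = GridSearchCV(
    pipe,
    {"svr__gamma": [0.01, 0.1, 1.0, 10.0], "svr__C": [0.1, 1.0, 10.0]},
    cv=TimeSeriesSplit(n_splits=4),  # time-ordered folds, as discussed above
)
grid.fit(X, y)
best = grid.best_params_
```

Swapping `kernel="rbf"` for `"poly"` or `"sigmoid"` (with their own hyperparameters added to the grid) is how you would later "try out other kernels".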

With a time series, an important step is determining what your "feature vector" ${\bf x}$ will be. Each $x_i$ is called a "feature" and can be calculated from present or past data, and each $y_i$, the response, will be the future change over some time period of whatever you're trying to predict. Take a stock, for example. You have prices over time. Maybe your features are (a) the 200MA − 30MA spread and (b) 20-day volatility, so you calculate each ${\bf x_t}$ at each point in time, along with each $y_t$, the (say) following week's return on that stock. Thus, your SVR learns to predict the following week's return based on the present MA spread and 20-day vol. (This strategy won't work, so don't get too excited ;)).
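The feature construction described above can be sketched as follows, assuming pandas and a synthetic price series (the window lengths and the 5-trading-day "week" are taken from the example, not a recommendation):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
prices = pd.Series(np.cumsum(rng.normal(0, 1, 500)) + 100)
returns = prices.pct_change()

# Feature (a): spread between the 200-day and 30-day moving averages.
ma_spread = prices.rolling(200).mean() - prices.rolling(30).mean()
# Feature (b): 20-day volatility (std of daily returns).
vol20 = returns.rolling(20).std()
# Response: the following week's (5 trading days') return.
fwd_ret = prices.shift(-5) / prices - 1

# Align features and response, dropping rows where a window isn't full yet.
data = pd.concat(
    {"ma_spread": ma_spread, "vol20": vol20, "y": fwd_ret}, axis=1
).dropna()
X = data[["ma_spread", "vol20"]].to_numpy()
y = data["y"].to_numpy()
```

`X` and `y` can then be fed to an SVR exactly as in any other regression problem; the time-series nature is baked into how the features and response were built.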

If the papers you read were too difficult, you probably don't want to try to implement an SVM yourself, since the optimization machinery is involved. IIRC, the "kernlab" package for R has a kernel SVM implementation with a number of kernels included, so that would be a quick way to get up and running.