Multivariate Time Series Classification/Regression

Tags: classification, machine learning, neural networks, r, time series

Background: I'm predicting the direction of stock price changes (either up or down) using about 200 predictors, all of which are time series. We have about 1,500 days of training/validation data.

My question is which ML algorithm I can use for a time series classification problem, i.e., using the 200 predictors at time t to predict the direction at time t+1.

BTW, I use R only, so please do not suggest Python packages. Thanks very much!

Best Answer

My question is which ML algorithm I can use for a time series classification problem, i.e., using the 200 predictors at time t to predict the direction at time t+1.

The issue here is that the concepts of input and output features don't necessarily exist in time series analysis, at least not in the conventional sense.

For instance, suppose you have features at time t that you are using to predict the stock price at time t+1, e.g. the level of the S&P 500, earnings per share, etc.

If the model links the stock price at time t+1 to the features at time t+1, calculating the price at t+1 requires knowing the values of your explanatory variables at t+1 as well, and those are not yet observed at time t.
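A minimal base-R sketch of the alignment the question describes (predictors at t, direction at t+1), assuming simulated data with 5 random predictors standing in for the real 200, and using logistic regression (`glm` with a binomial family) purely as a simple baseline classifier. Each row contains only predictor values observed on day t, so nothing from t+1 is needed at prediction time, and the train/test split keeps time order instead of shuffling.

```r
# Hypothetical stand-in data: 1500 days, 5 predictors instead of the real 200
set.seed(1)
n_days <- 1500
n_pred <- 5
X <- matrix(rnorm(n_days * n_pred), ncol = n_pred,
            dimnames = list(NULL, paste0("x", seq_len(n_pred))))
price <- 100 + cumsum(rnorm(n_days))

# Label for row t: 1 if the price rises from day t to day t+1, else 0
direction_next <- as.integer(diff(price) > 0)  # length n_days - 1

# Row t pairs predictors observed on day t with the direction on day t+1;
# the last day is dropped because its next-day direction is unknown
df <- data.frame(X[-n_days, , drop = FALSE], y = direction_next)

# Chronological split: first 80% for training, remaining 20% for testing
n_train <- floor(0.8 * nrow(df))
train <- df[seq_len(n_train), ]
test  <- df[-seq_len(n_train), ]

# Logistic regression as a simple baseline classifier
fit  <- glm(y ~ ., data = train, family = binomial)
prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)
accuracy <- mean(pred == test$y)
print(accuracy)
```

With real data, `X` and `price` would be replaced by the actual predictors and prices; any other classifier can be fitted on the same aligned data frame.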

If you want to incorporate the features when forecasting the stock price, one potential approach is a standard OLS regression corrected for serial correlation, i.e. correlation between the residuals at times t and t+1. Correcting for serial correlation with a suitable remedy (e.g. the Cochrane-Orcutt procedure) may improve the OLS estimates of the stock price.
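A minimal sketch of the iterative Cochrane-Orcutt procedure in base R, assuming the errors follow an AR(1) process. `cochrane_orcutt` is a hypothetical helper written for illustration, not a library function; it repeatedly estimates the residual autocorrelation rho, quasi-differences the data, and refits OLS. The example data are simulated with known coefficients so the estimates can be checked.

```r
# Iterative Cochrane-Orcutt estimation (hypothetical helper) for
# y = a + X b + u, with AR(1) errors u_t = rho * u_{t-1} + e_t
cochrane_orcutt <- function(y, X, tol = 1e-6, max_iter = 50) {
  X <- as.matrix(X)
  n <- length(y)
  b <- coef(lm(y ~ X))          # start from plain OLS
  rho <- 0
  for (i in seq_len(max_iter)) {
    # Residuals on the original (untransformed) scale
    e <- as.vector(y - cbind(1, X) %*% b)
    # AR(1) coefficient from regressing e_t on e_{t-1} (no intercept)
    rho_new <- sum(e[-1] * e[-n]) / sum(e[-n]^2)
    # Quasi-difference both sides to remove the AR(1) error structure
    y_star <- y[-1] - rho_new * y[-n]
    X_star <- X[-1, , drop = FALSE] - rho_new * X[-n, , drop = FALSE]
    b_star <- coef(lm(y_star ~ X_star))
    # Transformed intercept equals a * (1 - rho); map it back
    b <- c(b_star[1] / (1 - rho_new), b_star[-1])
    converged <- abs(rho_new - rho) < tol
    rho <- rho_new
    if (converged) break
  }
  list(coef = b, rho = rho, iterations = i)
}

# Simulated example: true a = 2, b = 3, rho = 0.7
set.seed(42)
n <- 500
x <- rnorm(n)
u <- as.numeric(stats::filter(rnorm(n), filter = 0.7, method = "recursive"))
y <- 2 + 3 * x + u
res <- cochrane_orcutt(y, x)
print(res$rho)
print(res$coef)
```

The same function accepts a matrix of several predictors for `X`; with stock data, `y` would be the price series and `X` the explanatory variables.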

However, suppose you wish to use an ML-based model. The first consideration is that the model must account for the sequential nature of time series data. For instance, a long short-term memory (LSTM) network is a specialized recurrent neural network designed for this purpose.
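To feed multivariate data into a sequence model, each training sample is usually a window of consecutive days. The sketch below, in base R, builds overlapping windows shaped [samples, timesteps, features], the layout Keras-style LSTM layers typically expect. `make_windows` is a hypothetical helper, and `y` is assumed to be a per-day target (e.g. the direction on day t), so the window covering days i to i+lookback-1 is paired with the target on the following day.

```r
# Hypothetical helper: overlapping windows of `lookback` days,
# returned as an array shaped [samples, timesteps, features]
make_windows <- function(X, y, lookback) {
  X <- as.matrix(X)
  n_samples <- nrow(X) - lookback
  out <- array(NA_real_, dim = c(n_samples, lookback, ncol(X)))
  target <- numeric(n_samples)
  for (i in seq_len(n_samples)) {
    rows <- i:(i + lookback - 1)          # days i .. i + lookback - 1
    out[i, , ] <- X[rows, , drop = FALSE]
    target[i] <- y[i + lookback]          # target on the day after the window
  }
  list(x = out, y = target)
}

# Toy example: 10 days, 2 features, 3-day lookback
X_toy <- matrix(1:20, ncol = 2)
y_toy <- 1:10
w <- make_windows(X_toy, y_toy, lookback = 3)
dim(w$x)  # 7 windows, 3 timesteps, 2 features
```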

Here is an example of predicting electricity consumption with an LSTM, in which highly volatile data was modelled with reasonably high accuracy.

[Figure: LSTM electricity consumption prediction, over 50 days]

In that specific example, the input x was the series from t-50 onwards (the previous 50 time steps) and the output y was the value at time t. So no external predictors were used; instead, the time series itself served as both input and output across different time periods.
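The input/output setup described above, with the previous 50 values predicting the current one, can be built in base R with `stats::embed()`. This sketch uses a toy series (1 to 60) so the alignment is easy to check; a real series would simply replace `series`.

```r
# Toy univariate series standing in for the real data
series <- as.numeric(1:60)
lookback <- 50

# embed() returns one row per complete window, with columns ordered
# x_t, x_{t-1}, ..., x_{t-lookback}
m <- embed(series, dimension = lookback + 1)
y <- m[, 1]        # target: value at time t
x <- m[, -1]       # inputs: the 50 previous values, most recent first
x_oldest_first <- x[, ncol(x):1]  # reversed, if a model expects time order

dim(x)  # 10 rows (windows), 50 columns (lags)
```

Note that `embed()` puts the most recent lag first; the reversed matrix is what a sequence model reading oldest-to-newest would need.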

Here is a separate example of how an LSTM can be run through TensorFlow using R.

A good idea may be to run both approaches and compare the models. You may find that predicting the time series from its own history, without external predictors, yields more accurate results.
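One way to run such a comparison is to fit both variants on the same chronological training period and score their directional hit rate on the later holdout period. In this base-R sketch the data are simulated (random returns and 3 hypothetical external predictors named `ext1` to `ext3`), and logistic regression stands in for whichever models are actually being compared, such as an LSTM.

```r
# Hypothetical data: daily returns plus 3 external predictors
set.seed(7)
n <- 1500
ret <- rnorm(n)
ext <- matrix(rnorm(n * 3), ncol = 3,
              dimnames = list(NULL, paste0("ext", 1:3)))

# Row t: features known on day t, label = direction on day t+1
d <- data.frame(up_next = as.integer(c(ret[-1], NA) > 0),
                lag0 = ret,
                lag1 = c(NA, ret[-n]),
                ext)
d <- d[complete.cases(d), ]   # drops the first and last rows

# Chronological split, no shuffling
n_train <- floor(0.8 * nrow(d))
train <- d[seq_len(n_train), ]
test  <- d[-seq_len(n_train), ]

# Directional hit rate of a logistic model on the holdout period
hit_rate <- function(formula, train, test) {
  fit  <- glm(formula, data = train, family = binomial)
  prob <- predict(fit, newdata = test, type = "response")
  mean(as.integer(prob > 0.5) == test$up_next)
}

acc_own  <- hit_rate(up_next ~ lag0 + lag1, train, test)  # own history only
acc_full <- hit_rate(up_next ~ ., train, test)            # plus external predictors
c(own_history = acc_own, with_predictors = acc_full)
```

A single holdout split gives a noisy comparison; repeating it over several consecutive test windows (rolling-origin evaluation) gives a steadier picture.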
