Time Series Analysis – Combining Linear Regression and Time Series for Enhanced Predictions

multiple regression, time series

I’m trying to figure out if I can combine linear regression and a time series model to help make predictions about the number of shots in a soccer game.

When I perform the linear regression, I have some highly significant independent variables (such as home/away and possession), and then I’m left with residuals that appear to be significantly autocorrelated (particularly when I look at the PACF).

What I can’t get my head around is how, and if, I can combine these two techniques to assist in my prediction.

Previously I was thinking I would figure out what lags/ARIMA model I should be using (it’s looking like a (2,0,0)) and then apply the AR(2) to the residuals (or even the whole of the dependent variable) to produce a new independent variable that I then use in the linear regression. But this doesn’t seem mathematically sound.

So what should I do instead? If I know, for example, that the player’s next game is at home, that his team is predicted to get 60% possession, and that the residuals from a regression (on the aforementioned significant variables) show a significant AR(2) correlation, how should I use this information to produce the best possible prediction of his shots?

Best Answer

"Previously I was thinking I would figure out what lags/ARIMA model I should be using (it’s looking like a (2,0,0)) and then apply the AR(2) to the residuals (or even the whole of the dependent variable) to produce a new independent variable that I then use in the linear regression. But this doesn’t seem mathematically sound."

Instead of doing it in two steps, you can do it simultaneously, which makes it more "mathematically sound". That is a regression with ARMA errors: the regression coefficients and the ARMA parameters of the error term are estimated jointly in one model. Here is some discussion of that and related techniques; an R implementation is also discussed.
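For the AR(2) case mentioned in the question, such a model could be written roughly as follows. The symbols $\beta_j$, $\eta_t$, $\phi_1$, $\phi_2$ and $\varepsilon_t$ are notation introduced here for illustration, with $\varepsilon_t$ assumed to be white noise:

$$y_t = \beta_0 + \beta_1 x_{1,t} + \dotsb + \beta_k x_{k,t} + \eta_t, \qquad \eta_t = \phi_1 \eta_{t-1} + \phi_2 \eta_{t-2} + \varepsilon_t.$$

Under that assumption, the one-step-ahead point forecast combines the regression part, evaluated at the known or predicted regressor values for the next game, with the AR(2) forecast of the error, built from the last two estimated errors:

$$\hat{y}_{T+1} = \hat\beta_0 + \hat\beta_1 x_{1,T+1} + \dotsb + \hat\beta_k x_{k,T+1} + \hat\phi_1 \hat\eta_{T} + \hat\phi_2 \hat\eta_{T-1}.$$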

In your case, denote the dependent variable $y$ and the independent variables $x_1, \dotsb, x_k$. After loading the "forecast" package with library(forecast), call auto.arima(y, xreg = cbind(x_1, ..., x_k)) to automatically select a sensible order for the ARMA structure in the model errors.
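As a minimal sketch of the idea, the example below fits a regression with AR(2) errors and forecasts the next game. It swaps in base R's arima() with a fixed order (2,0,0) instead of auto.arima, so that it runs without extra packages. The data are simulated, and the variable names (home, possession, shots) and all coefficient values are made up for illustration.

```r
# Simulated data stand in for real match records; every number here is invented.
set.seed(1)
n <- 200
home       <- rbinom(n, 1, 0.5)      # 1 = home game, 0 = away game
possession <- runif(n, 35, 65)       # team possession in percent

# AR(2) regression errors (stationary coefficients chosen arbitrarily)
eta   <- as.numeric(arima.sim(model = list(ar = c(0.5, 0.2)), n = n))
shots <- 2 + 1.0 * home + 0.05 * possession + eta

# Regressor matrix; column names become coefficient names in the fit
X <- cbind(home = home, possession = possession)

# Regression with AR(2) errors: regression and AR coefficients are
# estimated jointly, not in two separate steps.
fit <- arima(shots, order = c(2, 0, 0), xreg = X)
print(fit)

# One-step-ahead forecast for a home game with 60% predicted possession
new_X <- cbind(home = 1, possession = 60)
pred  <- predict(fit, n.ahead = 1, newxreg = new_X)
pred$pred   # point forecast of shots
pred$se     # standard error of the forecast
```

With the forecast package, the analogous steps would be fit <- auto.arima(shots, xreg = X) followed by forecast(fit, xreg = new_X). In either case the future values of the regressors must be supplied, so the forecast is only as good as the possession prediction fed into it.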
