Solved – Multilinear regression vs. Time Series

Tags: forecasting, predictive-models, regression, time-series

I have sales data for 3 years by week. I need to predict sales for the next year, by week.

The business requested that some categorical and numeric values (for example category, product quantity offered, and amount of weekly traffic) are included as inputs they can change for next year, so that the projection changes accordingly. This would be very easy with a multilinear regression model. (I was planning on training the model on 2 years and testing it on the 3rd.)
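
For illustration, a minimal sketch of that plain-regression plan (the file and column names `weekly_sales.csv`, `category`, `quantity_offered`, `weekly_traffic`, `sales` are placeholders for my actual data):

```python
# Minimal sketch of the plain multilinear-regression plan.
# File and column names are placeholders for the real data.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("weekly_sales.csv")                      # 3 years of weekly rows
X = pd.get_dummies(df[["category", "quantity_offered", "weekly_traffic"]],
                   columns=["category"])                   # one-hot encode the categorical column
y = df["sales"]

train, test = slice(0, 104), slice(104, 156)               # first 2 years train, 3rd year test
model = LinearRegression().fit(X[train], y[train])
print("MAE on held-out year:", mean_absolute_error(y[test], model.predict(X[test])))
```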

Since my data is definitely seasonal, I thought I need to use a time series model. (I was thinking ARIMA, but any other recommendation is welcome.) However, in the past when I used ARIMA, I just had a timestamp and predicted the value at a future timestamp, without other inputs that the user could change.

It might be my misunderstanding of time series models, but is there a model (or combination of models) that has adjustable coefficients like a multilinear regression, yet accounts for seasonality?

Or would extrapolation be the way to go?

I looked at those two questions, but they don't help me directly.

Linear regression vs Time series analysis

Time Series Forecasting vs Linear Regression Extrapolation

Best Answer

The most popular methods for time series forecasting are ARIMA and Holt-Winters. Holt-Winters is simpler and computationally cheaper, and it is specifically designed for seasonality. If you use ARIMA, you must use seasonal ARIMA. Some studies show that, on real-life data, Holt-Winters and seasonal ARIMA not only have similar accuracy but also predict similar values (http://webarchive.nationalarchives.gov.uk/20160105160709/http://www.ons.gov.uk/ons/guide-method/ukcemga/ukcemga-publications/publications/archive/from-holt-winters-to-arima-modelling--measuring-the-impact-on-forecasting-errors-for-components-of-quarterly-estimates-of-public-service-output.pdf).
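
As a rough illustration of the univariate baseline, a Holt-Winters fit in statsmodels could look like the sketch below. It assumes `sales` is a weekly pandas Series covering your 3 years of history; the additive trend/seasonality settings are just one reasonable starting point.

```python
# Rough Holt-Winters sketch; `weekly_sales.csv` / `sales` are placeholder names.
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

sales = pd.read_csv("weekly_sales.csv", index_col="week", parse_dates=True)["sales"]

# Additive trend and additive yearly seasonality (52 weeks) as a starting point;
# switch to multiplicative if the seasonal swings grow with the level of sales.
hw = ExponentialSmoothing(sales, trend="add", seasonal="add",
                          seasonal_periods=52).fit()
forecast = hw.forecast(52)   # next year's 52 weekly values
```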

Your problem is a multivariate time series: not just $y(t)$ but $y(X,t)$.

One idea is to try something like a linear regression with time-varying coefficients. Think of a linear model without time: $y(X)=\beta X$, where $\beta$ is a vector. Then introduce time: $y(X,t)=\beta(t) X$, and view $\beta(t)$ as a vector time series.

But you cannot observe $\beta(t)$ directly.

I have not read any article addressing this problem, even though it is a problem of growing importance: studying time and other predictors together.

If at each time point you have many $(y, X)$ samples covering the possible values of $X$ well, then you could estimate $\beta(t)$ at every time and then apply a time series method to the resulting vector series. But I suspect your data is not like this.
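
A rough sketch of that idea (hypothetical names, simulated data; it assumes many samples per week, which is exactly the condition I doubt you meet):

```python
# Sketch of "estimate beta(t) per period, then forecast each coefficient".
# Assumes many (y, X) samples per week -- probably not the case here.
import numpy as np

def beta_per_period(X_by_week, y_by_week):
    """OLS fit per week; returns an array of shape (n_weeks, n_features)."""
    betas = []
    for Xw, yw in zip(X_by_week, y_by_week):
        beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
        betas.append(beta)
    return np.vstack(betas)

# Tiny simulated example: 156 weeks, 50 samples per week, 3 regressors,
# with one coefficient following a yearly (52-week) sine cycle.
rng = np.random.default_rng(0)
X_by_week = [rng.normal(size=(50, 3)) for _ in range(156)]
y_by_week = [Xw @ np.array([1.0, 0.5 * np.sin(2 * np.pi * t / 52), -0.3])
             + rng.normal(scale=0.1, size=50)
             for t, Xw in enumerate(X_by_week)]
betas = beta_per_period(X_by_week, y_by_week)   # shape (156, 3)
```

Each coefficient series `betas[:, j]` could then be forecast with any univariate method (seasonal ARIMA, Holt-Winters, ...) and the forecasts recombined as $\hat{y} = X_{\text{future}}\,\hat{\beta}(t)$.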

Added:

The ultimate solution, where you can really treat $\beta(t)$ as a hidden parameter, is the Kalman filter. If $\beta(t)$ is just a random walk, then the state of the Kalman filter is simply $\beta(t)$, and this would not be too difficult to implement, using the observations to update the filter. The next step would be to use an ARIMA or Holt-Winters model inside the filter, but that starts to be hardcore maths. I don't know of anything like this packaged as an easy-to-use tool. You can read more about it here: What are disadvantages of state-space models and Kalman Filter for time-series modelling?
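
To make the random-walk case concrete, here is a bare-bones sketch of such a filter in plain numpy; the noise variances `q` and `r` are placeholders you would normally tune or estimate, not values from your data.

```python
# Bare-bones Kalman filter treating beta(t) as a random walk.
# Observation model: y_t = x_t . beta_t + noise; state model: beta_t = beta_{t-1} + noise.
import numpy as np

def kalman_tv_regression(X, y, q=1e-4, r=1.0):
    """X: (T, k) regressors, y: (T,) responses. Returns filtered betas, shape (T, k)."""
    T, k = X.shape
    beta = np.zeros(k)            # current state estimate beta(t)
    P = np.eye(k)                 # state covariance
    Q = q * np.eye(k)             # random-walk (state) noise covariance
    betas = np.empty((T, k))
    for t in range(T):
        # Predict: a random walk leaves beta unchanged and inflates uncertainty.
        P = P + Q
        # Update with the observation y_t.
        x = X[t]
        S = x @ P @ x + r                      # innovation variance (scalar)
        K = P @ x / S                          # Kalman gain, shape (k,)
        beta = beta + K * (y[t] - x @ beta)    # corrected state
        P = P - np.outer(K, x) @ P             # corrected covariance
        betas[t] = beta
    return betas
```

Given 156 weeks of regressors `X` and sales `y`, `kalman_tv_regression(X, y)` returns one estimated coefficient vector per week, which you can then inspect or project forward.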
