Solved – Time Series Modeling with Lagged Variables

Tags: arima, machine learning, r, software, time series

I have a dataset whose columns represent lagged values of the predictors. To illustrate with a simple example, suppose we had three years of car sales data, and the only predictors available were income and population for a number of car dealers. The dataset could be represented as follows:

ID  IncLag1  PopLag1  SalesLag1  IncLag2  PopLag2  SalesLag2  IncCurrent  PopCurr  SalesCurr
a       100     1000        200      150     2000        300         500     2500        450
b        10      300         50       60      900         80          90     1000        100
...
k        30       60         10      200     2000         60          80      800         ??

My dependent variable is SalesCurr: given each dealer's history of past sales and the corresponding income and population values (which we can use as the training data), predict what the sales will be in the current year (SalesCurr).

My question is as follows: using R or GRETL, how can one build an ARIMA/time-series model on the above data to predict SalesCurr? With simple linear regression one could use a formula such as `lm(SalesCurr ~ ., data = mytable)`, but that would not be a time-series model, since it ignores the temporal dependence between the lagged observations.
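For reference, a regression-with-ARMA-errors model (ARIMA with external regressors) can be fitted in base R via `stats::arima` and its `xreg` argument. This is a minimal sketch on a simulated single series, since an ARIMAX model applies to one series at a time; the variable names are illustrative, not taken from your data:

```r
set.seed(42)
n <- 40
income <- cumsum(rnorm(n, mean = 5, sd = 1))    # simulated trending income series
pop    <- cumsum(rnorm(n, mean = 10, sd = 2))   # simulated trending population series
# sales depend on the regressors plus AR(1) noise
sales  <- 50 + 0.8 * income + 0.05 * pop + arima.sim(list(ar = 0.5), n)

# ARIMA(1,0,0) with income and population as external regressors
fit <- arima(sales, order = c(1, 0, 0), xreg = cbind(income, pop))

# one-step-ahead forecast, given hypothetical next-year regressor values
newx <- cbind(income = tail(income, 1) + 5, pop = tail(pop, 1) + 10)
pred <- predict(fit, n.ahead = 1, newxreg = newx)
```

Note that this treats each dealer as a separate series; with only three time points per dealer, as in the table above, there is too little history to estimate such a model per dealer.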

Alternatively, I am quite familiar with machine-learning models and wanted to get your thoughts on how such a dataset could be modeled using, say, randomForest, GBM, etc.
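With the lagged values already laid out as columns, an off-the-shelf learner can treat each lag as an ordinary feature. A sketch, assuming the `randomForest` package is installed and using simulated data with the column names from the table above:

```r
library(randomForest)

set.seed(7)
# simulate a toy version of the dealer table (values are illustrative)
n <- 200
d <- data.frame(
  IncLag1    = runif(n, 10, 100),  PopLag1 = runif(n, 300, 2000),
  SalesLag1  = runif(n, 10, 300),
  IncLag2    = runif(n, 10, 200),  PopLag2 = runif(n, 300, 2000),
  SalesLag2  = runif(n, 10, 300),
  IncCurrent = runif(n, 10, 500),  PopCurr = runif(n, 300, 2500)
)
d$SalesCurr <- 0.5 * d$IncCurrent + 0.1 * d$PopCurr +
  0.3 * d$SalesLag1 + rnorm(n, sd = 5)

# fit a random forest on all lagged and current predictors
fit  <- randomForest(SalesCurr ~ ., data = d, ntree = 200)
importance(fit)              # variable importance shows how much the lags matter
pred <- predict(fit, d[1, ]) # prediction for one dealer
```

The model captures temporal structure only through the lag columns themselves (it has no notion of autocorrelated errors), so the quality of the lag engineering largely determines the result.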

Thanks in advance.

Best Answer

You should investigate ARMAX / dynamic regression / transfer-function models, making sure that you deal with any outliers, level shifts, seasonal pulses, and local time trends, while testing for constancy of the parameters and constancy of the error variance. If you wish to post your data, do so and I will send you some results illustrating these ideas.
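One way to handle an outlier within a dynamic regression is to add an intervention dummy as an extra regressor and then check the residuals. A minimal base-R sketch on simulated data (the injected outlier and the Ljung-Box residual check are illustrative; this does not reproduce the full outlier-detection procedure described above):

```r
set.seed(1)
n <- 60
x <- cumsum(rnorm(n))                              # simulated external regressor
y <- ts(10 + 0.7 * x + arima.sim(list(ar = 0.4), n))
y[30] <- y[30] + 15                                # inject an additive outlier

# intervention dummy (pulse) at the outlier's position
pulse <- as.numeric(seq_len(n) == 30)

# ARMAX: AR(1) errors with x and the pulse as regressors
fit <- arima(y, order = c(1, 0, 0), xreg = cbind(x, pulse))

# Ljung-Box test for remaining autocorrelation in the residuals
Box.test(residuals(fit), lag = 10, type = "Ljung-Box")
```

The estimated coefficient on `pulse` should recover roughly the size of the injected shift, and the residual test is one simple check in the direction of the parameter- and variance-constancy diagnostics mentioned above.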
