Solved – Time series with multiple subjects and multiple variables in R

forecasting, machine learning, panel data, r, time series

I'm having trouble finding a time series technique for a data set I am working on. It contains multiple subjects and multiple variables, and probably not all of the variables are time series. It looks something like this:

Subject  Date      T1  T2  V1  V2  V3
A        1/1/2012  1   5   9   13  17
A        2/1/2012  2   6   10  14  18
...
B        1/1/2012  3   7   11  15  19
B        2/1/2012  4   8   12  16  20
...

Here T1 and T2 are likely time series, while V1, V2, and V3 are likely not. This distinction is probably unnecessary, since techniques like Box-Jenkins should detect autoregression in any variable.

Ultimately, I want to be able to forecast for other subjects, which probably were not used to build the model.

If you know of any R package(s) that can take this on, please let me know. Some example code would also be greatly appreciated. Thank you for any insight you can provide.

Edit: I am looking into dynamic linear regression using the dynlm package, but am having trouble coding it to include the dates and subjects.

Best Answer

The Arima function in the forecast package can fit a regression model to the data with an ARIMA model for the errors. The order argument gives the (p, d, q) orders of the ARIMA model, and the xreg argument supplies the observations of the predictors. E.g., if covariates is a matrix of predictors:

model = Arima(series, order = c(1,1,0), xreg = covariates)
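For a fuller picture, here is a minimal sketch of the same call on simulated data, followed by a forecast. The data, the coefficient values, and the names future_covariates and fc are made up for illustration. The order c(1,0,0) is used here only because the simulated errors are stationary AR(1); it is not a recommendation for the real data.

```r
library(forecast)

set.seed(1)
n <- 60

# Simulated predictors (stand-ins for V1 and V2 in the question's table)
covariates <- cbind(V1 = rnorm(n), V2 = rnorm(n))

# Simulated response: linear in the predictors plus AR(1) errors
errors <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
series <- ts(2 + 1.5 * covariates[, "V1"] - 0.5 * covariates[, "V2"] + errors,
             start = c(2012, 1), frequency = 12)

# Regression on the predictors with ARIMA(1,0,0) errors
model <- Arima(series, order = c(1, 0, 0), xreg = covariates)
print(coef(model))

# Forecasting requires future predictor values with the same columns;
# the number of rows sets the forecast horizon (6 months here)
future_covariates <- cbind(V1 = rnorm(6), V2 = rnorm(6))
fc <- forecast(model, xreg = future_covariates)
print(fc$mean)
```

Note that a forecast from this kind of model is only as good as the future predictor values you can supply, so predictors that are unknown ahead of time would themselves need to be forecast or assumed.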

To find the order of the ARIMA process, you can use the auto.arima function, also in the forecast package. It automatically selects the best-fitting ARIMA model for the data, where "best" is judged by one of three information criteria chosen through the ic argument: the AIC ("aic"), the AICc ("aicc"), or the BIC ("bic"). E.g.,

model = auto.arima(series, ic = "aic")
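Since the data in the question is in long format with a Subject column, one possible way to apply this is to fit a separate model per subject. The sketch below assumes that layout. The simulated panel and the helper names make_subject and fit_subject are hypothetical, and seasonal = FALSE is set only to keep the example small.

```r
library(forecast)

set.seed(42)
n <- 48  # months of data per subject

# Hypothetical long-format panel shaped like the table in the question
make_subject <- function(id) {
  v1 <- rnorm(n)
  t1 <- 10 + 2 * v1 + as.numeric(arima.sim(model = list(ar = 0.5), n = n))
  data.frame(Subject = id,
             Date = seq(as.Date("2012-01-01"), by = "month", length.out = n),
             T1 = t1,
             V1 = v1)
}
panel <- rbind(make_subject("A"), make_subject("B"))

# Fit one regression-with-ARIMA-errors model for a single subject's rows
fit_subject <- function(d) {
  d <- d[order(d$Date), ]                      # make sure rows are in time order
  y <- ts(d$T1, start = c(2012, 1), frequency = 12)
  auto.arima(y, xreg = cbind(V1 = d$V1), ic = "aicc", seasonal = FALSE)
}

# Split the panel by subject and fit each piece separately
models <- lapply(split(panel, panel$Subject), fit_subject)

# ARIMA order (p, d, q) selected for each subject
print(lapply(models, arimaorder))
```

Per-subject models like these do not by themselves forecast for a subject that was never fitted. Arima also accepts a model argument that applies a previously fitted model to a new series without re-estimating its parameters, which may be worth exploring for that goal.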

I think you may find this page really helpful, especially the section about R.