Solved – Call volume: time series regression model from 52 weeks a year and lagged predictors

arima, forecasting, r, regression, time series

I have been searching the web extensively to learn how to implement a dynamic regression time series model with the forecast package for R. The data are weekly (frequency=52) incoming call volumes, and the predictor variables are mailers that are sent out every now and then. The mailers are a significant predictor of call volume for the week they hit, the following week, and the week after that. I have created lagged variables and use these three as the predictors.

My main concern is that the ARIMA model is not taking the time series frequency into account. When I tell it to recognize the ts with a frequency of 52, it throws an error.
I have looked at the fortrain function but do not understand it. I have also looked at the suggested tbats function, but found that it will not work with predictor variables.

The zoo package recognizes a frequency of 52, but it is not advised for use with the forecast package.

Here is the basic code. The problem is that the time series calwater[,5] is not recognized as such; it is read in as a plain integer vector…

#this works, but without taking the ts structure into account
fit2 <- auto.arima(calwater[6:96,5], xreg=calwater[6:96,6:8], d=0)
fccal <- forecast(fit2, xreg=calwater[97:106,6:8], h=10)
fccal
plot(fccal, main="Forecast Cal Water", ylab="Calls")

#to form a ts object
calincall<-ts(calwater[1:106,5],start=c(2011,23),frequency=52)

#once the ts is added to the model, this is displayed
#Error in `[.default`(calincall, 2:100, 1) : incorrect number of dimensions

Maybe the error is because there is just a little over two years of data.

#Time Series: Start = c(2011, 23), End = c(2013, 24),Frequency = 52 

I would be very grateful for any guidance on this particular issue. I am using the forecast package and would prefer to stay within it, but I am open to suggestions.

Best Answer

With only two years of data and a frequency of 52, it is difficult to estimate a seasonal ARIMA model. For example, if the model involves seasonal differencing, you lose the first 52 observations, which is half your data. It will probably be better to handle the seasonality with Fourier terms (see http://robjhyndman.com/hyndsight/longseasonality/), added as extra regressors alongside the mailer variables. Then specify seasonal=FALSE in the call to auto.arima. You can choose the number of Fourier terms via the AIC: just increase the order until the AIC no longer decreases.
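
A minimal sketch of this approach is below. It assumes the forecast package is installed. Because the calwater data are not shown in the question, the series is simulated: the names mail, lag1, lag2 and calls, the effect sizes, and the search range of 1 to 5 Fourier pairs are all illustrative assumptions. The fit fixes d=0, as in the question, which also keeps the AIC values comparable across candidate models.

```r
library(forecast)

set.seed(1)
n <- 106                                  # same length as the series in the question

# Simulated stand-in for calwater (hypothetical values, not the real data)
mail  <- rbinom(n, 1, 0.15)               # 1 in weeks when a mailer hits
lag1  <- c(0, head(mail, -1))             # mailer one week earlier
lag2  <- c(0, 0, head(mail, -2))          # mailer two weeks earlier
calls <- 200 + 30 * sin(2 * pi * (1:n) / 52) +
  40 * mail + 25 * lag1 + 10 * lag2 + rnorm(n, sd = 8)

mailer <- cbind(mail = mail, lag1 = lag1, lag2 = lag2)  # numeric matrix for xreg
train  <- 1:96
test   <- 97:106

# The response must be a ts object so that fourier() knows the period is 52
y_train <- ts(calls[train], start = c(2011, 23), frequency = 52)

# Try K = 1..5 pairs of Fourier terms and keep the fit with the lowest AIC
best <- NULL
for (K in 1:5) {
  xr  <- cbind(fourier(y_train, K = K), mailer[train, ])
  fit <- auto.arima(y_train, xreg = xr, seasonal = FALSE, d = 0)
  if (is.null(best) || fit$aic < best$fit$aic) {
    best <- list(fit = fit, K = K)
  }
}

# Future regressors: Fourier terms for the next 10 weeks plus the known mailers
new_xreg <- cbind(fourier(y_train, K = best$K, h = length(test)),
                  mailer[test, ])
fc <- forecast(best$fit, xreg = new_xreg)
plot(fc, main = "Forecast with Fourier terms", ylab = "Calls")
```

The forecast horizon is determined by the number of rows in new_xreg, so the future mailer schedule has to be known (or assumed) for every week being forecast.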
