Solved – Multivariate ARIMA with regression

arima, multivariate regression, r

I have a dataset of daily data covering 3 years (3×365 rows) with multiple attributes: TotalPhoneCall (the main attribute, which I want to predict), Christmas day, weekend, weekday, Easter, 4th_july, etc. (some are seasonal).

I would like to predict TotalPhoneCall for the following month. I have to use ARIMA with regression. I may filter out unnecessary attributes if needed. How can I do this in R?

Best Answer

Your data set / design matrix says a lot about your assumptions. You are explicitly assuming that weekdays share a common effect and that weekend days share a common effect. It is much more general to estimate an individual effect for each day of the week and, upon finding a common pattern, possibly reduce the design matrix accordingly.
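As a minimal sketch of the difference between the two designs, the R snippet below builds one indicator per day of the week next to the single weekend indicator used in the question. The date range and the variable names (`dates`, `X_dow`, `weekend`) are assumptions made for illustration, not taken from the original data.

```r
# Hypothetical daily date index; the real data would supply its own dates.
dates <- seq(as.Date("2009-01-01"), by = "day", length.out = 3 * 365)

# Day of week as a factor with a fixed level order (independent of locale).
dow <- factor(as.POSIXlt(dates)$wday, levels = 0:6,
              labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))

# General design: one indicator column per day.
# The intercept column is dropped, so Sunday acts as the baseline.
X_dow <- model.matrix(~ dow)[, -1]

# Restricted design: a single weekend indicator (Saturday and Sunday pooled).
weekend <- as.integer(dow %in% c("Sat", "Sun"))

head(X_dow)
```

If the estimated coefficients for, say, Tuesday through Friday turn out to be close to each other, those columns can later be merged into one, which is the reduction of the design matrix mentioned above.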

You are implicitly assuming that

  • there are no lead or lag effects around the known holidays that you are considering,
  • there are no Pulses, Level Shifts and/or Local Time Trends,
  • the day-of-the-week effect is constant over the 1126 observations,
  • there is no particular week-of-the-year effect, since you introduce only monthly indicators,
  • the parameters of the model don’t change over time,
  • the error variance is constant over time,
  • there is no need for ARIMA structure to render the final model’s residuals uncorrelated.

Other than these items, you are good to go!
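The first assumption in the list (no lead or lag effects around holidays) can be relaxed by adding shifted copies of each holiday indicator as extra regressors. The sketch below does this for Christmas only. The helper `shift()`, the date range, and the choice of two leads and one lag are all illustrative assumptions.

```r
# Hypothetical daily date index covering three Christmases.
dates <- seq(as.Date("2009-01-01"), by = "day", length.out = 3 * 365)
christmas <- as.integer(format(dates, "%m-%d") == "12-25")

# Shift an indicator by k days, padding with zeros.
# k > 0 gives a lag (effect after the holiday), k < 0 a lead (effect before it).
shift <- function(x, k) {
  n <- length(x)
  if (k > 0) c(rep(0L, k), x[seq_len(n - k)])
  else if (k < 0) c(x[seq.int(1 - k, n)], rep(0L, -k))
  else x
}

# Candidate regressors: two days before, one day before, the day itself, one day after.
X_hol <- cbind(
  xmas_lead2 = shift(christmas, -2),
  xmas_lead1 = shift(christmas, -1),
  xmas       = christmas,
  xmas_lag1  = shift(christmas, 1)
)

# Dates on which the one-day lead indicator is switched on.
dates[X_hol[, "xmas_lead1"] == 1]
```

Leads and lags whose coefficients are not significant can be dropped again once the model has been estimated.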

On a positive note, I would strongly suggest that you find a consultant or a program/approach/solution that speaks to some or all of the above considerations. I would start with http://www.unc.edu/~jbhill/tsay.pdf .

After receipt of your data:

The term multivariate ARIMA is synonymous with vector ARIMA, i.e. multiple endogenous series. Your problem has one endogenous (output) series and multiple inputs; this is called a Transfer Function. I noted (but ignored) that your data contain negative phone call counts. The data:

[Image: plot of the data]
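For the single-output, multiple-input setup described above, a simple starting point in R is a regression with ARIMA errors, which is a special case of a transfer function with no lagged dynamics on the inputs. The sketch below uses base R's `arima()` with its `xreg` argument and `predict()` with `newxreg`. The data are simulated, the effect sizes are made up, and the AR(1) order is assumed rather than identified, so this is not the model reported in this answer.

```r
set.seed(1)

# Simulated stand-in for the real data: three years of daily call counts.
n     <- 3 * 365
dates <- seq(as.Date("2009-01-01"), by = "day", length.out = n)
dow   <- factor(as.POSIXlt(dates)$wday, levels = 0:6,
                labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
xmas  <- as.integer(format(dates, "%m-%d") == "12-25")

# Input (exogenous) series: six day-of-week dummies plus one holiday dummy.
X <- cbind(model.matrix(~ dow)[, -1], xmas = xmas)

# Made-up effects plus AR(1) noise, so the example has something to recover.
beta  <- c(dowMon = 40, dowTue = 30, dowWed = 30, dowThu = 45,
           dowFri = 25, dowSat = -20, xmas = -60)
noise <- as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 10))
calls <- 200 + as.numeric(X %*% beta) + noise

# Regression with AR(1) errors; the order is assumed here, not identified.
fit <- arima(calls, order = c(1, 0, 0), xreg = X)
print(fit)

# Regressors for the forecast horizon must be known in advance;
# calendar effects always are.
h         <- 30
new_dates <- seq(max(dates) + 1, by = "day", length.out = h)
new_dow   <- factor(as.POSIXlt(new_dates)$wday, levels = 0:6, labels = levels(dow))
newX      <- cbind(model.matrix(~ new_dow)[, -1],
                   xmas = as.integer(format(new_dates, "%m-%d") == "12-25"))
colnames(newX) <- colnames(X)

# Point forecasts and standard errors for the next 30 days.
fc <- predict(fit, n.ahead = h, newxreg = newX)
head(fc$pred)
```

With real data, the residuals of `fit` should be checked for remaining autocorrelation (for example with `acf(residuals(fit))`) before trusting the forecasts. If the `forecast` package is available, its `auto.arima()` also accepts an `xreg` matrix and selects the ARIMA order automatically.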

One can suggest holiday impacts and identify the lead and lag structure around these events. In addition, there may be unusual activity that needs to be isolated and accounted for in order to get robust estimates of the daily effects, the monthly effects, the day-of-the-month effects and the week-of-the-month effects. Following is the Actual-Fit-Forecast graph

[Image: Actual-Fit-Forecast graph]

and the residual plot

[Image: residual plot]

A plot of the forecasts for the next 21 days is presented here.

[Image: forecasts for the next 21 days]

The actual equation is presented here in two parts:

[Image: model equation, part 1]

and

[Image: model equation, part 2]

In summary, your model has the following characteristics:

[Image: summary of model characteristics]

The third week of the month is statistically significant, along with two level shifts at 5/15/09 and 5/20/10. Additionally, your series is impacted positively on Thursday and Monday and negatively on Saturday. There is a strong impact due to some specific months of the year and a few holidays.
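Level shifts and a week-of-the-month effect like those in the summary can be encoded as ordinary regressors. The sketch below hard-codes the two shift dates quoted above as step indicators and marks days 15 to 21 as the third week. The date range, the variable names, and that definition of "third week" are assumptions for illustration; the answer does not say how the shifts were detected or how weeks of the month were defined.

```r
# Hypothetical daily date index; the real data would supply its own dates.
dates <- seq(as.Date("2009-01-01"), by = "day", length.out = 3 * 365)

# Level shift: 0 before the break date, 1 from the break date onward.
ls_2009 <- as.integer(dates >= as.Date("2009-05-15"))
ls_2010 <- as.integer(dates >= as.Date("2010-05-20"))

# Week of the month, assumed here to mean days 1-7, 8-14, 15-21, 22-28, 29-31.
week_of_month <- (as.POSIXlt(dates)$mday - 1) %/% 7 + 1
week3 <- as.integer(week_of_month == 3)

# Extra columns that could be bound onto an xreg matrix for arima().
X_extra <- cbind(ls_2009 = ls_2009, ls_2010 = ls_2010, week3 = week3)
head(X_extra)
```

A pulse (one-time outlier) would be coded the same way, with a 1 on a single date and 0 elsewhere.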