Does ARIMA assume evenly-spaced data in statsmodels?


I've read that ARIMA assumes the endogenous input array is evenly spaced. If that is the case, what is the point of the dates parameter in statsmodels.tsa.ARIMA(), which seems to be there to support irregularly spaced data? Also, what are the assumptions for the optional exogenous arrays: do they need to be spaced in exactly the same way as the endogenous array?

Best Answer

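Models written in terms of lagged variables (which ARIMA is) work only on equally spaced time periods: a lag of one simply means "the previous observation", so the model implicitly treats every gap between observations as the same length. For the function to handle irregularly spaced data, you would have to specify a correlation function of the time separation (exponential correlation, Gaussian correlation, etc.), as you do in geostatistical models. The dates parameter does not change this: it only attaches date labels to the observations, for purposes such as plotting, and has no effect on how the model is estimated.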
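To make the point about lags concrete, here is a minimal sketch in plain Python rather than statsmodels. The helpers `is_evenly_spaced` and `ar1_coefficient` and all the numbers are hypothetical and made up for illustration; the AR(1) fit is a simple no-intercept least-squares estimate, not the estimator statsmodels uses. It shows that a lag-based estimate only ever sees the order of the values, so the same series gives the same coefficient whether its timestamps are regular or not.

```python
from datetime import date


def is_evenly_spaced(stamps):
    """Return True if all consecutive timestamps are the same distance apart."""
    gaps = {later - earlier for earlier, later in zip(stamps, stamps[1:])}
    return len(gaps) <= 1


def ar1_coefficient(values):
    """Least-squares estimate of phi in y[t] = phi * y[t-1] + e[t] (no intercept).

    Note that the function takes no timestamps at all: "lag 1" just means
    "the previous element of the list".
    """
    prev, curr = values[:-1], values[1:]
    return sum(p * c for p, c in zip(prev, curr)) / sum(p * p for p in prev)


# Made-up observations, paired with two hypothetical sets of dates.
values = [1.0, 0.8, 0.5, 0.45, 0.3, 0.2]
regular = [date(2020, 1, d) for d in (1, 2, 3, 4, 5, 6)]
irregular = [date(2020, 1, d) for d in (1, 2, 5, 6, 14, 30)]

print(is_evenly_spaced(regular))    # daily gaps throughout
print(is_evenly_spaced(irregular))  # gaps of 1, 3, 1, 8 and 16 days

# The estimate is identical for both date sets because the dates are never used.
phi = ar1_coefficient(values)
print(phi)
```

A check like `is_evenly_spaced` is only meaningful for fixed-length steps such as days or hours; calendar frequencies like months have unequal day counts and would need a different test.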

The exogenous arrays are usually spaced the same way as the endogenous array, because they are essentially explanatory variables in a regression that explains the endogenous variable, so it usually would not make sense for them to be collected at different times. There are exceptions. One is when you collect the exogenous variable at a different frequency and match each value to the closest date of the endogenous variable (e.g. yearly GDP measurements used to predict monthly local unemployment values). Another is when you use lagged exogenous variables (e.g. using precipitation from last year and the year before to predict tree growth this year, with the non-independent error terms modeled by the ARIMA model). In either case, the exogenous values passed to the model still need one row per endogenous observation.
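The two exceptions can be sketched in plain Python as a data-preparation step. Everything here is hypothetical: the numbers are made up, `lagged` is an illustrative helper, and matching GDP by calendar year is a simplifying assumption standing in for "closest date". The aim is only to show how the exogenous values end up with one row per endogenous observation.

```python
# Exception 1: exogenous variable measured at a lower frequency.
# Hypothetical yearly GDP figures and monthly unemployment rates.
gdp_by_year = {2019: 100.0, 2020: 97.0}
unemployment = {(2019, 11): 4.1, (2019, 12): 4.0, (2020, 1): 4.3, (2020, 2): 4.6}

months = sorted(unemployment)  # (year, month) keys in time order
endog = [unemployment[m] for m in months]
# Repeat each year's GDP for every month of that year (assumed matching rule),
# so the exogenous list has exactly one value per endogenous observation.
exog_gdp = [gdp_by_year[year] for year, _ in months]


# Exception 2: lagged exogenous variables.
def lagged(series, lags):
    """Return rows [series[t - lag] for lag in lags], skipping the first max(lags) periods."""
    start = max(lags)
    return [[series[t - lag] for lag in lags] for t in range(start, len(series))]


precip = [30.0, 42.0, 35.0, 50.0, 28.0]  # hypothetical yearly precipitation
growth = [1.1, 1.4, 1.2, 1.6, 1.0]       # hypothetical yearly tree growth

# Precipitation from last year and from the year before, for each growth year.
exog_precip = lagged(precip, lags=(1, 2))
# Drop the first two growth years, which have no complete lag history.
endog_growth = growth[2:]

print(exog_gdp)
print(exog_precip)
```

Lists prepared this way could then be supplied as the endogenous and exogenous inputs of an ARIMA model, which would handle the correlated error terms.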
