Irregular Time Series

Tags: arima, forecasting, missing-data, r, time-series

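Please consider the following code (in R):

library(zoo)      # zoo objects, as.ts() method for zoo, na.locf()
library(forecast) # auto.arima(), forecast()

tt <- structure(c(1494.5, 1367.57, 1357.57, 1222.23, 1124.02, 1011.64, 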
4575.64, 3201.87, 3050.04, 2173.38, 1967.88, 1838.55, 1666.05, 
1656.05, 1524.96, 835.96, 775.36, 592.36, 494.15, 4058.15, 2624.36, 
2448.47, 1598.47, 1398.47, 1264.14, 1165.88, 1053.67, 941.36, 
821.36, 471.36, 373.15, 259.91, 3808.91, 2262.26, 1940.39, 1011.39, 
800.81, 790.81), index = structure(c(16563L, 16565L, 16570L, 
16572L, 16577L, 16579L, 16584L, 16585L, 16586L, 16587L, 16588L, 
16589L, 16590L, 16592L, 16593L, 16599L, 16606L, 16607L, 16608L, 
16612L, 16613L, 16614L, 16617L, 16618L, 16619L, 16620L, 16621L, 
16628L, 16633L, 16635L, 16638L, 16642L, 16647L, 16648L, 16649L, 
16650L, 16651L, 16654L), class = "Date"), class = "zoo")

tt2 <- as.ts(tt)
tt2 <- na.locf(tt2) # replace each NA with the last preceding non-NA value
mm  <- auto.arima(tt2)

plot(forecast(mm, h=60))

The results of the auto.arima function are puzzling.
There is clear seasonality in the data: this is the balance of an account, so every month a salary is paid in, producing a spike in the series, followed by a decrease until the next salary arrives. I would like to forecast a couple of cycles, but the auto.arima forecast is nothing like what I expect.

Does anybody have any suggestions (including approaches other than auto.arima)?
Any suggestion is welcome.

Best Answer

Yes. The problem is that auto.arima expects a ts time series (or an object that can be coerced to one). Internally, auto.arima runs x <- as.ts(x). You can check what this looks like with:

as.ts(tt)

## Time Series:
## Start = 16563 
## End = 16654 
## Frequency = 1 
##  [1] 1494.50      NA 1367.57      NA      NA      NA      NA 1357.57      NA
## [10] 1222.23      NA      NA      NA      NA 1124.02      NA 1011.64      NA
## [19]      NA      NA      NA 4575.64 3201.87 3050.04 2173.38 1967.88 1838.55
## [28] 1666.05      NA 1656.05 1524.96      NA      NA      NA      NA      NA
## [37]  835.96      NA      NA      NA      NA      NA      NA  775.36  592.36
## [46]  494.15      NA      NA      NA 4058.15 2624.36 2448.47      NA      NA
## [55] 1598.47 1398.47 1264.14 1165.88 1053.67      NA      NA      NA      NA
## [64]      NA      NA  941.36      NA      NA      NA      NA  821.36      NA
## [73]  471.36      NA      NA  373.15      NA      NA      NA  259.91      NA
## [82]      NA      NA      NA 3808.91 2262.26 1940.39 1011.39  800.81      NA
## [91]      NA  790.81

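Thus, zoo takes the numeric time index underlying the Dates (number of days since 1970-01-01), creates a regular daily grid from the first to the last observed day, and fills every day without an observation with NA. Note also Frequency = 1: the coerced series carries no seasonal period, so auto.arima has no way to pick up the monthly salary cycle.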
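To see where the NAs come from, the coercion can be mimicked in plain base R on the first five observations of tt. This is only an illustration of the idea, not zoo's actual implementation; the names day, bal, grid and filled are made up for the example.

```r
# Base-R sketch of coercing an irregular, Date-indexed series to a regular
# "ts": the day numbers become the time axis, every day without an
# observation becomes NA, and the frequency is 1 (one step per day).
# Illustrative only; this is not zoo's actual implementation.

# First five observations of `tt`: day numbers (days since 1970-01-01)
# and the corresponding account balances.
day <- c(16563L, 16565L, 16570L, 16572L, 16577L)
bal <- c(1494.50, 1367.57, 1357.57, 1222.23, 1124.02)

# Complete daily grid from the first to the last observed day.
grid <- seq(min(day), max(day), by = 1L)

# Start with all NA, then drop each observation into its slot on the grid.
filled <- rep(NA_real_, length(grid))
filled[match(day, grid)] <- bal

reg <- ts(filled, start = min(day), frequency = 1)
print(reg)
sum(is.na(reg))  # 10 grid days with no observation
```

The printed values match the first 15 entries of the series shown above: five observations spread over a 15-day grid, with the remaining ten days left as NA.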

What to do in this situation depends on whether this is your complete data and on which pattern you expect to be relevant. For example, one could coarsen the data to a regular grid, or fit a continuous-time model instead.
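As one possible sketch of the regular-grid route (not the answer's own method), the balances can be placed on a grid by carrying the last observed value forward, and a seasonal period can then be declared explicitly. Two assumptions are labelled loudly here: that the balance is unchanged between observed days, and a fixed 30-day cycle, which is only a rough stand-in for real pay dates. The helpers regularise() and snaive_sketch() are hypothetical names; the forecast rule is a plain seasonal naive forecast (repeat the last cycle), swapped in for illustration.

```r
# ASSUMPTIONS (illustrative, not from the original answer):
#   * between observations the balance is unchanged, so the last observed
#     value is carried forward onto the grid;
#   * the salary cycle is a fixed 30 days (hypothetical; pay dates drift).

# Day numbers (days since 1970-01-01) and balances from `tt` in the question.
day <- c(16563L, 16565L, 16570L, 16572L, 16577L, 16579L, 16584L, 16585L,
         16586L, 16587L, 16588L, 16589L, 16590L, 16592L, 16593L, 16599L,
         16606L, 16607L, 16608L, 16612L, 16613L, 16614L, 16617L, 16618L,
         16619L, 16620L, 16621L, 16628L, 16633L, 16635L, 16638L, 16642L,
         16647L, 16648L, 16649L, 16650L, 16651L, 16654L)
bal <- c(1494.5, 1367.57, 1357.57, 1222.23, 1124.02, 1011.64,
         4575.64, 3201.87, 3050.04, 2173.38, 1967.88, 1838.55, 1666.05,
         1656.05, 1524.96, 835.96, 775.36, 592.36, 494.15, 4058.15, 2624.36,
         2448.47, 1598.47, 1398.47, 1264.14, 1165.88, 1053.67, 941.36,
         821.36, 471.36, 373.15, 259.91, 3808.91, 2262.26, 1940.39, 1011.39,
         800.81, 790.81)

# Hypothetical helper: last observed value at or before each grid point.
# step = 1 gives a daily grid; step = 7 would coarsen to a weekly grid.
regularise <- function(day, value, step = 1L) {
  grid <- seq(min(day), max(day), by = step)
  value[findInterval(grid, day)]  # findInterval() requires sorted `day`
}

# Hypothetical helper: seasonal naive forecast, h steps ahead,
# obtained by recycling the last full cycle of `period` observations.
snaive_sketch <- function(x, period, h) {
  n <- length(x)
  rep_len(x[(n - period + 1):n], h)
}

period <- 30L                             # assumed cycle length in days
daily  <- regularise(day, bal)            # 92 daily balances, no NAs
y      <- ts(daily, frequency = period)   # declare the assumed season
fc     <- snaive_sketch(daily, period, h = 60)
```

An object like y, with a frequency above 1, is what auto.arima() would need in order to consider seasonal terms at all; with only about three cycles of data, however, any seasonal estimate will be very unstable, which is why the sketch stops at the simpler repeat-the-last-cycle rule.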