I have used auto.arima to fit a time series model (a linear regression with ARIMA errors, as described on Rob Hyndman's site). When it finished, the output reported that the best model has a (5,1,0) with drift structure, with the following information criteria:
AIC: 2989.2
AICC: 2989.3
BIC: 3261.2
When I use Arima to fit a model with a (1,1,1) with drift structure, the output reports noticeably lower ICs:
AIC: 2510.3
AICC: 2510.4
BIC: 2759
I can force auto.arima to consider the (1,1,1) with drift model (using the start.p and start.q parameters). When I do that and set trace=TRUE, I see that the (1,1,1) with drift model is considered but rejected by auto.arima, which still reports the (5,1,0) with drift model as the best result.
Are there circumstances when auto.arima uses other criteria to choose between models?
Edited to add (in response to request)
Data for this example can be found at this Google spreadsheet
and R code to reproduce the example is
library(forecast)

# Read the data; frequency = 24 sets the seasonal period
repro <- read.csv("mindata.csv")
reprots <- ts(repro, start = 1, frequency = 24)

# Regressors: Fourier terms for the seasonality plus the other covariates
xreg <- cbind(fourier(reprots[, "lnwocone"], K = 11),
              reprots[, c("temp", "sqt", "humidity", "windspeed", "mist", "rain")])

# Automatic order selection, starting the search at p = 1, q = 1
fitauto <- auto.arima(reprots[, "lnwocone"], xreg = xreg,
                      start.p = 1, start.q = 1, trace = TRUE, seasonal = FALSE)

# Direct fit of the (1,1,1) with drift alternative
fitdirect <- Arima(reprots[, "lnwocone"], order = c(1, 1, 1), seasonal = c(0, 0, 0),
                   xreg = xreg, include.drift = TRUE)

summary(fitauto)
summary(fitdirect)
Apologies if Google Docs data plus inline code is not the best way to provide the example. I think I have seen guidelines on the best way to do this, but I could not locate them when searching this morning.
Best Answer
auto.arima uses some approximations in order to speed up the processing. The final model is fitted using full MLE, but along the way the models are estimated using CSS unless you use the argument approximation=FALSE. This is explained in the help file. The information criteria compared during the search are therefore approximate, which is why a model with a lower exact AICc can still be rejected.
The default setting is approximation=(length(x)>100 | frequency(x)>12); again, this is specified in the help file. As you have 17544 observations, the default setting gives approximation=TRUE.
Using the approximations, the best model found was a regression with ARIMA(5,1,0) errors with AICc of 2989.33. If you turn the approximations off, the best model has ARIMA(2,1,1) errors with an AICc of 2361.40.
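To make the last point concrete, here is a minimal sketch of rerunning the search with the approximations turned off. The wrapper name fit_exact is hypothetical (it is not part of the forecast package); approximation, seasonal and trace are existing auto.arima arguments. The second half assumes mindata.csv from the question is in the working directory and is skipped otherwise. With this many observations and regressors, expect the exact search to take considerably longer than the default.

```r
library(forecast)

# Hypothetical helper (not part of forecast): run the order search with
# full maximum likelihood for every candidate instead of the CSS shortcut.
fit_exact <- function(y, xreg, ...) {
  auto.arima(y, xreg = xreg, seasonal = FALSE,
             approximation = FALSE, ...)
}

# Applied to the data from the question, if the file is available.
if (file.exists("mindata.csv")) {
  reprots <- ts(read.csv("mindata.csv"), start = 1, frequency = 24)
  y <- reprots[, "lnwocone"]
  xreg <- cbind(fourier(y, K = 11),
                reprots[, c("temp", "sqt", "humidity",
                            "windspeed", "mist", "rain")])
  fitexact <- fit_exact(y, xreg, trace = TRUE)
  print(fitexact)  # coefficients plus AIC, AICc and BIC of the chosen model
}
```

Because every candidate is now fitted by full MLE, the AICc values shown by trace=TRUE should be directly comparable with those reported by Arima for a directly fitted model.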