Solved – Should auto.arima in R ever report a model with higher AIC, AICc and BIC than other models considered?

aic, arima, r

I have used auto.arima to fit a time series model (a linear regression with ARIMA errors, as described on Rob Hyndman's site). When it finishes, the output reports that the best model has a (5,1,0) with drift structure, with the following information criteria:

AIC: 2989.2
AICC: 2989.3
BIC: 3261.2

When I use Arima to fit a model with a (1,1,1) with drift structure, the output reports noticeably lower information criteria:

AIC: 2510.3
AICC: 2510.4
BIC: 2759

I can force auto.arima to consider the (1,1,1) with drift model (using the start.p and start.q parameters). When I do that and set trace=TRUE, I can see that the (1,1,1) with drift model is considered but rejected by auto.arima, which still reports the (5,1,0) with drift model as the best result.

Are there circumstances when auto.arima uses other criteria to choose between models?

Edited to add (in response to a request):

Data for this example can be found at this Google spreadsheet

and R code to reproduce the example is

library(forecast)  # provides auto.arima, Arima and fourier

repro <- read.csv("mindata.csv")
reprots <- ts(repro, start=1, frequency=24)

# Automatic search, started at p=1, q=1 so that (1,1,1) is among the candidates
fitauto <- auto.arima(reprots[,"lnwocone"],
                      xreg=cbind(fourier(reprots[,"lnwocone"], K=11),
                                 reprots[,c("temp","sqt","humidity","windspeed","mist","rain")]),
                      start.p=1, start.q=1, trace=TRUE, seasonal=FALSE)

# Direct fit of the (1,1,1) with drift model on the same regressors
fitdirect <- Arima(reprots[,"lnwocone"], order=c(1,1,1), seasonal=c(0,0,0),
                   xreg=cbind(fourier(reprots[,"lnwocone"], K=11),
                              reprots[,c("temp","sqt","humidity","windspeed","mist","rain")]),
                   include.drift=TRUE)
summary(fitauto)
summary(fitdirect)

Apologies if Google Docs data plus inline code is not the best way to provide the example. I think I have seen guidelines on the best way to do this in the past, but I could not locate them when searching this morning.

Best Answer

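auto.arima uses some approximations in order to speed up the processing. The final model is fitted using full MLE, but along the way the models are estimated using CSS unless you use the argument approximation=FALSE. The candidates are therefore ranked by approximate information criteria, while the values reported for the chosen model come from the final MLE fit, so the selected model can show higher AIC, AICc and BIC than a model you fit directly with Arima. This is explained in the help file: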

approximation: If TRUE, estimation is via conditional sums of squares and the information criteria used for model selection are approximated. The final model is still computed using maximum likelihood estimation. Approximation should be used for long time series or a high seasonal period to avoid excessive computation times.
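As a rough illustration of why criteria based on CSS are only approximate, the sketch below fits the same ARIMA(1,1,1) specification by CSS and by full maximum likelihood using base R's stats::arima. The data are simulated for illustration (they are not the series from the question), and this is not a reproduction of auto.arima's internal calculation.

```r
# Simulated data for illustration only; not the series from the question.
set.seed(42)
y <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.6, ma = -0.3), n = 500)

# Same ARIMA(1,1,1) specification, two estimation methods from stats::arima.
fit_css <- arima(y, order = c(1, 1, 1), method = "CSS")  # conditional sum of squares
fit_ml  <- arima(y, order = c(1, 1, 1), method = "ML")   # full maximum likelihood

# The two fits report their own log-likelihood values.
print(c(css = fit_css$loglik, ml = fit_ml$loglik))

# stats::arima reports an AIC only for likelihood-based fits; for CSS it is NA,
# so any criterion derived from a CSS fit has to be an approximation.
print(c(css = fit_css$aic, ml = fit_ml$aic))
```

The CSS fit is cheaper because it conditions on the initial observations instead of evaluating the exact likelihood, which is the trade-off the help text describes.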

The default setting is approximation=(length(x)>100 | frequency(x)>12); again, this is specified in the help file. As you have 17544 observations, the default setting gives approximation=TRUE.
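To make the default concrete, the quoted expression can be evaluated by hand with the numbers from this example. The variable names n_obs and period are placeholders introduced here; the values are the series length given above and the frequency passed to ts() in the question.

```r
n_obs  <- 17544  # series length stated in the answer
period <- 24     # frequency used in ts(...) in the question

# Same logical expression as the default quoted above, with the values plugged in.
use_approximation <- (n_obs > 100 | period > 12)
print(use_approximation)  # TRUE
```

Either condition alone would be enough here: the series is far longer than 100 observations, and the frequency of 24 exceeds 12.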

Using the approximations, the best model found was a regression with ARIMA(5,1,0) errors with AICc of 2989.33. If you turn the approximations off, the best model has ARIMA(2,1,1) errors with an AICc of 2361.40.

> fitauto = auto.arima(reprots[,"lnwocone"], approximation=FALSE,
                xreg=cbind(fourier(reprots[,"lnwocone"], K=11),
                reprots[,c("temp","sqt","humidity","windspeed","mist","rain")]),
                start.p=1, start.q=1, trace=TRUE, seasonal=FALSE)
> fitauto
Series: reprots[, "lnwocone"] 
ARIMA(2,1,1) with drift         
...
sigma^2 estimated as 0.08012:  log likelihood=-1147.63
AIC=2361.27   AICc=2361.4   BIC=2617.76
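If a full search with approximation=FALSE is too slow, one alternative is to refit a short list of candidate orders by maximum likelihood and compare their AICc directly. The sketch below does this with base R's stats::arima on simulated data. The helper aicc_ml is hypothetical, the AICc formula is the usual small-sample correction applied by hand, and the drift term and regressors from the question are omitted, so the numbers are not comparable with the output above.

```r
# Simulated data for illustration only; not the series from the question.
set.seed(1)
y <- arima.sim(model = list(order = c(2, 1, 1), ar = c(0.5, -0.2), ma = 0.4), n = 400)

# Hypothetical helper: AICc computed from a full maximum-likelihood fit.
aicc_ml <- function(y, order) {
  fit <- arima(y, order = order, method = "ML")
  k <- length(coef(fit)) + 1   # estimated coefficients plus the innovation variance
  n <- fit$nobs                # observations used after differencing
  fit$aic + 2 * k * (k + 1) / (n - k - 1)
}

# Candidate orders mentioned in the question and answer.
candidates <- list(c(1, 1, 1), c(2, 1, 1), c(5, 1, 0))
scores <- sapply(candidates, function(o) aicc_ml(y, o))
names(scores) <- sapply(candidates, paste, collapse = ",")

# Lowest AICc first.
print(sort(scores))
```

Because every candidate is estimated the same way, the resulting AICc values can be compared with each other, which is not true when some come from the approximate search and others from a direct fit.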
