I'm trying to fit an ARIMA model to a housing data set.
By experimenting with the values of p and q, I arrived at an ARIMA(2,1,2)(2,0,0) model with AIC = 4946.76.
I then used auto.arima to check whether I had picked the best model. auto.arima chose the (2,1,3)(2,0,0) model, which had AIC = 4948.21.
Then I looked at the values for both models to see what the difference between the two was.
The ARIMA(2,1,2)(2,0,0) model produced a warning:
Warning message:
In sqrt(diag(x$var.coef)) : NaNs produced
My question is why did auto.arima pick the (2,1,3)(2,0,0) model instead of (2,1,2)(2,0,0)?
Best Answer
auto.arima takes some shortcuts, such as using approximations, in order to speed up the search. You can try auto.arima(data, approximation=FALSE, stepwise=FALSE) to turn off the approximation and the stepwise search. This may also help with the warning, which is likely caused by coefficients being close to the edge of the stationarity region. Be aware that this call may take noticeably longer than the default one, so you could try just approximation=FALSE first.
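A minimal sketch of that comparison, assuming the forecast package is installed. The built-in USAccDeaths series is only a stand-in, since the housing data is not available here; substitute your own ts object. Note that auto.arima ranks candidates by AICc by default (its ic argument), so the model it returns need not have the lowest plain AIC.

```r
# Assumes the forecast package is available: install.packages("forecast")
library(forecast)

# Stand-in monthly series (an assumption for illustration, not the housing data).
y <- USAccDeaths

# Default call: stepwise search. The approximation is only switched on
# automatically for longer series (more than 150 observations) or frequency above 12.
fit_fast <- auto.arima(y)

# Full search over the candidate grid with exact likelihoods (slower).
fit_full <- auto.arima(y, approximation = FALSE, stepwise = FALSE)

# Show the selected orders and coefficients of each fit.
print(fit_fast)
print(fit_full)

# Side-by-side information criteria for the two selected models.
rbind(
  fast = c(aic = fit_fast$aic, aicc = fit_fast$aicc),
  full = c(aic = fit_full$aic, aicc = fit_full$aicc)
)
```

Adding trace=TRUE to either call prints every model that was tried, which shows whether the model you found by hand was considered at all.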
You can use auto.arima(...)\$aic to get the actual value of the AIC; it may be very slightly smaller for $q=3$. Because the two values are almost the same, the choice of $q$ probably doesn't matter much. If "playing around" led you to $q=2$, then $q=2$ is fine. Time series modelling is not an exact science, and a small amount of subjectivity is involved. As long as you justify why you chose $q=2$ and carry out the appropriate model diagnostics (for example, examining the residuals), there is no need to worry.
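One way to put two candidate models side by side is sketched below, again assuming the forecast package and using USAccDeaths as a stand-in series. The orders shown are illustrative ones that fit this stand-in cleanly; with the housing data you would use order = c(2, 1, 2) and c(2, 1, 3) with seasonal = c(2, 0, 0). The helper summarise_fit is a hypothetical name introduced here, not part of any package.

```r
# Assumes the forecast package is available: install.packages("forecast")
library(forecast)

# Stand-in monthly series (an assumption for illustration, not the housing data).
y <- USAccDeaths

# Two candidate models with illustrative orders.
fit_a <- Arima(y, order = c(0, 1, 1), seasonal = c(0, 1, 1))
fit_b <- Arima(y, order = c(1, 1, 1), seasonal = c(0, 1, 1))

# Hypothetical helper: AIC, number of coefficients, and whether any
# standard error is NaN (the situation behind the sqrt(diag(...)) warning).
summarise_fit <- function(fit) {
  se <- suppressWarnings(sqrt(diag(fit$var.coef)))
  # bad_se is stored as 0/1 because c() coerces the logical to numeric.
  c(aic = fit$aic, n_coef = length(coef(fit)), bad_se = any(is.nan(se)))
}

comparison <- rbind(a = summarise_fit(fit_a), b = summarise_fit(fit_b))
print(comparison)

# Residual diagnostic: Ljung-Box test for leftover autocorrelation,
# with degrees of freedom reduced by the number of fitted coefficients.
Box.test(residuals(fit_a), lag = 24, type = "Ljung-Box",
         fitdf = length(coef(fit_a)))
```

A model whose bad_se entry is 1 has at least one standard error that could not be computed, which is a reason to treat its slightly lower AIC with caution.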