`auto.arima` uses some approximations in order to speed up the processing. The final model is fitted using full MLE, but along the way candidate models are estimated using conditional sums of squares (CSS) unless you use the argument `approximation=FALSE`. This is explained in the help file:
> approximation: If TRUE, estimation is via conditional sums of squares and the information criteria used for model selection are approximated. The final model is still computed using maximum likelihood estimation. Approximation should be used for long time series or a high seasonal period to avoid excessive computation times.
The default setting is `approximation = (length(x) > 100 | frequency(x) > 12)`; again, this is specified in the help file. As you have 17544 observations, the default setting gives `approximation=TRUE`.
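To see what the default evaluates to for your data, you can compute the condition directly. A minimal sketch, assuming an hourly series (frequency 24) with 17544 observations as in your data; the series itself is simulated since only its length and frequency matter here:

```r
# Reproducing the default: approximation = (length(x) > 100 | frequency(x) > 12)
x <- ts(rnorm(17544), frequency = 24)
approximation <- (length(x) > 100 | frequency(x) > 12)
approximation  # TRUE, so auto.arima() uses CSS approximations unless told otherwise
```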
Using the approximations, the best model found was a regression with ARIMA(5,1,0) errors and an AICc of 2989.33. With the approximations turned off, the best model has ARIMA(2,1,1) errors with an AICc of 2361.40.
```r
> fitauto = auto.arima(reprots[,"lnwocone"], approximation=FALSE,
    xreg=cbind(fourier(reprots[,"lnwocone"], K=11),
               reprots[,c("temp","sqt","humidity","windspeed","mist","rain")]),
    start.p=1, start.q=1, trace=TRUE, seasonal=FALSE)
> fitauto
Series: reprots[, "lnwocone"]
ARIMA(2,1,1) with drift
...
sigma^2 estimated as 0.08012:  log likelihood=-1147.63
AIC=2361.27   AICc=2361.4   BIC=2617.76
```
A note on terminology: commonly we fit a model to the data rather than fit the data to a model.
I can do step 1, but don't know how to relate that to step 2. Am I using the remainder from stl analysis for ARIMA modeling? If not, what's the point of step 1?
From STL you obtain three components: trend, seasonal and remainder. You could remove the seasonal component and use the sum of trend and remainder for further modelling with ARIMA.
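A minimal sketch of this workflow, using the built-in `co2` series purely for illustration:

```r
# Decompose with STL, then seasonally adjust by removing the seasonal component.
fit <- stl(co2, s.window = "periodic")
seasonal  <- fit$time.series[, "seasonal"]
trend     <- fit$time.series[, "trend"]
remainder <- fit$time.series[, "remainder"]
adjusted  <- co2 - seasonal   # identical to trend + remainder
# 'adjusted' is then the input for (non-seasonal) ARIMA modelling.
```

The `seasadj()` function in the forecast package performs the same subtraction for you.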
But I can't get past the diagnostics. My Ljung-Box values are ALWAYS significant for ALL lags. Okay, so that means my residuals are correlated (I think). And since I want to use the residuals for cross-correlation, I assume that's bad.
Yes, having significant autocorrelations at ALL lags is clearly a problem. I would generally agree with the comment by @Glen_b, but in a case where all lags are significant the problem seems hard to deny. Curiously, the ACF plot does not immediately suggest that the autocorrelations are a really big problem (only a few lags stick out of the confidence interval by much), and the latter only becomes evident from the Ljung-Box test. I would not stop there, and I would not accept a model with such a terrible Ljung-Box picture. Instead, I would look for other models.
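For reference, the Ljung-Box test is available in base R via `Box.test()`. A hedged sketch of how it is typically applied to model residuals, on a simulated AR(1) series rather than your data:

```r
set.seed(1)
y <- arima.sim(model = list(ar = 0.5), n = 500)  # simulated AR(1) series
fit <- arima(y, order = c(1, 0, 0))
# fitdf accounts for the number of estimated ARMA parameters
Box.test(residuals(fit), lag = 24, fitdf = 1, type = "Ljung-Box")
# Small p-values across many lags, as in your case, indicate remaining autocorrelation.
```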
One caveat: if you use STL and remove the seasonal component before estimating ARIMA models on trend+remainder, you should not allow for a seasonal component in the ARIMA model (which would make it a SARIMA model); use the option `seasonal=FALSE` in the function `auto.arima`. Perhaps making this change will help you find better models.
Note also that after taking the 24-hour difference, the ACF and PACF still show significant spikes at the 24-hour lag. This may indicate that taking the 24-hour difference was not such a good idea. Normally you would expect the lag at which you have differenced the data not to have a significant ACF or PACF value.
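A hypothetical sketch of this check (simulated hourly data, not your series): after a lag-24 difference, the spike at lag 24 should largely disappear, and a spike that remains significant suggests the seasonal difference may not have been needed.

```r
set.seed(42)
# Simulated hourly series: AR(1) noise plus a deterministic daily cycle
x <- ts(arima.sim(list(ar = 0.6), n = 2000) +
        2 * sin(2 * pi * seq_len(2000) / 24), frequency = 24)
d24 <- diff(x, lag = 24)   # 24-hour (daily) difference
acf(d24, lag.max = 72)     # inspect whether lag 24 (and multiples) still stand out
pacf(d24, lag.max = 72)
```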
Does this mean my time series doesn't fit an ARIMA model?
The model you showed us indeed does not seem to fit the data well as evidenced by the poor Ljung-Box statistics. If I were you, I would try some other model instead.
Best Answer
This is probably explained in the documentation. Looking at the source code, I found that `Inf` is reported when the likelihood of the model turns out to be infinite, or when the smallest root of one of the model's polynomials has modulus lower than 1.01. When the AR polynomial is close to being non-stationary, or the MA polynomial is close to being non-invertible, the model is rejected by setting an infinite value for the AIC of that model.
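The 1.01 threshold on root moduli can be illustrated directly with `polyroot()`. The MA coefficients below are hypothetical, chosen so that one root sits exactly on the unit circle:

```r
# MA polynomial 1 + theta1*z + theta2*z^2, here 1 - 1.98 z + 0.98 z^2,
# which factors with a root at z = 1 (modulus 1.0).
theta <- c(-1.98, 0.98)
ma_roots <- polyroot(c(1, theta))
min(Mod(ma_roots))   # modulus 1.0 < 1.01, so such a model would be rejected with AIC = Inf
```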
`Inf *` is reported when the ARIMA model couldn't be fitted and an error was returned by `stats::arima`.

For example, `auto.arima` can report an AIC equal to `Inf` for a model such as ARIMA(2,1,2): fitting that particular model shows that the MA polynomial is close to being non-invertible, which is why `auto.arima` sets a large value for the AIC in order to make sure that this model is not chosen.
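As a hedged stand-in for such an example (the series and the order here are illustrative, not from the question): over-differencing a stationary series drives the MA estimate toward the non-invertibility boundary, which is exactly the situation the 1.01 root check guards against.

```r
set.seed(123)
y <- ts(rnorm(300))                  # already stationary, so d = 1 over-differences it
fit <- arima(y, order = c(0, 1, 1))  # differenced white noise is MA(1) with theta = -1
coef(fit)["ma1"]                     # estimate close to -1, i.e. near non-invertible
min(Mod(polyroot(c(1, coef(fit)["ma1"]))))  # MA root modulus close to 1
```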