Solved – Autocorrelated residuals from `auto.arima`

arima, autocorrelation

I'm having issues with the residuals of my ARIMA models in R for two time series. When I run the Ljung-Box test on the residuals, I reject the null hypothesis of no autocorrelation (i.e. my residuals still appear correlated), and I don't know what to do next. My end goal is to show that the steel time series can be used to predict car production.

The steel and cars time series data was extracted from these sources: steel and cars.

The following is my code:

library(forecast)   # auto.arima
library(imputeTS)   # na.interpolation

steel <- read.csv("~/stat248/monthly-production-of-raw-steel-.csv")
cars <- read.csv("~/stat248/australia-monthly-production-of-.csv")
colnames(cars)[2] <- "cars"
colnames(steel)[2] <- "steel"

# Monthly series, January 1956 to November 1993
cars <- ts(cars$cars, start = c(1956, 1), end = c(1993, 11), frequency = 12)
steel <- ts(steel$steel, start = c(1956, 1), end = c(1993, 11), frequency = 12)
plot(cbind(cars, steel), main = "Production of Cars and Steel in Australia")

# Fill missing values, log-transform, and strip seasonal/trend components with STL
cars <- na.interpolation(cars)
logcars <- log(cars)
logsteel <- log(steel)
logcars_stl <- stl(logcars, s.window = "periodic")
logsteel_stl <- stl(logsteel, s.window = "periodic")

# Fit ARIMA models to the STL remainders
logsteel_arima <- auto.arima(logsteel_stl$time.series[, "remainder"], approximation = FALSE, trace = FALSE)
logcars_arima <- auto.arima(logcars_stl$time.series[, "remainder"], approximation = FALSE, trace = FALSE)

> Box.test(logcars_arima$residuals,lag=20,type="Ljung-Box")

    Box-Ljung test

data:  logcars_arima$residuals
X-squared = 61.454, df = 20, p-value = 4.231e-06

> Box.test(logsteel_arima$residuals,lag=20,type="Ljung-Box")

    Box-Ljung test

data:  logsteel_arima$residuals
X-squared = 56.109, df = 20, p-value = 2.799e-05

Here I get tiny $p$-values even after using `auto.arima`. Manually selecting ARIMA orders by comparing AICs didn't fare any better. Any advice?

Best Answer

The Ljung-Box test is inappropriate for testing residuals from an ARIMA model; the Breusch-Godfrey test should be used instead. See Testing for autocorrelation: Ljung-Box versus Breusch-Godfrey.
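
As a minimal sketch of one way to run a Breusch-Godfrey-type check on the residual series, you could use `bgtest` from the lmtest package; regressing the residuals on an intercept only is my own assumption here, not something prescribed by the linked answer, and the object names are the ones from the question:

library(lmtest)

# Breusch-Godfrey test for serial correlation up to lag 20,
# applied to an intercept-only regression of the ARIMA residuals
bgtest(lm(residuals(logcars_arima) ~ 1), order = 20)
bgtest(lm(residuals(logsteel_arima) ~ 1), order = 20)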

(Even if you do use the Ljung-Box test, standard practice is to adjust the degrees of freedom of the null distribution for the fact that you are supplying residuals rather than raw data. This can be done via the `fitdf` argument of `Box.test`; `fitdf` should equal $p+q$, where $p$ is the autoregressive order and $q$ is the moving average order of the fitted model.)
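
For illustration, here is a minimal sketch of the adjusted test, using `arimaorder()` from the forecast package to pull $p$ and $q$ out of the fitted model (object names as in the question):

ord <- arimaorder(logcars_arima)      # named vector c(p, d, q)
Box.test(residuals(logcars_arima),
         lag = 20,
         type = "Ljung-Box",
         fitdf = ord["p"] + ord["q"]) # null df becomes 20 - (p + q)

The `checkresiduals()` function in the forecast package performs essentially this adjustment for you.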

Also, the absence of residual autocorrelation is not necessarily a sign that the model will generalize well out of sample; insisting that the residuals show no autocorrelation is likely to lead to overfitting. Meanwhile, the AIC-based model selection used by `auto.arima` strikes a sound balance between underfitting and overfitting.
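
To see that trade-off in action, you can re-run the search with `trace = TRUE` (the only change from the code in the question), which prints the AICc of every candidate model `auto.arima` considers before settling on the best one:

# Same call as in the question, but with trace = TRUE so each
# candidate model and its AICc are printed as the search runs
auto.arima(logcars_stl$time.series[, "remainder"],
           approximation = FALSE, trace = TRUE)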