Solved – Getting Residuals to be White Noise

arimaautocorrelationtime serieswhite noise

I'm on a time series project for an undergraduate course. For the project I'm trying to come up with an ARIMA model for the housing starts data set.
http://www.quandl.com/FRED/HOUST-Housing-Starts-Total-New-Privately-Owned-Housing-Units-Started

I have been trying to adjust the p and q to get residuals that are uncorrelated white noise.
I tried auto.arima to get parameter values.

> fit.start=auto.arima(data)
> fit.start
Series: data 
ARIMA(2,1,3)(2,0,0)[12]                    

Coefficients:
         ar1      ar2      ma1     ma2      ma3     sar1     sar2
      1.2874  -0.3256  -1.6559  0.8715  -0.1725  -0.1352  -0.2074
s.e.  0.5709   0.5348   0.5665  0.7266   0.2052   0.0520   0.0517


> fit213200<-Arima(data, order=c(2,1,3), seasonal=c(2,0,0))
> tsdisplay(residuals(fit213200))

enter image description here

So is there anyway to get the residuals to be just white noise?

Best Answer

To fit the best ARIMA model, you need to follow the standard method, the Box-Jenkins method. Notably, you should not choose $p$ and $q$ simply via looking at residual plots. The auto.arima function you have used is designed to do a lot of the work for you, but there is certainly no guarantee that it will give you the best model, it merely chooses the model with the lowest AIC, AICc or BIC.

So what is the Box-Jenkins method? Without going into too much depth, you need to:

First check for stationarity. Plot the time series, as in plot.ts(data). Is the average value of the time series constant? If it increases and decreases linearly, so there is a linear trend in your series, you will need first-order differencing; if the change is quadratic then second-order, and so on. It is important to not over-difference, however. In an ARIMA model you have ARIMA($p,d,q$) and first-order differencing means you need $d=1$. As well as a constant mean, you require an approximately constant variance. You can consider some transformations if it's clearly not constant but I don't think this will be a problem here.

Next check for seasonality. The easiest way to do this is look at the ACF plot of the data. If there is decay and then a spike at regular intervals, then there is a seasonal trend. In this problem, there's a good chance there is a yearly trend. So you would get a spike at lag 12, 24, and so on, as there are 12 observations in a year for data collected monthly. Note that this agrees with what auto.arima found.

Let's say that following the above steps you agree with what auto.arima gave you, as in there is a linear trend ($d=1$) and there is a yearly trend. You now want to find $p$ and $q$ but you need to look at the data after the differencing. You can do this with the diff function in R. diff(data) gives you your data after first-order differencing and diff(data, 12) gives your data after dealing with the yearly trend. So diff(diff(data), 12) should give you something stationary, plot it and see!

Now you can choose the values of $p$ and $q$. Examine the ACF and PACF and you should be able to choose appropriate values for $p$ and $q$. Look at the Box-Jenkins method on Wikipedia or similar for details. If you think $p=2$ and $q=0$, fit it and maybe try some similar ones.

Now, finally, come the residuals and the graphs you provided. If you've done everything right so far, you should get white noise. If not then some assumption made so far is wrong and generally it isn't easy to say which. If the residuals look like some process, like an AR(1) and not white noise, this can tell you what model you should try instead, but again I won't go into details as hopefully they will look like white noise.

Now do you actually have white noise and how do you tell? If you have white noise, then 95% of your sample autocorrelations should be between the two blue lines and so one line slightly over the line doesn't automatically imply you don't have white noise (remember white noise is random, specifically Gaussian). An ARIMA model is never going to fit the data perfectly, so you can't expect to have perfect residuals that are exactly white noise. Although it is certainly somewhat subjective, I'd say that the residuals you have are pretty close to white noise.

Related Question