Solved – Interpretation of standard error of ARIMA parameters

Tags: arima, standard-error

I'm having trouble understanding what the parameter estimates and their standard errors tell us when evaluating a model. For example, here are the estimates for an Arima(2,1,0) model that I cooked up in R:

Coefficients:
         ar1      ar2
      1.0251  -0.1781
s.e.  0.2204   0.2193

I've heard things like "the ar1 is significant because its estimate is more than twice the s.e.", but I don't understand why that criterion makes it "good". By this logic, the inclusion of a second parameter does not look good.
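For intuition, the rule of thumb treats the ratio estimate/s.e. as an approximate z-statistic, relying on the asymptotic normality of the estimates. Plugging in the numbers quoted above (a sketch; the cutoff 2 is just a round version of 1.96, the 97.5% normal quantile):

```r
# z-statistics and two-sided p-values for the coefficients quoted above,
# under the asymptotic normal approximation
est <- c(ar1 = 1.0251, ar2 = -0.1781)
se  <- c(ar1 = 0.2204, ar2 = 0.2193)

z <- est / se             # roughly 4.65 and -0.81
p <- 2 * pnorm(-abs(z))   # two-sided p-values from the standard normal CDF
round(cbind(z = z, p = p), 4)
```

So ar1 is clearly distinguishable from zero, while ar2 is not, which is exactly what the twice-the-s.e. heuristic encodes.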

So how exactly are the parameters distributed, and how do we interpret their estimates and standard errors?

Best Answer

The standard errors of estimated AR parameters have the same interpretation as the standard error of any other estimate: they are (an estimate of) the standard deviation of its sampling distribution.

The idea is that there is some unknown but fixed underlying data generating process (DGP), governed by an unknown but fixed ARIMA process. The specific time series you observe is a single realization of this process. If you now went and sampled many time series arising from this DGP, then they would all look somewhat different, because of different innovations. However, you could fit an ARIMA model to all of them. Then you would of course get different AR parameter estimates for each time series.

The standard error of the AR estimates is an estimate of the standard deviation of these AR estimates.

A simulation might be helpful. Below, I'll use an AR(2) model with parameters $(1.0,-0.2)$. I'll generate a time series of length 100 using this model, then fit an AR(2) model, store the AR parameter estimates - and repeat this 10,000 times. Finally, I plot histograms of the parameter estimates, plus the actual values as red vertical lines - and then compare the standard deviations of the AR parameter estimates against the (average of the) estimated standard errors. And the two match up.

nn <- 100                               # length of each simulated series
n.sims <- 10000                         # number of replications
true.model <- list(ar = c(1.0, -0.2))   # true AR(2) parameters

params <- ses <- matrix(NA, nrow = n.sims, ncol = length(true.model$ar))
for (ii in 1:n.sims) {
    set.seed(ii)
    series <- arima.sim(model = true.model, n = nn)
    model <- arima(series, order = c(2, 0, 0), include.mean = FALSE)
    params[ii, ] <- coefficients(model)          # AR parameter estimates
    ses[ii, ] <- sqrt(diag(model$var.coef))      # their estimated standard errors
}

opar <- par(mfrow = c(1, 2))
for (jj in seq_along(true.model$ar)) {
    hist(params[, jj], col = "grey", xlab = "", main = paste0("AR(", jj, ") parameter"))
    abline(v = true.model$ar[jj], lwd = 2, col = "red")   # true value
}
par(opar)

apply(params, 2, sd)      # SD of the estimates across simulations
# [1] 0.09844388 0.09795008
apply(ses, 2, mean)       # average estimated standard error
# [1] 0.09754488 0.09833490

[Figure: histograms of the two AR parameter estimates, with the true values marked as red vertical lines]

Note that I simulate with a zero mean and explicitly tell arima() to not use a mean. And that the entire exercise crucially depends on the assumption that we know the ARIMA orders with certainty! If we first need to select the correct order, then everything will be biased, and the standard errors lose their interpretation. (Yes, this kind of makes all this a somewhat theoretical and academic exercise.)
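On the same asymptotic-normality assumption, the standard errors translate directly into approximate 95% Wald confidence intervals: estimate ± 1.96 × s.e. A minimal self-contained sketch with a single simulated series (the exact numbers depend on the seed):

```r
set.seed(1)
true.model <- list(ar = c(1.0, -0.2))
series <- arima.sim(model = true.model, n = 100)
fit <- arima(series, order = c(2, 0, 0), include.mean = FALSE)

est <- coef(fit)
se  <- sqrt(diag(fit$var.coef))         # standard errors from the estimated covariance matrix
ci  <- cbind(lower = est - 1.96 * se,
             upper = est + 1.96 * se)   # approximate 95% Wald intervals
ci
```

Across many replications, intervals built this way should cover the true parameters about 95% of the time, provided the model order is correct.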

If you want to dive more deeply into the maths, any mathematical time series textbook should do well. (Anything with "business" in the title will likely gloss over these details.) I recently skimmed Time Series: Theory and Methods by Brockwell and Davis (2006), which looked pretty good, but I can't recall offhand whether this topic was treated at any depth there.