Solved – Interpretation of standard error of ARIMA parameters

Tags: arima, standard-error

I'm having trouble understanding what the parameter estimates and their standard errors tell us when evaluating a model. For example, here are the estimates for an Arima(2,1,0) model that I cooked up in R:

Coefficients:
         ar1      ar2
      1.0251  -0.1781
s.e.  0.2204   0.2193

I've heard things like "the ar1 is significant because its estimate is more than twice the s.e.", but I don't understand why that criterion makes it "good". By this logic, the inclusion of a second parameter does not look good.
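For intuition, the rule of thumb treats the ratio estimate/s.e. as an approximate z-statistic, relying on the asymptotic normality of the estimates. Plugging in the numbers quoted above (a sketch; the cutoff 2 is just a round version of 1.96, the 97.5% normal quantile):

```r
# z-statistics and two-sided p-values for the coefficients quoted above,
# under the asymptotic normal approximation
est <- c(ar1 = 1.0251, ar2 = -0.1781)
se  <- c(ar1 = 0.2204, ar2 = 0.2193)

z <- est / se             # roughly 4.65 and -0.81
p <- 2 * pnorm(-abs(z))   # two-sided p-values from the standard normal CDF
round(cbind(z = z, p = p), 4)
```

So ar1 is clearly distinguishable from zero, while ar2 is not, which is exactly what the twice-the-s.e. heuristic encodes.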

So how exactly are the parameters distributed, and how do we interpret their estimates and standard errors?

Best Answer

The standard errors of estimated AR parameters have the same interpretation as the standard error of any other estimate: they are (an estimate of) the standard deviation of its sampling distribution.

The idea is that there is some unknown but fixed underlying data generating process (DGP), governed by an unknown but fixed ARIMA process. The specific time series you observe is a single realization of this process. If you now went and sampled many time series arising from this DGP, then they would all look somewhat different, because of different innovations. However, you could fit an ARIMA model to all of them. Then you would of course get different AR parameter estimates for each time series.

The standard error of the AR estimates is an estimate of the standard deviation of these AR estimates.

A simulation might be helpful. Below, I'll use an AR(2) model with parameters $(1.0,-0.2)$. I'll generate a time series of length 100 using this model, then fit an AR(2) model, store the AR parameter estimates - and repeat this 10,000 times. Finally, I plot histograms of the parameter estimates, plus the actual values as red vertical lines - and then compare the standard deviations of the AR parameter estimates against the (average of the) estimated standard errors. And the two match up.

nn <- 100                               # length of each simulated series
n.sims <- 10000                         # number of replications
true.model <- list(ar = c(1.0, -0.2))   # true AR(2) parameters

params <- ses <- matrix(NA, nrow = n.sims, ncol = length(true.model$ar))
for (ii in 1:n.sims) {
    set.seed(ii)
    series <- arima.sim(model = true.model, n = nn)
    model <- arima(series, order = c(2, 0, 0), include.mean = FALSE)
    params[ii, ] <- coefficients(model)          # AR parameter estimates
    ses[ii, ] <- sqrt(diag(model$var.coef))      # their estimated standard errors
}

opar <- par(mfrow = c(1, 2))
for (jj in seq_along(true.model$ar)) {
    hist(params[, jj], col = "grey", xlab = "", main = paste0("AR(", jj, ") parameter"))
    abline(v = true.model$ar[jj], lwd = 2, col = "red")   # true value
}
par(opar)

apply(params, 2, sd)      # SD of the estimates across simulations
# [1] 0.09844388 0.09795008
apply(ses, 2, mean)       # average estimated standard error
# [1] 0.09754488 0.09833490

[Figure: histograms of the two AR parameter estimates, with the true values marked as red vertical lines]

Note that I simulate with a zero mean and explicitly tell arima() to not use a mean. And that the entire exercise crucially depends on the assumption that we know the ARIMA orders with certainty! If we first need to select the correct order, then everything will be biased, and the standard errors lose their interpretation. (Yes, this kind of makes all this a somewhat theoretical and academic exercise.)
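On the same asymptotic-normality assumption, the standard errors translate directly into approximate 95% Wald confidence intervals: estimate ± 1.96 × s.e. A minimal self-contained sketch with a single simulated series (the exact numbers depend on the seed):

```r
set.seed(1)
true.model <- list(ar = c(1.0, -0.2))
series <- arima.sim(model = true.model, n = 100)
fit <- arima(series, order = c(2, 0, 0), include.mean = FALSE)

est <- coef(fit)
se  <- sqrt(diag(fit$var.coef))         # standard errors from the estimated covariance matrix
ci  <- cbind(lower = est - 1.96 * se,
             upper = est + 1.96 * se)   # approximate 95% Wald intervals
ci
```

Across many replications, intervals built this way should cover the true parameters about 95% of the time, provided the model order is correct.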

If you want to dive more deeply into the maths, any mathematical time series textbook should do well. (Anything with "business" in the title will likely gloss over these details.) I recently skimmed Time Series: Theory and Methods by Brockwell and Davis (2006), which looked pretty good, but I can't recall offhand whether this topic was treated at any depth there.