Time Series Regression – Difference Between Serially Correlated Errors & Autocorrelation of Residuals

Tags: regression, time series

If I have a time series regression model with serially correlated errors:

$Y_t=\beta_0+\beta_1X_t+\epsilon_t$

$\epsilon_t=\rho\epsilon_{t-1}+v_t$

why is the estimate of the correlation ($\rho$) in the second equation NOT the autocorrelation of the residuals from the first equation?

The autocorrelation of the residuals (from an OLS fit of the first equation) is easy to compute, but $\rho$ is estimated with more difficulty [e.g. by nonlinear least squares], and I'm not following why.
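For concreteness, here is a minimal sketch of the "easy" part in Python: simulate the model above (all parameter values are made up for illustration), fit the first equation by OLS, and take the lag-1 autocorrelation of the residuals.

```python
import numpy as np

# Simulate the model in the question: Y_t = b0 + b1*X_t + e_t,
# with e_t = rho*e_{t-1} + v_t  (all parameter values are illustrative)
rng = np.random.default_rng(0)
n, b0, b1, rho = 200, 1.0, 2.0, 0.6
x = rng.normal(size=n)
v = rng.normal(size=n)
e = np.empty(n)
e[0] = v[0]
for t in range(1, n):
    e[t] = rho * e[t - 1] + v[t]
y = b0 + b1 * x + e

# OLS fit of the first equation, ignoring the AR(1) error structure
X = np.column_stack([np.ones(n), x])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols

# The "easy" estimate: lag-1 autocorrelation of the OLS residuals
r1 = np.corrcoef(resid[1:], resid[:-1])[0, 1]
print("OLS betas:", beta_ols, " residual lag-1 autocorrelation:", r1)
```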

Best Answer

It's fundamentally the difference between joint and sequential estimation of the parameters. If you estimate $\{\beta_0, \beta_1, \rho\}$ jointly, you will (almost always) get a different set of estimates for all three parameters than if you first estimate $\{\beta_0, \beta_1\}$ under the restriction $\rho=0$ and then estimate $\rho$ from the residuals, which is what the sequential approach does. By not taking the estimate of $\rho$ into account when estimating $\beta$, you lose efficiency; that loss propagates into the residuals, which in turn degrades the quality of the estimate of $\rho$ based on those residuals.
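To make the contrast concrete, here is a minimal sketch of the joint route on the same kind of simulated data, minimizing the sum of squared innovations $v_t$ over all three parameters at once with `scipy.optimize.least_squares` (the helper name `v_resid` and all parameter values are just illustrative; this is the conditional nonlinear-least-squares criterion, which drops the first observation):

```python
import numpy as np
from scipy.optimize import least_squares

def v_resid(params, x, y):
    """Innovations v_t = e_t - rho * e_{t-1}, where e_t = y_t - b0 - b1*x_t."""
    b0, b1, rho = params
    e = y - b0 - b1 * x
    return e[1:] - rho * e[:-1]          # conditional on the first observation

# Simulated data (illustrative values), same model as in the question
rng = np.random.default_rng(0)
n, b0, b1, rho = 200, 1.0, 2.0, 0.6
x = rng.normal(size=n)
v = rng.normal(size=n)
e = np.empty(n)
e[0] = v[0]
for t in range(1, n):
    e[t] = rho * e[t - 1] + v[t]
y = b0 + b1 * x + e

# Sequential route: OLS betas, then residual lag-1 autocorrelation
X = np.column_stack([np.ones(n), x])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols
rho_seq = np.corrcoef(resid[1:], resid[:-1])[0, 1]

# Joint route: minimize sum of squared v_t over (b0, b1, rho) together,
# using the sequential estimates as starting values
fit = least_squares(v_resid, x0=[beta_ols[0], beta_ols[1], rho_seq], args=(x, y))
print("sequential:", beta_ols, rho_seq)
print("joint NLS :", fit.x)
```

Both routes are consistent under this model, so they agree in large samples; in any finite sample, though, the joint minimizer generally differs from the (OLS betas, residual autocorrelation) pair, which is exactly the gap the question is asking about.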

Note that if $\rho$ actually is 0, or very close to it, and your sample size is small(ish), the sequential approach might actually be better, because it removes the effect of randomness in the estimate of $\rho$ on the estimates of $\beta$. But if you knew you were in that situation, you'd probably be better off just setting $\hat{\rho} = 0$ and using OLS to estimate $\beta$. The trouble is, you don't know unless you have some external information, so sticking with maximum likelihood or some other joint estimation method is still your best bet.
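A quick Monte Carlo sketch of that caveat (true $\rho = 0$, small $n$, all values illustrative; the effect is second-order, so the gap in the spread of $\hat{\beta}_1$ may be modest):

```python
import numpy as np
from scipy.optimize import least_squares

def v_resid(params, x, y):
    """Innovations v_t = e_t - rho * e_{t-1}, with e_t = y_t - b0 - b1*x_t."""
    b0, b1, rho = params
    e = y - b0 - b1 * x
    return e[1:] - rho * e[:-1]

rng = np.random.default_rng(1)
n, reps = 30, 500                            # small(ish) sample, as in the caveat
b1_ols, b1_joint = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)   # true rho = 0: iid errors
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    b1_ols.append(beta[1])
    fit = least_squares(v_resid, x0=[beta[0], beta[1], 0.0], args=(x, y))
    b1_joint.append(fit.x[1])

print("sd of b1_hat, OLS with rho fixed at 0:", np.std(b1_ols))
print("sd of b1_hat, joint NLS              :", np.std(b1_joint))
```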
