Newey West – Finding the Optimal Lag for Newey West Regression


I am currently working on my PhD thesis and was wondering how I can identify the optimal number of lags for the Newey-West covariance matrix. So far, my code is:

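library(sandwich)  # NeweyWest(), bwNeweyWest()
library(lmtest)    # coeftest()
library(dynlm)     # dynlm() for time-series regressions

Regression <- dynlm(y ~ x)
# lag = NULL: NeweyWest() chooses the lag itself (via bwNeweyWest())
coeftest(Regression, vcov = NeweyWest(Regression, lag = NULL))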

Since I have to compare coefficients of multiple regressions (for the same time series), I should use the same lag as well. However, is there a rule of thumb or a simple method to get the lag?

Thank you and best regards

Best Answer

I don't think that you necessarily need to fix the lag length across regressions. Due to the different model specifications, there may be slightly different amounts of autocorrelation in the residuals that you have to adjust for.

However, you can easily display the lag length employed by NeweyWest() and also extract it using bwNeweyWest(). For simple replication, let's consider the following (nonsensical, because not time-series) linear model:

m <- lm(dist ~ speed, data = cars)
NeweyWest(m, lag = NULL, verbose = TRUE)
## Lag truncation parameter chosen: 3 
##             (Intercept)      speed
## (Intercept)   49.250542 -3.7390757
## speed         -3.739076  0.3035406

The bandwidth estimation is done by bwNeweyWest():

bwNeweyWest(m)
## [1] 3.032204

The lag length is just floor() of that, here floor(3.032204) = 3.
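To make the "bandwidth, then floor()" step concrete, here is a simplified Python sketch of the Newey and West (1994) plug-in bandwidth for the Bartlett kernel, which is the procedure bwNeweyWest() is documented to implement. It is a sketch under stated assumptions, not sandwich's code: it takes a single, already-formed score series `h` as given (sandwich builds that series from the model's estimating functions), it uses the preliminary lag 4(n/100)^(2/9), and it does no prewhitening, whereas NeweyWest() prewhitens by default. It is therefore not expected to reproduce the 3.032204 above. The function name `nw_bartlett_bandwidth` and the AR(1) test series are hypothetical.

```python
import math
import random


def nw_bartlett_bandwidth(h):
    """Simplified Newey-West (1994) plug-in bandwidth, Bartlett kernel.

    Assumption: `h` is one score series taken as given; no prewhitening.
    """
    n = len(h)
    # Preliminary truncation lag for the Bartlett kernel (assumed rule)
    m = math.floor(4 * (n / 100) ** (2 / 9))
    # Sample autocovariances sigma_0, ..., sigma_m (divided by n)
    sigma = [sum(h[t] * h[t - j] for t in range(j, n)) / n for j in range(m + 1)]
    s0 = sigma[0] + 2 * sum(sigma[1:])
    s1 = 2 * sum(j * sigma[j] for j in range(1, m + 1))
    # Bartlett: q = 1, constant 1.1447, rate 1/(2q+1) = 1/3
    gamma = 1.1447 * ((s1 / s0) ** 2) ** (1 / 3)
    return gamma * n ** (1 / 3)


# Hypothetical demeaned AR(1) series (phi = 0.6), only to exercise the function
rng = random.Random(1)
h, prev = [], 0.0
for _ in range(200):
    prev = 0.6 * prev + rng.gauss(0.0, 1.0)
    h.append(prev)
mean = sum(h) / len(h)
h = [x - mean for x in h]

bw = nw_bartlett_bandwidth(h)
lag = math.floor(bw)  # lag truncation parameter, as with floor(bwNeweyWest(m))
```

If the sample autocovariances at lags 1 to m are all zero, s1 is zero and the bandwidth (and hence the lag) collapses to 0; more persistent scores push it up.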

Related Solutions

Solved – Newey-West t-statistics

I had a similar question earlier and came across this long-unanswered question through a simple web search, so I'll take a stab and post what I think is one possible solution, since others may run into the same situation.

According to SAS Support, you can take your time series and fit an intercept-only regression model to it. The estimated intercept of this model is the sample mean of the series. You can then pass this intercept-only model through the SAS commands used to obtain Newey-West standard errors for a regression model.

Here is the link to the SAS Support page: http://support.sas.com/kb/40/098.html

Look for "Example 2. Newey-West standard error correction for the sample mean of a series"

In your case, simply try the same approach with MATLAB. If someone has a better approach, please enlighten us.
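Outside SAS, the intercept-only trick needs only a few lines, because the regressor is a constant: the residuals are the deviations from the mean and the Newey-West variance of the mean is $\hat v / T^2$. Below is a minimal Python sketch, assuming Bartlett weights $1-h/(L+1)$ and no finite-sample correction; `nw_se_of_mean` is a hypothetical helper, and a given package may differ from it by whatever small-sample adjustment it applies.

```python
import math


def nw_se_of_mean(y, lag):
    """Sample mean and its Newey-West (Bartlett) standard error.

    Equivalent to the N-W SE of the intercept in an intercept-only regression.
    """
    T = len(y)
    ybar = sum(y) / T
    u = [yt - ybar for yt in y]  # residuals of the intercept-only regression
    v = sum(ut * ut for ut in u)  # heteroskedasticity part
    for h in range(1, lag + 1):
        w = 1 - h / (lag + 1)  # Bartlett weight
        v += 2 * w * sum(u[t] * u[t - h] for t in range(h, T))
    # (X'X)^{-1} v (X'X)^{-1} with X a column of ones gives v / T^2
    return ybar, math.sqrt(v) / T


mean, se = nw_se_of_mean([1, 2, 3, 4, 5], lag=1)
```

With `lag=0` the cross products drop out and the result is the heteroskedasticity-only standard error of the mean.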

Solved – Formula for Newey West Standard Error

Take your model of $$y_t=\beta_0+\beta_1x_t+u_t,$$ where $t=1,...,T$. We will assume there are no other regressors and that the serial correlation only lasts up to one period (so shocks do not persist for very long).

To get the Newey-West/HAC standard error of $\beta_1$ that is robust to heteroskedasticity and autocorrelation up to 1 lag, you should:

1. Estimate the model with OLS, which gives you the usual $SE(\beta_1)$, the RMSE $\hat \sigma$, and the residuals $\hat u_1,\dots,\hat u_T$.
2. Get the residuals $\hat r_1,\dots,\hat r_T$ from the auxiliary OLS regression of $x_t$ on a constant, and calculate $\hat a_t = \hat u_t \cdot \hat r_t$ for $t=1,\dots,T$. If you had more regressors, you would include them as additional covariates in this auxiliary regression.
3. Assuming that the serial correlation lasts up to one period, calculate $$\hat v(1)=\sum_{t=1}^T \hat a_t^2+ \sum_{t=2}^T \hat a_t \cdot \hat a_{t-1}.$$ The first term in $\hat v$ is what gets you the heteroskedasticity-consistent standard error. The second term is the autocorrelation part. Assuming the autocorrelation is positive, this term adds to $\hat v$, which is why your standard errors blow up: you have less information. If your autocorrelation lasted longer, you would have an additional weighted term for each extra lag. You might want to use a finite-sample correction multiplier of $\frac{T}{T-k}$ here, where we count the constant as one of the $k$ covariates.
4. Calculate the N-W standard error of $\beta_1$ as $$\left[ \frac{SE(\beta_1)}{\hat \sigma} \right]^2 \cdot \sqrt{ \hat v(1)}.$$
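The four steps above can be sketched in Python for a single regressor and any number of lags $L$. Assuming the standard Bartlett weights $1-h/(L+1)$, Step 3 generalizes to $$\hat v(L)=\sum_{t=1}^T \hat a_t^2 + 2\sum_{h=1}^{L}\left(1-\frac{h}{L+1}\right)\sum_{t=h+1}^T \hat a_t \hat a_{t-h},$$ which reduces to $\hat v(1)$ when $L=1$ because $2\left(1-\tfrac12\right)=1$. Since $SE(\beta_1)^2=\hat\sigma^2/\sum_t \hat r_t^2$, the factor in Step 4 is just $1/\sum_t \hat r_t^2$. `wooldridge_nw_se` is a hypothetical helper; the data are the `usr`/`idle` values from the Stata listing in this answer, so it should reproduce the hand-computed .07510689 and, with the $T/(T-k)$ correction, .07774301.

```python
import math


def wooldridge_nw_se(y, x, lag, fsc=False):
    """N-W SE of b1 in y_t = b0 + b1*x_t + u_t, via Wooldridge's (1989) steps."""
    T = len(y)
    xbar, ybar = sum(x) / T, sum(y) / T
    # Step 2 residuals: x on a constant is just x minus its mean
    r = [xi - xbar for xi in x]
    sxx = sum(ri * ri for ri in r)
    # Step 1: OLS coefficients and residuals u_t
    b1 = sum(ri * (yi - ybar) for ri, yi in zip(r, y)) / sxx
    b0 = ybar - b1 * xbar
    u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    a = [ui * ri for ui, ri in zip(u, r)]
    # Step 3: squared terms plus Bartlett-weighted cross products
    v = sum(at * at for at in a)
    for h in range(1, lag + 1):
        w = 1 - h / (lag + 1)
        v += 2 * w * sum(a[t] * a[t - h] for t in range(h, T))
    if fsc:
        v *= T / (T - 2)  # T/(T-k), k = 2 (constant and slope)
    # Step 4: [SE/sigma_hat]^2 = 1/sxx
    return math.sqrt(v) / sxx


usr = [0, 0, 0, 1, 2, 0, 2, 3, 1, 2, 2, 2, 1, 4, 7,
       7, 8, 4, 5, 10, 16, 12, 3, 2, 3, 5, 6, 5, 1, 1]
idle = [100, 100, 97, 98, 94, 98, 90, 85, 68, 91, 94, 89, 88, 92, 74,
        76, 71, 78, 75, 74, 65, 63, 83, 60, 85, 87, 83, 84, 98, 98]

usual = wooldridge_nw_se(usr, idle, lag=1)
stata_style = wooldridge_nw_se(usr, idle, lag=1, fsc=True)
```

Passing `lag=0` keeps only the squared terms; with `fsc=True` that should correspond to the heteroskedasticity-robust (`reg usr idle, robust`) standard error of the slope.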

Here's an example with Stata (with the full data shown by the list command in case you want to use other software). This has a slight wrinkle: by default, Stata applies a finite-sample correction of $\frac{T}{T-k}$ that is not the default in many other statistics packages and is usually not shown in textbook formulas, though it is a sensible thing to do:

. /* N-W Standard Errors With One Regressor and Serial Correlation That Dies Down After 1 Period */
. webuse idle2, clear

. tsset time
        time variable:  time, 1 to 30
                delta:  1 unit

. list time usr idle, clean noobs

    time   usr   idle  
       1     0    100  
       2     0    100  
       3     0     97  
       4     1     98  
       5     2     94  
       6     0     98  
       7     2     90  
       8     3     85  
       9     1     68  
      10     2     91  
      11     2     94  
      12     2     89  
      13     1     88  
      14     4     92  
      15     7     74  
      16     7     76  
      17     8     71  
      18     4     78  
      19     5     75  
      20    10     74  
      21    16     65  
      22    12     63  
      23     3     83  
      24     2     60  
      25     3     85  
      26     5     87  
      27     6     83  
      28     5     84  
      29     1     98  
      30     1     98  

. /* Step 1 */
. reg usr idle

      Source |       SS           df       MS      Number of obs   =        30
-------------+----------------------------------   F(1, 28)        =     28.07
       Model |   210.35435         1   210.35435   Prob > F        =    0.0000
    Residual |  209.812316        28  7.49329702   R-squared       =    0.5006
-------------+----------------------------------   Adj R-squared   =    0.4828
       Total |  420.166667        29  14.4885057   Root MSE        =    2.7374

------------------------------------------------------------------------------
         usr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        idle |  -.2281501   .0430607    -5.30   0.000    -.3163559   -.1399442
       _cons |   23.13483    3.67706     6.29   0.000     15.60271    30.66694
------------------------------------------------------------------------------

. scalar se_beta1 = _se[idle]

. scalar sigmahat    = e(rmse)

. predict double uhat, resid

. /* Step 2 */
. reg idle

      Source |       SS           df       MS      Number of obs   =        30
-------------+----------------------------------   F(0, 29)        =      0.00
       Model |           0         0           .   Prob > F        =         .
    Residual |      4041.2        29  139.351724   R-squared       =    0.0000
-------------+----------------------------------   Adj R-squared   =    0.0000
       Total |      4041.2        29  139.351724   Root MSE        =    11.805

------------------------------------------------------------------------------
        idle |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |       84.6    2.15524    39.25   0.000     80.19204    89.00796
------------------------------------------------------------------------------

. predict double rhat, resid

. gen double ahat = uhat*rhat

. /* Step 3 */
. gen double v = ahat^2

. replace v    = v + ahat*L1.ahat in 2/L
(29 real changes made)

. sum v

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           v |         30    3070.853    8464.421  -256.7766   33901.46

. scalar v1     = r(sum) 

. scalar v1_fsc = r(sum)*(30/28) // Stata uses a finite sample correction of T/(T-k)

. /* Step 4 */
. di "Usual N-W SE = " sqrt(scalar(v1))*[scalar(se_beta1)/scalar(sigmahat)]^2
Usual N-W SE = .07510689

. di "Stata's N-W SE  = " sqrt(scalar(v1_fsc))*[scalar(se_beta1)/scalar(sigmahat)]^2
Stata's N-W SE  = .07774301

. /* Compare to newey command */
. newey usr idle, lag(1)

Regression with Newey-West standard errors      Number of obs     =         30
maximum lag: 1                                  F(  1,        28) =       8.61
                                                Prob > F          =     0.0066

------------------------------------------------------------------------------
             |             Newey-West
         usr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        idle |  -.2281501    .077743    -2.93   0.007    -.3873994   -.0689007
       _cons |   23.13483   7.119611     3.25   0.003     8.550965    37.71869
------------------------------------------------------------------------------

. di _se[idle]
.07774301

As you can see, with the finite sample correction, newey matches what we did by hand at 0.07774301. I am not sure this example contains a whole lot of intuition, but YMMV.
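As a quick arithmetic check on the two hand-computed values: multiplying $\hat v$ by $T/(T-k)$ scales the standard error by $\sqrt{T/(T-k)}=\sqrt{30/28}\approx 1.0351$, and that factor alone accounts for the gap. The numbers below are copied from the Stata log; the variable names are only for illustration.

```python
import math

usual_nw_se = 0.07510689  # "Usual N-W SE" from the log
stata_nw_se = 0.07774301  # "Stata's N-W SE", equal to newey's _se[idle]
T, k = 30, 2              # observations; constant + slope

# Scaling v by T/(T-k) multiplies its square root by sqrt(T/(T-k))
factor = math.sqrt(T / (T - k))
implied = usual_nw_se * factor
```

`implied` agrees with Stata's .07774301 up to the rounding of the printed values.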

This is based on Wooldridge, Jeffrey M. "A computationally simple heteroskedasticity and serial correlation robust standard error for the linear regression model." Economics Letters 31.3 (1989): 239-243.

Stata Code:

/* N-W Standard Errors With One Regressor and Serial Correlation That Dies Down After 1 Period */
webuse idle2, clear
tsset time
list time usr idle, clean noobs
/* Step 1 */
reg usr idle
scalar se_beta1 = _se[idle]
scalar sigmahat = e(rmse)
predict double uhat, resid
/* Step 2 */
reg idle
predict double rhat, resid
gen double ahat = uhat*rhat
/* Step 3 */
gen double v = ahat^2
replace v    = v + ahat*L1.ahat in 2/L
sum v
scalar v1     = r(sum) 
scalar v1_fsc = r(sum)*(30/28) // Stata uses a finite sample correction of T/(T-k)
/* Step 4 */
di "Usual N-W SE = " sqrt(scalar(v1))*[scalar(se_beta1)/scalar(sigmahat)]^2
di "Stata's N-W SE  = " sqrt(scalar(v1_fsc))*[scalar(se_beta1)/scalar(sigmahat)]^2
/* Compare to newey command */
newey usr idle, lag(1)
di _se[idle]
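/* Compare To Smaller Robust Sandwich SE: v now includes the lag-1 cross products, so rebuild the squared terms only */
gen double v_het = ahat^2
sum v_het
scalar v0     = r(sum)*(30/28)
di "Stata's Sandwich SE  = " sqrt(scalar(v0))*[scalar(se_beta1)/scalar(sigmahat)]^2
reg usr idle, robust
di _se[idle]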
