Regression – Comparing Newey-West (1987) and Hansen-Hodrick (1980) Methods for Robust Standard Errors

autocorrelation, heteroscedasticity, neweywest, regression, robust-standard-error

Question: What are the main differences and similarities between using Newey-West (1987) and Hansen-Hodrick (1980) standard errors? In which situations should one of these be preferred over the other?

Notes:

  • I do know how each of these adjustment procedures works; however, I have not yet found any document that would compare them, either online or in my textbook. References are welcome!
  • Newey-West tends to be used as "catch-all" HAC standard errors, whereas Hansen-Hodrick comes up frequently in the context of overlapping data points (e.g. see this question or this question). Hence one important aspect of my question is: is there anything about Hansen-Hodrick that makes it better suited to dealing with overlapping data than Newey-West? (After all, overlapping data ultimately leads to serially correlated error terms, which Newey-West also handles.)
  • For the record, I am aware of this similar question, but it was relatively poorly posed, got downvoted and ultimately the question that I am asking here did not get answered (only the programming-related part got answered).

Best Answer

Consider a class of long-run variance estimators

$$ \hat{J}_T \equiv \hat{\gamma}_0 + 2\sum_{j=1}^{T-1} k\left(\frac{j}{\ell_T}\right)\hat{\gamma}_j $$ where $k$ is a kernel (weighting) function, the $\hat\gamma_j$ are sample autocovariances, and $\ell_T$ is a bandwidth parameter. Among other things, $k$ must be symmetric and satisfy $k(0)=1$.
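
To make the definition concrete, here is a minimal R sketch of this class of estimators (illustrative only; the function name lrv_kernel and its arguments are mine, not from any package):

# Minimal sketch of the kernel long-run variance estimator J_T defined above;
# 'kernel' is the weight function k(.) and 'bw' is the bandwidth l_T.
lrv_kernel <- function(y, kernel, bw) {
  n <- length(y)
  # sample autocovariances gamma_0, ..., gamma_{n-1} (scaled by 1/n, as acf() does)
  gamma <- acf(y, lag.max = n - 1, type = "covariance", plot = FALSE)$acf[, 1, 1]
  j <- seq_len(n - 1)
  gamma[1] + 2 * sum(kernel(j / bw) * gamma[-1])
}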

Newey & West (Econometrica 1987) propose the Bartlett kernel $$k\left(\frac{j}{\ell_T}\right) = \begin{cases} \bigl(1 - \frac{j}{\ell_T}\bigr) \qquad &\mbox{for} \qquad 0 \leqslant j \leqslant \ell_T-1 \\ 0 &\mbox{for} \qquad j > \ell_T-1 \end{cases} $$

Hansen & Hodrick's (Journal of Political Economy 1980) estimator amounts to taking a truncated kernel, i.e., $k=1$ for $j\leq M$ for some $M$, and $k=0$ otherwise. This estimator is, as discussed by Newey & West, consistent, but not guaranteed to be positive semi-definite (when estimating matrices), while Newey & West's kernel estimator is.
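
For concreteness, the two weighting schemes can be written as R functions in the parameterization used above (a direct transcription of the formulas, not code from any package); plugged into the lrv_kernel sketch with the appropriate bandwidth, they reproduce the hand calculations below.

# Bartlett (Newey-West) weights: decline linearly, reaching zero at the bandwidth
bartlett <- function(x) pmax(1 - abs(x), 0)

# Truncated (Hansen-Hodrick) weights: equal to 1 up to the cutoff, 0 beyond it
truncated <- function(x) as.numeric(abs(x) <= 1)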

Try $M=1$ for an MA(1) process with a strongly negative coefficient $\theta$. The population quantity is known to be $J = \sigma^2(1 + \theta)^2 > 0$, but the Hansen-Hodrick estimate need not be positive:

set.seed(2)
y <- arima.sim(model = list(ma = -0.95), n = 10)           # MA(1) with theta = -0.95
acf.MA1 <- acf(y, type = "covariance", plot = FALSE)$acf   # sample autocovariances
acf.MA1[1] + 2 * acf.MA1[2]                                # gamma_0 + 2 * gamma_1 (truncated kernel, M = 1)
## [1] -0.4056092

which is not a convincing estimate for a long-run variance: with $\theta = -0.95$ and $\sigma^2 = 1$ for the simulated series, the population value is $J = (1 - 0.95)^2 = 0.0025$, small but strictly positive.

This would be avoided with the Newey-West estimator, which with bandwidth $\ell_T = 2$ puts Bartlett weight $\tfrac{1}{2}$ on the first autocovariance, so that $\hat{J}_T = \hat{\gamma}_0 + \hat{\gamma}_1$:

acf.MA1[1] + acf.MA1[2]
## [1] 0.8634806
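
Assuming the lrv_kernel sketch and the two kernel functions from above were defined as shown, both numbers should also be reproducible directly from them:

lrv_kernel(y, kernel = truncated, bw = 1)  # Hansen-Hodrick, M = 1
lrv_kernel(y, kernel = bartlett, bw = 2)   # Newey-West, Bartlett with l_T = 2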

Using the sandwich package this can also be computed as:

library("sandwich")
m <- lm(y ~ 1)
kernHAC(m, kernel = "Bartlett", bw = 2,
  prewhite = FALSE, adjust = FALSE, sandwich = FALSE)
##             (Intercept)
## (Intercept)   0.8634806

And the Hansen-Hodrick estimate can be obtained as:

kernHAC(m, kernel = "Truncated", bw = 1,
  prewhite = FALSE, adjust = FALSE, sandwich = FALSE)    
##             (Intercept)
## (Intercept)  -0.4056092

See also NeweyWest() and lrvar() from sandwich for convenience interfaces to obtain Newey-West estimators of linear models and long-run variances of time series, respectively.
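
As a rough sketch of how these convenience interfaces could be called for the toy example above (the argument choices are illustrative; with their defaults, which include prewhitening and automatic bandwidth selection, the results will not coincide exactly with the manual calculations shown earlier, and lrvar() returns the variance of the sample mean rather than the long-run variance itself):

# HAC covariance matrix of the coefficients of m, Bartlett kernel with 1 lag
NeweyWest(m, lag = 1, prewhite = FALSE)

# Long-run variance interface applied to the series y directly
lrvar(y, type = "Newey-West", prewhite = FALSE)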

Andrews (Econometrica 1991) provides an analysis under more general conditions.

As for your subquestion regarding overlapping data, I am not aware of a subject-matter reason; I suspect tradition is at the root of this common practice.