Consider a class of long-run variance estimators
$$
\hat{J}_T \equiv \hat{\gamma}_0 + 2\sum_{j=1}^{T-1} k\left(\frac{j}{\ell_T}\right)\hat{\gamma}_j
$$
where $k$ is a kernel (weighting) function, the $\hat\gamma_j$ are sample autocovariances, and $\ell_T$ is a bandwidth parameter. Among other conditions, $k$ must be symmetric and satisfy $k(0)=1$.
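To make the definition concrete, here is a minimal R sketch of this class of estimators for a generic kernel; the function lrv and its argument names are mine, not from any package:
lrv <- function(y, k, l) {
  ## sample autocovariances hat-gamma_0, ..., hat-gamma_{T-1}
  gam <- c(acf(y, type = "covariance", plot = FALSE,
               lag.max = length(y) - 1)$acf)
  j <- seq_len(length(gam) - 1)  # lags 1, ..., T-1
  gam[1] + 2 * sum(k(j / l) * gam[-1])
}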
Newey & West (Econometrica 1987) propose the Bartlett kernel
$$k\left(\frac{j}{\ell_T}\right) = \begin{cases}
1 - \dfrac{j}{\ell_T} & \text{for } 0 \leqslant j \leqslant \ell_T - 1 \\
0 & \text{for } j > \ell_T - 1
\end{cases}$$
Hansen & Hodrick's (Journal of Political Economy 1980) estimator amounts to taking a truncated kernel, i.e., $k = 1$ for $j \leq M$ for some $M$, and $k = 0$ otherwise. As Newey & West discuss, this estimator is consistent but, when estimating matrices, not guaranteed to be positive semi-definite, whereas Newey & West's Bartlett kernel estimator is.
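In terms of the lrv sketch above, the two kernels can be written as (again, my names):
bartlett  <- function(x) pmax(1 - x, 0)      # Newey & West
truncated <- function(x) as.numeric(x <= 1)  # Hansen & Hodrick, with bandwidth M
so that lrv(y, truncated, 1) and lrv(y, bartlett, 2) reproduce the two computations below.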
Try $M = 1$ for an MA(1) process with a strongly negative coefficient $\theta$. The population quantity is known to be $J = \sigma^2(1 + \theta)^2 > 0$, but the Hansen-Hodrick estimate need not be positive:
set.seed(2)
y <- arima.sim(model = list(ma = -0.95), n = 10)  # MA(1) with theta = -0.95
acf.MA1 <- acf(y, type = "covariance", plot = FALSE)$acf  # hat-gamma_0, hat-gamma_1, ...
acf.MA1[1] + 2 * acf.MA1[2]  # Hansen-Hodrick with M = 1
## [1] -0.4056092
which is not a convincing estimate for a long-run variance.
This would be avoided with the Newey-West estimator: with bandwidth $\ell_T = 2$, the Bartlett weight at lag 1 is $1 - 1/2 = 1/2$, so the estimate is $\hat\gamma_0 + \hat\gamma_1$:
acf.MA1[1] + acf.MA1[2]
## [1] 0.8634806
Using the sandwich package, this can also be computed as:
library("sandwich")
m <- lm(y ~ 1)
kernHAC(m, kernel = "Bartlett", bw = 2,
prewhite = FALSE, adjust = FALSE, sandwich = FALSE)
## (Intercept)
## (Intercept) 0.8634806
And the Hansen-Hodrick estimate can be obtained as:
kernHAC(m, kernel = "Truncated", bw = 1,
prewhite = FALSE, adjust = FALSE, sandwich = FALSE)
## (Intercept)
## (Intercept) -0.4056092
See also NeweyWest() and lrvar() from sandwich for convenience interfaces to obtain Newey-West estimators of linear models and long-run variances of time series, respectively.
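For instance, continuing the example above, one could call (sketched; output not shown):
NeweyWest(m, lag = 1, prewhite = FALSE, adjust = FALSE)  # full HAC covariance of the lm fit
lrvar(y, type = "Newey-West", prewhite = FALSE)  # long-run variance of y (see ?lrvar for the exact scaling)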
Andrews (Econometrica 1991) provides an analysis under more general conditions.
As to your subquestion regarding overlapping data, I am not aware of a subject-matter reason; I suspect this common practice is rooted in tradition.
Generalized least squares (GLS) jointly models the fixed effects and a covariance structure in the data to yield Gauss-Markov optimal estimators of the fixed effects. Borrowing heavily from semiparametric inference and theory, we can also refer to the fixed effects as the "trend".
If the variance structure in a GLS model is misspecified, the estimate of the trend is unbiased but inefficient, just as the OLS estimate is when the errors are not independent and homoscedastic. The sandwich estimator, also called the heteroscedasticity-consistent estimator, provides asymptotically correct 95% confidence intervals even when the covariance model is misspecified, so tests of the trend have the correct 0.05 level and achieve relatively good power.
The GLS model solves the estimating equation:
$$0 = \sum_{i=1}^n S_i (\hat{\beta}) = \mathbf{X}^T W^{-1} \left( Y - \mathbf{X}\hat{\beta} \right)$$
where $W$ is the error covariance matrix, which can be estimated in any of several ways. As an interesting note, most GEE implementations use method-of-moments estimators for the covariance parameters, whereas GLS, being an exact likelihood-based procedure, uses (I believe) the EM algorithm. I do not know whether this is a problem for the solution proposed here.
The robust sandwich variance estimator of $\hat{\beta}$ is given by:
$$\text{var}(\hat{\beta}) = \frac{1}{n}\mathbf{A}^{-1}\mathbf{B}\mathbf{A}^{-1}$$
where $\mathbf{A} = \dfrac{\partial S(\hat{\beta})}{\partial \hat{\beta}}$ and $\mathbf{B} = S(\hat{\beta})S(\hat{\beta})^T/n$. Putting it all together, we should be able to construct a sandwich estimate from a GLS model by hand fairly easily.
Using the Ovary example from nlme's gls(), the model fm1 has a non-zero unweighted sum of residuals (sum(residuals(fm1)) = 1.65). Here is an example of checking that the weighted score does, indeed, sum to zero and thus solves the estimating equation.
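For reproducibility, I assume fm1 is the AR(1) fit from the first example in ?gls:
library("nlme")
fm1 <- gls(follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time), Ovary,
           correlation = corAR1(form = ~ 1 | Mare))
With that fit in hand, the score check is: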
X <- model.matrix(fm1, data = Ovary)  # design matrix
Mares <- Ovary$Mare                   # cluster (mare) identifier
W <- lapply(lapply(levels(Mares), getVarCov, obj = fm1), solve)  # per-mare inverse covariances
Xsplit <- lapply(split(as.data.frame(X), Mares), as.matrix)      # design rows by mare
Resids <- split(residuals(fm1), Mares)  # residuals by mare
Scores <- mapply(function(x, w, r) t(x) %*% w %*% r, Xsplit, W, Resids)  # per-mare scores
rowSums(Scores)
which gives (zero up to numerical error):
> rowSums(Scores)
[1] 1.318667e-13 -1.706968e-14 4.593981e-14
This forms the basis for generating the model-based variance-covariance matrix of the trend:
xwx <- Reduce(`+`, mapply(function(x, w) t(x) %*% w %*% x,
                          Xsplit, W, SIMPLIFY = FALSE))  # A: sum of X_i' W_i^{-1} X_i
solve(xwx)  # model-based covariance, by hand
vcov(fm1)   # for comparison
> solve(xwx)
(Intercept) sin(2 * pi * Time) cos(2 * pi * Time)
(Intercept) 4.417512e-01 -8.395204e-08 -1.363831e-01
sin(2 * pi * Time) -8.395204e-08 4.160866e-01 7.475767e-08
cos(2 * pi * Time) -1.363831e-01 7.475767e-08 4.865597e-01
> vcov(fm1)
(Intercept) sin(2 * pi * Time) cos(2 * pi * Time)
(Intercept) 4.417512e-01 -8.395204e-08 -1.363831e-01
sin(2 * pi * Time) -8.395204e-08 4.160866e-01 7.475767e-08
cos(2 * pi * Time) -1.363831e-01 7.475767e-08 4.865597e-01
Now the sandwich is obtained by using the squared residuals as plug-in estimates in $\mathbf{B}$:
Amat <- xwx
## diag(r^2) targets heteroscedasticity; r %*% t(r) would give the cluster-robust outer product
Bmat <- mapply(function(x, w, r) t(x) %*% w %*% diag(c(r)^2) %*% w %*% x,
               Xsplit, W, Resids, SIMPLIFY = FALSE)
Bmat <- Reduce(`+`, Bmat)
## Amat and Bmat are sums over mares, so the 1/n factor in the formula is absorbed
SandCov <- solve(Amat) %*% Bmat %*% solve(Amat)
These robust covariance estimates can then be used in place of the model-based estimates for heteroscedasticity-robust, consistent inference.
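As a quick usage check, the robust and model-based standard errors can be compared side by side (output not shown):
cbind(robust = sqrt(diag(SandCov)),
      model  = sqrt(diag(vcov(fm1))))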
statsmodels uses Newey-West corrected standard errors with the usual Bartlett window by default. There is a 'weights_func' or 'kernel' option to choose a different window than Bartlett, e.g., uniform. However, statsmodels has no other options for HAC robust standard errors, such as pre-whitening or automatic lag selection, nor autocorrelation-robust standard errors without heteroscedasticity robustness (i.e., only 'HAC', but no 'AC'). If I remember correctly, Hansen-Hodrick is 'AC' with a uniform kernel.