Consider a class of long-run variance estimators
$$
\hat{J}_T \equiv \hat{\gamma}_0 + 2\sum_{j=1}^{T-1} k\left(\frac{j}{\ell_T}\right)\hat{\gamma}_j
$$
where $k$ is a kernel (weighting) function, the $\hat\gamma_j$ are sample autocovariances, and $\ell_T$ is a bandwidth parameter. Among other conditions, $k$ must be symmetric and satisfy $k(0)=1$.
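To make the definition concrete, here is a minimal R sketch of this class of estimators for a generic kernel; the function lrv and its argument names are mine, not from any package:
lrv <- function(y, k, l) {
  ## sample autocovariances hat-gamma_0, ..., hat-gamma_{T-1}
  gam <- c(acf(y, type = "covariance", plot = FALSE,
               lag.max = length(y) - 1)$acf)
  j <- seq_len(length(gam) - 1)  # lags 1, ..., T-1
  gam[1] + 2 * sum(k(j / l) * gam[-1])
}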
Newey & West (Econometrica 1987) propose the Bartlett kernel
$$k\left(\frac{j}{\ell_T}\right) = \begin{cases}
1 - \dfrac{j}{\ell_T} & \text{for } 0 \leqslant j \leqslant \ell_T - 1 \\
0 & \text{for } j > \ell_T - 1
\end{cases}$$
Hansen & Hodrick's (Journal of Political Economy 1980) estimator amounts to taking a truncated kernel, i.e., $k = 1$ for $j \leq M$ for some $M$, and $k = 0$ otherwise. As Newey & West discuss, this estimator is consistent but, when estimating matrices, not guaranteed to be positive semi-definite, whereas Newey & West's Bartlett kernel estimator is.
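In terms of the lrv sketch above, the two kernels can be written as (again, my names):
bartlett  <- function(x) pmax(1 - x, 0)      # Newey & West
truncated <- function(x) as.numeric(x <= 1)  # Hansen & Hodrick, with bandwidth M
so that lrv(y, truncated, 1) and lrv(y, bartlett, 2) reproduce the two computations below.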
Try $M = 1$ for an MA(1) process with a strongly negative coefficient $\theta$. The population quantity is known to be $J = \sigma^2(1 + \theta)^2 > 0$, but the Hansen-Hodrick estimate need not be positive:
set.seed(2)
y <- arima.sim(model = list(ma = -0.95), n = 10)  # MA(1) with theta = -0.95
acf.MA1 <- acf(y, type = "covariance", plot = FALSE)$acf  # hat-gamma_0, hat-gamma_1, ...
acf.MA1[1] + 2 * acf.MA1[2]  # Hansen-Hodrick with M = 1
## [1] -0.4056092
which is not a convincing estimate for a long-run variance.
This would be avoided with the Newey-West estimator: with bandwidth $\ell_T = 2$, the Bartlett weight at lag 1 is $1 - 1/2 = 1/2$, so the estimate is $\hat\gamma_0 + \hat\gamma_1$:
acf.MA1[1] + acf.MA1[2]
## [1] 0.8634806
Using the sandwich package, this can also be computed as:
library("sandwich")
m <- lm(y ~ 1)
kernHAC(m, kernel = "Bartlett", bw = 2,
prewhite = FALSE, adjust = FALSE, sandwich = FALSE)
## (Intercept)
## (Intercept) 0.8634806
And the Hansen-Hodrick estimate can be obtained as:
kernHAC(m, kernel = "Truncated", bw = 1,
prewhite = FALSE, adjust = FALSE, sandwich = FALSE)
## (Intercept)
## (Intercept) -0.4056092
See also NeweyWest() and lrvar() from sandwich for convenience interfaces to obtain Newey-West estimators of linear models and long-run variances of time series, respectively.
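For instance, continuing the example above, one could call (sketched; output not shown):
NeweyWest(m, lag = 1, prewhite = FALSE, adjust = FALSE)  # full HAC covariance of the lm fit
lrvar(y, type = "Newey-West", prewhite = FALSE)  # long-run variance of y (see ?lrvar for the exact scaling)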
Andrews (Econometrica 1991) provides an analysis under more general conditions.
As to your subquestion regarding overlapping data, I am not aware of a subject-matter reason; I suspect this common practice is rooted in tradition.
Generalized least squares (GLS) jointly models the fixed effects and a covariance structure in the data to yield Gauss-Markov optimal estimators of the fixed effects. Borrowing heavily from semiparametric inference and theory, we can also refer to the fixed effects as the "trend".
If the variance structure in a GLS model is misspecified, the estimate of the trend is unbiased but inefficient, just as the OLS estimate is when the errors are not independent and homoscedastic. The sandwich estimator, also called the heteroscedasticity-consistent estimator, provides asymptotically correct 95% confidence intervals even when the covariance model is misspecified, so tests of the trend have the correct 0.05 level and achieve relatively good power.
The GLS model solves the estimating equation:
$$0 = \sum_{i=1}^n S_i (\hat{\beta}) = \mathbf{X}^T W^{-1} \left( Y - \mathbf{X}\hat{\beta} \right)$$
where $W$ is the error covariance matrix, which can be estimated in any of several ways. As an interesting note, most GEE implementations use method-of-moments estimators for the covariance parameters, whereas GLS, being an exact likelihood-based procedure, uses (I believe) the EM algorithm. I do not know whether this is a problem for the solution proposed here.
The robust sandwich variance estimator of $\hat{\beta}$ is given by:
$$\text{var}(\hat{\beta}) = \frac{1}{n}\mathbf{A}^{-1}\mathbf{B}\mathbf{A}^{-1}$$
where $\mathbf{A} = \dfrac{\partial S(\hat{\beta})}{\partial \hat{\beta}}$ and $\mathbf{B} = S(\hat{\beta})S(\hat{\beta})^T/n$. Putting it all together, we should be able to construct a sandwich estimate from a GLS model by hand fairly easily.
Using the Ovary example from nlme's gls(), the model fm1 has a non-zero unweighted sum of residuals (sum(residuals(fm1)) = 1.65). Here is an example of checking that the weighted score does, indeed, sum to zero and thus solves the estimating equation.
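For reproducibility, I assume fm1 is the AR(1) fit from the first example in ?gls:
library("nlme")
fm1 <- gls(follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time), Ovary,
           correlation = corAR1(form = ~ 1 | Mare))
With that fit in hand, the score check is: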
X <- model.matrix(fm1, data = Ovary)  # design matrix
Mares <- Ovary$Mare                   # cluster (mare) identifier
W <- lapply(lapply(levels(Mares), getVarCov, obj = fm1), solve)  # per-mare inverse covariances
Xsplit <- lapply(split(as.data.frame(X), Mares), as.matrix)      # design rows by mare
Resids <- split(residuals(fm1), Mares)  # residuals by mare
Scores <- mapply(function(x, w, r) t(x) %*% w %*% r, Xsplit, W, Resids)  # per-mare scores
rowSums(Scores)
which gives (zero up to numerical error):
> rowSums(Scores)
[1] 1.318667e-13 -1.706968e-14 4.593981e-14
This forms the basis for generating the model-based variance-covariance matrix of the trend:
xwx <- Reduce(`+`, mapply(function(x, w) t(x) %*% w %*% x,
                          Xsplit, W, SIMPLIFY = FALSE))  # A: sum of X_i' W_i^{-1} X_i
solve(xwx)  # model-based covariance, by hand
vcov(fm1)   # for comparison
> solve(xwx)
(Intercept) sin(2 * pi * Time) cos(2 * pi * Time)
(Intercept) 4.417512e-01 -8.395204e-08 -1.363831e-01
sin(2 * pi * Time) -8.395204e-08 4.160866e-01 7.475767e-08
cos(2 * pi * Time) -1.363831e-01 7.475767e-08 4.865597e-01
> vcov(fm1)
(Intercept) sin(2 * pi * Time) cos(2 * pi * Time)
(Intercept) 4.417512e-01 -8.395204e-08 -1.363831e-01
sin(2 * pi * Time) -8.395204e-08 4.160866e-01 7.475767e-08
cos(2 * pi * Time) -1.363831e-01 7.475767e-08 4.865597e-01
Now the sandwich is obtained by using the squared residuals as plug-in estimates in $\mathbf{B}$:
Amat <- xwx
## diag(r^2) targets heteroscedasticity; r %*% t(r) would give the cluster-robust outer product
Bmat <- mapply(function(x, w, r) t(x) %*% w %*% diag(c(r)^2) %*% w %*% x,
               Xsplit, W, Resids, SIMPLIFY = FALSE)
Bmat <- Reduce(`+`, Bmat)
## Amat and Bmat are sums over mares, so the 1/n factor in the formula is absorbed
SandCov <- solve(Amat) %*% Bmat %*% solve(Amat)
These robust covariance estimates can then be used in place of the model-based estimates for heteroscedasticity-robust, consistent inference.
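As a quick usage check, the robust and model-based standard errors can be compared side by side (output not shown):
cbind(robust = sqrt(diag(SandCov)),
      model  = sqrt(diag(vcov(fm1))))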
statsmodels uses Newey-West corrected standard errors with the usual Bartlett window by default. There is a 'weights_func' or 'kernel' option to choose a different window than Bartlett, e.g., uniform. However, statsmodels has no other options for HAC robust standard errors, such as pre-whitening or automatic lag selection, nor autocorrelation-robust standard errors without heteroscedasticity robustness (i.e., only 'HAC', but no 'AC'). If I remember correctly, Hansen-Hodrick is 'AC' with a uniform kernel.