Solved – Calculate Newey-West standard errors without an lm object in R

autocorrelation · heteroscedasticity · r · standard error

I asked this question yesterday on StackOverflow, and got an answer, but we agreed that it seems a bit hackish and there may be a better way to look at it.

The question: I would like to calculate the Newey-West (HAC) standard errors for a vector (in this case a vector of stock returns). The function NeweyWest() in the sandwich package does this, but takes an lm object as input. The solution Joris Meys offered is to regress the vector on a constant (i.e., project it onto 1), which turns the vector into residuals to feed into NeweyWest(). That is:

library(sandwich)
as.numeric(NeweyWest(lm(rnorm(100) ~ 1)))

for the variance of the mean.

Should I be doing it like this? Or is there a way to more directly do what I want? Thanks!

Best Answer

Suppose we have a regression

\begin{align*} y=X\beta+u \end{align*}

The OLS estimator $\widehat{\beta}$ satisfies \begin{align*} \widehat{\beta}-\beta=(X'X)^{-1}X'u, \end{align*} and, assuming $\widehat{\beta}$ is an unbiased estimator, we have \begin{align*} Var(\widehat{\beta})=E\left[(X'X)^{-1}X'uu'X(X'X)^{-1}\right] \end{align*}

The usual OLS assumptions are that $E(u|X)=0$ and $E(uu'|X)=\sigma^2I_n$, which give us \begin{align*} Var(\widehat{\beta})=\sigma^2E\left[(X'X)^{-1}\right] \end{align*} This is the covariance matrix usually reported by statistical packages.

If the $u_i$ are heteroscedastic and/or autocorrelated, then $E(uu'|X)\neq\sigma^2I_n$ and the usual output gives misleading results. To get correct results, HAC standard errors are calculated. All the methods for HAC errors compute \begin{align*} diag\left(E\left[(X'X)^{-1}X'uu'X(X'X)^{-1}\right]\right). \end{align*} They differ in their assumptions about what $E(uu'|X)$ looks like.
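To make the sandwich formula concrete, here is a minimal base-R sketch that builds a Newey-West estimate by hand. The AR(1) error process, the fixed truncation lag `L`, and the Bartlett weights are illustrative choices made for this example; NeweyWest() handles lag selection and the other details for you.

```r
set.seed(1)
n <- 200
u <- as.numeric(stats::filter(rnorm(n), 0.7, method = "recursive"))  # AR(1) errors
X <- cbind(1, rnorm(n))          # design matrix: intercept + one regressor
y <- X %*% c(1, 2) + u

XtX_inv  <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% y
e <- as.numeric(y - X %*% beta_hat)   # residuals

# "Meat" of the sandwich: Newey-West estimate of X' E(uu'|X) X
L <- 5                                # truncation lag (an illustrative choice)
meat <- crossprod(X * e)              # lag-0 term: sum_t e_t^2 x_t x_t'
for (l in 1:L) {
  w <- 1 - l / (L + 1)                # Bartlett kernel weight
  G <- t(X[-(1:l), , drop = FALSE] * e[-(1:l)]) %*%
       (X[1:(n - l), , drop = FALSE] * e[1:(n - l)])
  meat <- meat + w * (G + t(G))
}

hac <- XtX_inv %*% meat %*% XtX_inv   # (X'X)^{-1} X'uu'X (X'X)^{-1}
sqrt(diag(hac))                       # HAC standard errors for (intercept, slope)
```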

So it is natural that the function NeweyWest() requires a linear model: the Newey-West method calculates the correct standard errors of the linear model estimator. Your solution is therefore perfectly correct if you assume that your stock returns follow the model \begin{align} r_t=\mu+u_t \end{align} and you want to estimate $Var(\widehat{\mu})$ while guarding against irregularities in $u_t$.
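Spelled out, the approach from the question looks like the sketch below. The simulated series is just a stand-in for real returns, and the sandwich package must be installed; in the intercept-only regression the residuals are the demeaned returns, so NeweyWest() delivers an HAC estimate of $Var(\widehat{\mu})$.

```r
library(sandwich)

set.seed(42)
r <- rnorm(250, mean = 0.0005, sd = 0.01)  # placeholder daily returns
fit <- lm(r ~ 1)                           # r_t = mu + u_t

var_naive <- as.numeric(vcov(fit))         # usual sigma^2 / n
var_hac   <- as.numeric(NeweyWest(fit))    # Newey-West (HAC) version
sqrt(c(naive = var_naive, hac = var_hac))  # two standard errors of the mean
```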

If, on the other hand, you want to estimate the "correct" $Var(r_t)$ (whatever that means), you should check out volatility models, such as GARCH and its variants. They assume that \begin{align*} r_t=\sigma_t\varepsilon_t, \end{align*} where the $\varepsilon_t$ are iid standard normal. The goal is then to estimate $\sigma_t$ correctly. The conditional variance of $r_t$ given the past is then $\sigma_t^2$, and you have a "correct" estimate of your variance, guarding against the usual idiosyncrasies of stock returns such as volatility clustering, skewness, and so on.
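For intuition, here is a minimal base-R sketch of the GARCH(1,1) version of this model, $r_t=\sigma_t\varepsilon_t$ with $\sigma_t^2=\omega+\alpha r_{t-1}^2+\beta\sigma_{t-1}^2$; the parameter values are purely illustrative.

```r
set.seed(7)
n <- 1000
omega <- 1e-5; alpha <- 0.1; beta <- 0.85  # illustrative GARCH(1,1) parameters

r    <- numeric(n)
sig2 <- numeric(n)
sig2[1] <- omega / (1 - alpha - beta)      # unconditional variance
r[1]    <- sqrt(sig2[1]) * rnorm(1)
for (t in 2:n) {
  sig2[t] <- omega + alpha * r[t - 1]^2 + beta * sig2[t - 1]
  r[t]    <- sqrt(sig2[t]) * rnorm(1)      # r_t = sigma_t * eps_t
}

# Volatility clustering: the squared returns are autocorrelated
acf(r^2, lag.max = 10, plot = FALSE)
```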