Suppose we have a regression
\begin{align*}
y=X\beta+u
\end{align*}
Then the OLS estimator $\hat{\beta}$ satisfies
\begin{align*}
\widehat{\beta}-\beta=(X'X)^{-1}X'u
\end{align*}
and, assuming that $\hat{\beta}$ is an unbiased estimator, we have
\begin{align*}
Var(\widehat{\beta})=E\left[(X'X)^{-1}X'uu'X(X'X)^{-1}\right]
\end{align*}
The usual OLS assumptions are that $E(u|X)=0$ and $E(uu'|X)=\sigma^2I_n$, which gives us
\begin{align*}
Var(\widehat{\beta})=\sigma^2E\left[(X'X)^{-1}\right]
\end{align*}
This covariance matrix is usually reported in statistical packages.
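To make the formulas above concrete, here is a small sketch in Python with NumPy (the language is my choice for illustration; the thread itself uses Matlab, SAS, and R) that computes $\hat{\beta}=(X'X)^{-1}X'y$ and the classical covariance matrix $\hat{\sigma}^2(X'X)^{-1}$ that packages report under the iid-error assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
u = rng.normal(scale=0.7, size=n)       # homoscedastic, uncorrelated errors
y = X @ beta + u

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y            # OLS estimate
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k)    # unbiased estimate of sigma^2
cov_beta_hat = sigma2_hat * XtX_inv     # classical covariance matrix
se = np.sqrt(np.diag(cov_beta_hat))     # the standard errors packages print
```

These standard errors are valid only under $E(uu'|X)=\sigma^2I_n$; the rest of the discussion is about what to do when that fails.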
If the $u_i$ are heteroscedastic and/or autocorrelated, then $E(uu'|X)\neq\sigma^2I_n$ and the usual output gives misleading results. To get correct results, HAC (heteroscedasticity- and autocorrelation-consistent) standard errors are calculated. All methods for HAC errors compute
\begin{align*}
diag\left(E\left[(X'X)^{-1}X'uu'X(X'X)^{-1}\right]\right).
\end{align*}
They differ in their assumptions about what $E(uu'|X)$ looks like.
So it is natural that the function NeweyWest requires a linear model: the Newey-West method calculates the correct standard errors for a linear-model estimator. Your solution is therefore perfectly correct if you assume that your stock returns follow the model
\begin{align}
r_t=\mu+u_t
\end{align}
and you want to estimate $Var(\mu)$ guarding against irregularities in $u_t$.
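For the intercept-only model $r_t=\mu+u_t$, the Newey-West standard error of the sample mean can be written out directly. Here is a hedged NumPy sketch (Python rather than Matlab/R is my own choice; the function name is made up for illustration) comparing it with the naive iid standard error on autocorrelated data:

```python
import numpy as np

def newey_west_se_of_mean(x, lags):
    """Newey-West (Bartlett-kernel) standard error of the sample mean."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    # sample autocovariances gamma_0, ..., gamma_lags
    gamma = np.array([d[j:] @ d[:n - j] / n for j in range(lags + 1)])
    w = 1 - np.arange(1, lags + 1) / (lags + 1)   # Bartlett weights
    J = gamma[0] + 2 * (w @ gamma[1:])            # long-run variance estimate
    return np.sqrt(J / n)

# AR(1) "returns": positively autocorrelated, so the naive se is too small
rng = np.random.default_rng(1)
r = np.empty(2000)
r[0] = rng.normal()
for t in range(1, r.size):
    r[t] = 0.6 * r[t - 1] + rng.normal()

naive_se = r.std(ddof=1) / np.sqrt(r.size)   # assumes iid u_t
nw_se = newey_west_se_of_mean(r, lags=10)    # robust to autocorrelation
```

With positive autocorrelation, the Newey-West standard error of $\hat{\mu}$ comes out noticeably larger than the naive one, which is exactly the "guarding against irregularities in $u_t$" referred to above.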
If, on the other hand, you want to estimate the "correct" $Var(r_t)$ (whatever that means), you should look into volatility models such as GARCH and its variants. They assume that
\begin{align*}
r_t=\sigma_t\varepsilon_t
\end{align*}
where the $\varepsilon_t$ are iid standard normal. The goal is then to estimate $\sigma_t$ correctly. The conditional variance of $r_t$ is then $\sigma_t^2$, and you have a "correct" estimate of your variance, guarding against the usual idiosyncrasies of stock returns such as volatility clustering, skewness, etc.
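As a quick illustration of the kind of model meant here, the following NumPy sketch (Python is my own choice; the parameter values are made up for illustration) simulates a GARCH(1,1) process $\sigma_t^2=\omega+\alpha r_{t-1}^2+\beta\sigma_{t-1}^2$ and checks for volatility clustering via the autocorrelation of squared returns:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
omega, alpha, beta = 0.05, 0.10, 0.85       # alpha + beta < 1: stationary
sigma2 = np.empty(n)
r = np.empty(n)

sigma2[0] = omega / (1 - alpha - beta)      # unconditional variance
r[0] = np.sqrt(sigma2[0]) * rng.normal()
for t in range(1, n):
    sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    r[t] = np.sqrt(sigma2[t]) * rng.normal()    # r_t = sigma_t * eps_t

# volatility clustering: squared returns are positively autocorrelated
d = r ** 2 - (r ** 2).mean()
rho1 = (d[1:] @ d[:-1]) / (d @ d)           # lag-1 autocorrelation of r_t^2
```

The returns themselves are serially uncorrelated, yet their squares are not, which is what GARCH-type estimation of $\sigma_t$ exploits.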
Since I had a similar question earlier and came across this long-unanswered question through a simple web search, I'll take a stab and post what I think is one possible solution to your situation that others may also be encountering.
According to SAS Support, you can take the time-series you have and fit an intercept-only regression model to the series. The estimated intercept for this regression model will be the sample mean of the series. You can then pass this intercept-only regression model through the SAS commands used to retrieve Newey-West standard errors of a regression model.
Here is the link to the SAS Support page:
http://support.sas.com/kb/40/098.html
Look for "Example 2. Newey-West standard error correction for the sample mean of a series"
In your case, simply try the same approach in Matlab. If someone has a better approach, please enlighten us.
Best Answer
Consider a class of long-run variance estimators
$$ \hat{J}_T\equiv\hat{\gamma}_0+2\sum_{j=1}^{T-1}k\left(\frac{j}{\ell_T}\right)\hat{\gamma}_j $$ where $k$ is a kernel (weighting) function, the $\hat\gamma_j$ are sample autocovariances, and $\ell_T$ is a bandwidth parameter. Among other things, $k$ must be symmetric and satisfy $k(0)=1$.
Newey & West (Econometrica 1987) propose the Bartlett kernel $$k\left(\frac{j}{\ell_T}\right) = \begin{cases} \bigl(1 - \frac{j}{\ell_T}\bigr) \qquad &\mbox{for} \qquad 0 \leqslant j \leqslant \ell_T-1 \\ 0 &\mbox{for} \qquad j > \ell_T-1 \end{cases} $$
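In code, the Bartlett kernel and the resulting long-run variance estimator $\hat{J}_T$ look as follows. This is a hedged NumPy sketch (the function names are my own, chosen for illustration), written to match the definitions above term by term:

```python
import numpy as np

def bartlett_weight(j, ell):
    """Newey-West (Bartlett) kernel: 1 - j/ell for 0 <= j <= ell - 1, else 0."""
    return 1.0 - j / ell if 0 <= j <= ell - 1 else 0.0

def long_run_variance(x, ell):
    """J_hat = gamma_0 + 2 * sum_j k(j/ell) * gamma_j with Bartlett weights."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    J = d @ d / n                               # gamma_0 term (k(0) = 1)
    for j in range(1, n):
        w = bartlett_weight(j, ell)
        if w == 0.0:                            # weights vanish beyond ell - 1
            break
        J += 2 * w * (d[j:] @ d[:n - j] / n)    # + 2 k(j/ell) gamma_j
    return J
```

Because the Bartlett weights decline linearly to zero, the resulting estimate is guaranteed nonnegative, in contrast to the truncated kernel discussed next.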
Hansen & Hodrick's (Journal of Political Economy 1980) estimator amounts to taking a truncated kernel, i.e., $k=1$ for $j\leq M$ for some $M$, and $k=0$ otherwise. This estimator is, as discussed by Newey & West, consistent, but not guaranteed to be positive semi-definite (when estimating covariance matrices), whereas Newey & West's kernel estimator is.
Try $M=1$ for an MA(1) process with a strongly negative coefficient $\theta$. The population quantity is known to be $J = \sigma^2(1 + \theta)^2>0$, but the Hansen-Hodrick estimate can come out negative, which is not a convincing estimate for a long-run variance. This is avoided by the Newey-West estimator, whose Bartlett weights guarantee a nonnegative estimate.
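To see the difference, here is a NumPy sketch (Python rather than R is my own substitution for illustration) simulating such an MA(1) and computing both estimates, Hansen-Hodrick with $M=1$ and Newey-West with $\ell_T=2$:

```python
import numpy as np

rng = np.random.default_rng(7)
theta, n = -0.95, 1000
eps = rng.normal(size=n + 1)
x = eps[1:] + theta * eps[:-1]            # MA(1); population J = (1 + theta)^2

d = x - x.mean()
gamma0 = d @ d / n                        # sample autocovariance at lag 0
gamma1 = d[1:] @ d[:-1] / n               # sample autocovariance at lag 1

J_hh = gamma0 + 2 * gamma1                # Hansen-Hodrick: truncated kernel, M = 1
J_nw = gamma0 + 2 * (1 - 1 / 2) * gamma1  # Newey-West: Bartlett weight 1/2 at j = 1
```

Since $\hat\gamma_1$ is strongly negative here, $\hat{J}_{HH}$ sits near zero and can dip below it depending on the sample, while the down-weighted $\hat{J}_{NW}$ stays safely positive.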
Using the sandwich package in R, these estimates can also be computed directly, with the Hansen-Hodrick estimate obtained via a truncated kernel. See also NeweyWest() and lrvar() from sandwich for convenience interfaces to obtain Newey-West estimators of linear models and long-run variances of time series, respectively. Andrews (Econometrica 1991) provides an analysis under more general conditions.
As for your sub-question regarding overlapping data, I am not aware of a subject-matter reason; I suspect tradition is at the root of this common practice.