Solved – Robust OLS versus ML with sandwich estimator

Tags: least squares, maximum likelihood, robust, sandwich, standard error

If you compare the standard errors of the OLS coefficients with the White correction against the ML estimates with the variance estimated by the sandwich estimator, which standard errors do you expect to be larger? And is it correct that both are robust to heteroskedasticity?

Best Answer

Stata's User Guide states that Stata uses the "White" formula for the heteroskedasticity-robust variance-covariance matrix of the estimator. So, at least in the context of the Normal Linear Regression Model $$y_i = \mathbf x_i'\beta +u_i$$ we should obtain exactly the same results using either OLS or ML.
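To illustrate that claim (a minimal sketch of my own, not from Stata's documentation, using simulated data), numerically maximizing the Gaussian log-likelihood returns the OLS coefficients:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -0.7]) + rng.normal(size=n)

def neg_loglik(theta):
    # Gaussian negative log-likelihood, parameterized by (beta, log sigma)
    beta, log_sigma = theta[:-1], theta[-1]
    sigma2 = np.exp(2 * log_sigma)
    resid = y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma2) + 0.5 * resid @ resid / sigma2

theta_ml = minimize(neg_loglik, x0=np.zeros(3), method="BFGS").x
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

print(np.round(theta_ml[:2], 4))   # ML coefficients
print(np.round(beta_ols, 4))       # OLS coefficients - identical up to optimizer tolerance
```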

With observations and errors i.i.d. (hence homoskedastic also), and all the other nice assumptions, the (full) asymptotic variance-covariance matrix of the ML estimator is

$$\operatorname {Avar}\left[\sqrt n(\hat \beta_{ML}-\beta)\right] = \Big[E(H_i)\Big]^{-1}E(s_is_i')\Big[E(H_i)\Big]^{-1}$$

where $H_i$ is the Hessian matrix (the second derivative of the log-likelihood) for the $i$-th observation,

$$H_i = -\frac1{\sigma^2}\mathbf x_i\mathbf x_i'$$

and $s_i$ is the score vector (the first derivative of the log-likelihood) for the $i$-th observation,

$$s_i = \frac1{\sigma^2}u_i\mathbf x_i$$

Under homoskedasticity, the information matrix equality holds, $E(s_is_i')=-E(H_i)$, and so the expression simplifies to

$$\operatorname {Avar}\left[\sqrt n(\hat \beta_{ML}-\beta)\right] = -\Big[E(H_i)\Big]^{-1} = \Big[E(s_is_i')\Big]^{-1}$$

These two equalities provide the two alternative variance-estimator formulas we use in ML estimation (using sample means in place of expected values). Note that in finite samples the two estimates do not give identical results, but in large samples they should be close.
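As a numerical illustration (a minimal sketch of my own, using simulated homoskedastic data), the snippet below computes both versions, the Hessian-based one and the OPG-based one, and shows that the implied standard errors agree closely in a reasonably large sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, -2.0])
u = rng.normal(scale=1.5, size=n)            # homoskedastic errors
y = X @ beta + u

# ML = OLS coefficients in the normal linear model
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / n               # ML estimate of sigma^2

# Hessian-based: -[ (1/n) sum H_i ]^{-1} = sigma^2 * ( (1/n) X'X )^{-1}
avar_hessian = sigma2_hat * np.linalg.inv(X.T @ X / n)

# OPG-based: [ (1/n) sum s_i s_i' ]^{-1},  with  s_i = u_i x_i / sigma^2
S = (u_hat[:, None] * X) / sigma2_hat
avar_opg = np.linalg.inv(S.T @ S / n)

# Standard errors of beta_hat (divide the Avar of sqrt(n)(beta_hat - beta) by n)
print(np.round(np.sqrt(np.diag(avar_hessian) / n), 4))  # Hessian-based
print(np.round(np.sqrt(np.diag(avar_opg) / n), 4))      # OPG-based: close, not identical
```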

But suppose we suspect heteroskedasticity and want to account for it, not by attempting to specify a functional form for it (the older approach) but by computing only a "heteroskedasticity-robust" variance-covariance matrix for our tests and inference, while obtaining the coefficient estimates themselves in the usual way.

In such a case, and in order to bypass the problem of estimating $n$ different variances, we use the result that, at least for some forms of misspecification (heteroskedasticity included), specifying the wrong log-likelihood (here, one that ignores the heteroskedasticity) still gives a consistent ML estimator, the "Quasi-MLE". Its asymptotic variance-covariance matrix is consistently estimated by the full sandwich version of the theoretical asymptotic variance of the misspecified model, i.e. we no longer invoke the information matrix equality that simplified matters before. In other words,

$$\operatorname {\hat Avar}_{Robust}\left[\sqrt n(\hat \beta_{ML}-\beta)\right] \\= \Big[\frac 1n\sum_{i=1}^n\hat H_i\Big]^{-1}\left(\frac 1n\sum_{i=1}^n(\hat s_i \hat s_i')\right)\Big[\frac 1n\sum_{i=1}^n\hat H_i\Big]^{-1}$$

$$=n\cdot \Big[\frac 1{\hat \sigma^2}\sum_{i=1}^n\mathbf x_i\mathbf x_i'\Big]^{-1}\left(\frac 1{(\hat \sigma^2)^2}\sum_{i=1}^n\hat u_i^2 \mathbf x_i\mathbf x_i'\right)\Big[\frac 1{\hat \sigma^2}\sum_{i=1}^n\mathbf x_i\mathbf x_i'\Big]^{-1}$$

$$=n\cdot \left(\mathbf X'\mathbf X\right)^{-1}\left(\sum_{i=1}^n\hat u_i^2 \mathbf x_i\mathbf x_i'\right)\left(\mathbf X'\mathbf X\right)^{-1}$$

Since, in the Normal linear regression model, the ML estimator of the coefficients coincides with the OLS estimator, the residual series are identical, so the expression above is numerically equal to the heteroskedasticity-robust variance-covariance matrix of the (centered and scaled) OLS estimator.
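Here is a minimal numerical check of that equality (my own sketch, using simulated heteroskedastic data): the QMLE sandwich built from the Gaussian scores and Hessians matches the White/HC0 covariance of the OLS estimator to machine precision, because the $\hat\sigma^2$ factors cancel:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 2000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, -2.0])
u = rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))   # heteroskedastic errors
y = X @ beta + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / n

# QMLE sandwich for beta_hat itself: [sum H_i]^{-1} (sum s_i s_i') [sum H_i]^{-1}
bread = np.linalg.inv(X.T @ X / sigma2_hat)                    # minus signs cancel in the sandwich
meat = (X * (u_hat[:, None] ** 2)).T @ X / sigma2_hat ** 2
V_qmle = bread @ meat @ bread

# White/HC0 robust covariance of the OLS estimator
XtX_inv = np.linalg.inv(X.T @ X)
V_hc0 = XtX_inv @ ((X * (u_hat[:, None] ** 2)).T @ X) @ XtX_inv

print(np.allclose(V_qmle, V_hc0))   # True: the sigma^2 factors cancel out
```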

But the OP says in a comment that he obtains different results for the two cases with his software. This may be due to a finite-sample bias correction that creeps in in one case but not in the other; such corrections are not part of the original "White" expression but were proposed later, by White himself and also by Davidson and MacKinnon, based on Monte Carlo evidence. Alternatively, the code implementing the above equation for the ML estimation may compute $\hat \sigma^4$ directly, using a formula that does not simply square the estimated variance; the variance terms then do not cancel exactly (note that in the OLS framework the variances are absent from the beginning).
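To illustrate the first possibility (a sketch under the assumption that the software applies an HC1-style $n/(n-k)$ scaling in one routine but not in the other), such a degrees-of-freedom correction changes the reported standard errors slightly:

```python
import numpy as np

def white_se(X, u_hat, correction=False):
    """White/HC0 standard errors; optionally with an HC1-style n/(n-k) correction."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * (u_hat[:, None] ** 2)).T @ X
    V = XtX_inv @ meat @ XtX_inv
    if correction:
        V *= n / (n - k)
    return np.sqrt(np.diag(V))

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))
u_hat = y - X @ np.linalg.solve(X.T @ X, X.T @ y)

print(white_se(X, u_hat, correction=False))  # plain White/HC0
print(white_se(X, u_hat, correction=True))   # with n/(n-k) scaling: slightly larger
```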
