Solved – Checking for normality with robust errors

normality-assumption, robust-standard-errors, stata

I am running a linear regression (just a single IV) and have selected the robust error option (vce robust) in Stata due to heteroscedasticity (and because it is sometimes recommended to do so anyway). However, try as I might, I cannot find any advice on whether I should be testing for normality after I have selected this robust option or whether running the robust option negates the need to do so. Any help on whether normality should be tested with this option checked would be greatly appreciated.

BASED ON ANSWERS: My main focus is on understanding the regression model, so I will be looking at the (slope) coefficient and its 95% CI as well as statistical significance (I have a continuous IV). In another linear regression I had hoped to make predictions (with CI and, hopefully, PI) also. I was OK with checking the assumptions of a regression analysis until I reached the option to use robust standard errors. From the answers received am I correct in saying that asymptotic normality is needed, but not readily/easily tested for (and is rarely tested in practice)? So I could run the regression with robust errors and not test for normality. I assume that other assumptions (e.g., unusual points) still hold. I checked Stata and it does seem that it predicts when robust errors are used. Is it correct to use these predictions?

Best Answer

To make things simple, suppose you have 3 observations. Robust standard errors allow for a variance-covariance matrix of the errors to look like this: $$\Sigma = \begin{bmatrix} \sigma_{1} & 0 & 0\\ 0 & \sigma_{2} & 0 \\ 0 & 0 & \sigma_{3} \end{bmatrix} $$

The diagonal terms are the variances of the errors for each of the 3 observations. The covariance terms are all zero because we still assume that the errors are uncorrelated across observations. If you want to relax that, you will need cluster-robust errors.

Ordinary, non-robust errors assume that $\sigma_1=\sigma_2=\sigma_3=\sigma$: all observations have the same (unknown) error variance. Neither the robust nor the non-robust VCE makes any assumption about the distribution of the errors, such as normality. They differ only in whether the error variance is assumed to be the same across observations (homoskedasticity) or allowed to differ (heteroskedasticity).
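To make the role of $\Sigma$ concrete, here is the standard textbook variance of the OLS estimator, written in the notation above. The design matrix $X$, the residuals $\hat{e}_i$, the sample size $n$ and the number of coefficients $k$ are symbols introduced here for illustration; they are not part of the original answer.

$$\operatorname{Var}(\hat{\beta} \mid X) = (X'X)^{-1} X' \Sigma X (X'X)^{-1}$$

If $\Sigma = \sigma I$, this collapses to $\sigma (X'X)^{-1}$, and the non-robust VCE plugs in the pooled residual variance $\sum_i \hat{e}_i^2/(n-k)$ for $\sigma$. The robust VCE instead plugs $\operatorname{diag}(\hat{e}_1^2, \hat{e}_2^2, \hat{e}_3^2)$ in for $\Sigma$, usually with a finite-sample scaling factor that varies between implementations. Both expressions involve only second moments of the errors; the shape of the error distribution does not enter either one.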

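Thus, depending on what you are doing (for example, exact small-sample inference or prediction intervals for individual observations), you may still need to check normality of the errors. Alternatively, for inference on the coefficients, you may be able to bootstrap or rely on asymptotics.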
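As a rough illustration of the point that neither VCE uses normality, here is a minimal base-R sketch that computes both sets of standard errors by hand. Everything in it is an assumption made for illustration: the simulated data, the variable names, and the choice of the unscaled HC0 form (software implementations typically apply a finite-sample scaling on top of it).

```r
# Simulated data with heteroskedastic errors (illustrative assumption)
set.seed(1)
n <- 200
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)  # error sd grows with x

fit <- lm(y ~ x)
X <- model.matrix(fit)   # n x k design matrix
e <- residuals(fit)      # OLS residuals
k <- ncol(X)

bread <- solve(crossprod(X))   # (X'X)^{-1}

# Non-robust: one pooled residual variance for all observations
se_classical <- sqrt(diag(bread * sum(e^2) / (n - k)))

# Robust (HC0): each observation contributes its own squared residual
meat <- crossprod(X * e)       # X' diag(e^2) X
se_hc0 <- sqrt(diag(bread %*% meat %*% bread))

print(rbind(classical = se_classical, HC0 = se_hc0))
```

Both calculations use only the residuals and the design matrix; no step refers to a normal distribution. Normality only enters later, if you want exact t-based inference in a small sample or prediction intervals.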

Related Solutions

Solved – How to get ANOVA table with robust standard errors

The ANOVA in linear regression models is equivalent to the Wald test (and the likelihood ratio test) of the corresponding nested models. So when you want to conduct the corresponding test using heteroskedasticity-consistent (HC) standard errors, this cannot be obtained from a decomposition of the sums of squares, but you can carry out the Wald test using an HC covariance estimate. This idea is used in both Anova() and linearHypothesis() from the car package and coeftest() and waldtest() from the lmtest package. The latter three can also be used with plm objects.

A simple (albeit not very interesting/meaningful) example is the following. We use the standard model from the ?plm manual page and want to carry out a Wald test for the significance of both log(pcap) and unemp. We need these packages:

library("plm")
library("sandwich")
library("car")
library("lmtest")

The model (under the alternative) is:

data("Produc", package = "plm")
mod <- plm(log(gsp) ~ log(pc) + log(emp) + log(pcap) + unemp,
  data = Produc, index = c("state", "year"))

First, let's look at the marginal Wald tests with HC standard errors for all individual coefficients:

coeftest(mod, vcov = vcovHC)

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)    
log(pc)    0.2920069  0.0617425  4.7294 2.681e-06 ***
log(emp)   0.7681595  0.0816652  9.4062 < 2.2e-16 ***
log(pcap) -0.0261497  0.0603262 -0.4335   0.66480    
unemp     -0.0052977  0.0024958 -2.1226   0.03411 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

And then we carry out a Wald test for both log(pcap) and unemp:

linearHypothesis(mod, c("log(pcap)", "unemp"), vcov = vcovHC)

Linear hypothesis test

Hypothesis:
log(pcap) = 0
unemp = 0

Model 1: restricted model
Model 2: log(gsp) ~ log(pc) + log(emp) + log(pcap) + unemp

Note: Coefficient covariance matrix supplied.

  Res.Df Df  Chisq Pr(>Chisq)  
1    766                       
2    764  2 7.2934    0.02608 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Alternatively, we can also fit the model under the null hypothesis (mod0 say) without the two coefficients and then call waldtest():

mod0 <- plm(log(gsp) ~ log(pc) + log(emp),
  data = Produc, index = c("state", "year"))
waldtest(mod0, mod, vcov = vcovHC)

Wald test

Model 1: log(gsp) ~ log(pc) + log(emp)
Model 2: log(gsp) ~ log(pc) + log(emp) + log(pcap) + unemp
  Res.Df Df  Chisq Pr(>Chisq)  
1    766                       
2    764  2 7.2934    0.02608 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The test statistic and p-value computed by linearHypothesis() and waldtest() are exactly the same. Only the interface and output formatting differ somewhat. In some cases one or the other is simpler or more intuitive.

Note: If you supply a covariance matrix estimate (i.e., a matrix like vcovHC(mod)) instead of a covariance matrix estimator (i.e., a function like vcovHC), make sure that you supply the HC covariance matrix estimate of the model under the alternative, i.e., the non-restricted model.
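A minimal sketch of the matrix variant, refitting the same two models so that the block stands on its own. The object names V, lh_mat and wt_mat are introduced here for illustration. It assumes, as the lmtest documentation describes for a two-model comparison, that waldtest() accepts the covariance matrix of the more general model.

```r
library("plm")
library("sandwich")
library("car")
library("lmtest")

data("Produc", package = "plm")
# Unrestricted model (under the alternative)
mod <- plm(log(gsp) ~ log(pc) + log(emp) + log(pcap) + unemp,
  data = Produc, index = c("state", "year"))
# Restricted model (under the null)
mod0 <- plm(log(gsp) ~ log(pc) + log(emp),
  data = Produc, index = c("state", "year"))

# HC covariance matrix *estimate*, taken from the unrestricted model
V <- vcovHC(mod)

# Pass the matrix instead of the vcovHC function
lh_mat <- linearHypothesis(mod, c("log(pcap)", "unemp"), vcov = V)
wt_mat <- waldtest(mod0, mod, vcov = V)

print(lh_mat)
print(wt_mat)
```

Passing vcovHC(mod0) here would be the mistake the note warns about: the restricted model has no rows or columns for log(pcap) and unemp, so its covariance estimate cannot be used to test them.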

Solved – R: Confused about robust standard errors using “felm” and “huxreg”

The result from huxreg comes directly from the coeftest object:

> fe1_ro1

t test of coefficients:

              Estimate Std. Error t value Pr(>|t|)
cylinders    -0.296565   4.593215 -0.0646   0.9486
displacement -0.045153   0.078539 -0.5749   0.5657

So the question is why coeftest is giving one result, and summary.felm another. The felm documentation doesn't say how exactly it calculates robust SEs; you might need to check the code or contact the author. I wouldn't assume that summary.felm is correct and coeftest is wrong (even if summary.felm gives significant results ;-) ). They may both be "right" but using different methods.

huxreg uses the tidy method from the broom package to get standard errors. felm has such a method but I don't think that summary.felm does. You can get round this by using the tidy_override function in recent versions of huxtable.

You might look at biglm or speedglm for increasing OLS speed, though I don't think they specifically deal with LSDV issues. Another option would be plm, designed for panel data but perhaps useful for you.
