Robust option in Stata: why are the p-values computed using a Student distribution?

heteroscedasticity, stata

The commonly used "robust" option of Stata's regress command reports standard errors based on the Huber-White sandwich estimator.

The t statistic also uses these standard errors. However, I have noticed that the accompanying p-value is computed from the Student distribution, exactly as in "regular" OLS regression under the normality assumption.

I thought that relaxing the homoscedasticity assumption meant we had to rely on the asymptotic properties of the OLS estimator, and consequently that the t statistic should be compared to a standard normal distribution…

Did I miss something?

Best Answer

Robust variance estimators require large samples to be valid. In small samples, they are biased downward, and the normal-distribution-based confidence intervals may have coverage way below nominal coverage rates.

Using a $t_{n-k}$ distribution as a conservative approximation is one possible solution: you hope that this fattens up the tails adequately before you offer up your tests to the journal referee gods. Other ideas are multiplying the squared residuals by $\frac{n}{n-k}$ (or something similar) to inflate them (which Stata also does), higher-order asymptotic expansions, or resampling methods like bootstrapping. Here $n$ is the number of observations and $k$ the number of parameters.
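To make the $\frac{n}{n-k}$ inflation concrete, here is a minimal Python sketch for a regression with an intercept and a single regressor, so $k=2$. The data and the function name `robust_slope_se` are made up for illustration. "HC0" and "HC1" are the usual labels in the literature for the uncorrected and the $\frac{n}{n-k}$-scaled sandwich variance. This is not Stata's actual code, only the same arithmetic written out by hand.

```python
import math

def robust_slope_se(x, y):
    """Return (slope, HC0 se, HC1 se) for OLS of y on an intercept and x."""
    n = len(x)
    k = 2  # parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Slope element of the sandwich (X'X)^-1 X' diag(e^2) X (X'X)^-1
    meat = sum((xi - x_bar) ** 2 * ei ** 2 for xi, ei in zip(x, resid))
    var_hc0 = meat / sxx ** 2
    # Small-sample inflation of the squared residuals by n / (n - k)
    var_hc1 = var_hc0 * n / (n - k)
    return slope, math.sqrt(var_hc0), math.sqrt(var_hc1)

# Hypothetical data, chosen only to have visibly uneven residuals
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.4, 3.8, 5.9, 5.1, 8.3, 6.9]
slope, se_hc0, se_hc1 = robust_slope_se(x, y)
print(f"slope={slope:.4f}  HC0 se={se_hc0:.4f}  HC1 se={se_hc1:.4f}")
```

With $n=8$ and $k=2$ the corrected standard error is larger by a factor of $\sqrt{8/6}$, roughly 15%. The correction fades as $n$ grows relative to $k$.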

A nice survey of this literature is Imbens and Kolesar (2012). They give a great example where the $t$ approximation goes wrong in a setting where you have a binary treatment with very few treated observations. There, basing the degrees of freedom on the full sample size $n=n_T+n_C$ is far too generous.

If your sample size is large, using the $t$ versus the normal won't matter at all.
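As a rough numerical check of that last point, the sketch below compares the two-sided p-value for the same t statistic under a standard normal and under a Student distribution with few and with many degrees of freedom. It uses only the Python standard library. The Student tail is obtained by Simpson integration of the density, which is an illustrative shortcut and not how Stata computes it. The function names and the test statistic of 2.0 are arbitrary choices.

```python
import math

def normal_two_sided_p(t_stat):
    """P(|Z| > |t_stat|) for a standard normal Z."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))

def student_two_sided_p(t_stat, df, steps=20000):
    """P(|T| > |t_stat|) for T ~ Student t with df degrees of freedom.

    Integrates the density over [0, |t_stat|] with Simpson's rule
    (steps must be even) and subtracts the central mass from 1.
    """
    t_abs = abs(t_stat)
    # Log of the normalising constant; lgamma avoids overflow for large df
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_const - (df + 1) / 2 * math.log1p(u * u / df))

    h = t_abs / steps
    total = pdf(0.0) + pdf(t_abs)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_mass = 2.0 * total * h / 3.0  # mass in [-|t|, |t|] by symmetry
    return 1.0 - central_mass

t_stat = 2.0
p_norm = normal_two_sided_p(t_stat)
p_small = student_two_sided_p(t_stat, df=5)       # e.g. n - k = 5
p_large = student_two_sided_p(t_stat, df=10000)   # e.g. n - k = 10000
print(f"normal: {p_norm:.4f}  t(5): {p_small:.4f}  t(10000): {p_large:.4f}")
```

With 5 degrees of freedom the Student p-value is more than twice the normal one, so the choice of reference distribution can change a test's conclusion. With 10000 degrees of freedom the two agree to about four decimal places.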
