t Errors vs. Normal Errors: Why Use t Instead of Normal Errors?

Tags: bayesian, distributions, model, normal distribution, robust

In this blog post by Andrew Gelman, there is the following passage:

The Bayesian models of 50 years ago seem hopelessly simple (except, of course, for simple problems), and I expect the Bayesian models of today will seem hopelessly simple, 50 years hence. (Just for a simple example: we should probably be routinely using t instead of normal errors just about everywhere, but we don't yet do so, out of familiarity, habit, and mathematical convenience. These may be good reasons–in science as in politics, conservatism has many good arguments in its favor–but I think that ultimately as we become comfortable with more complicated models, we'll move in that direction.)

Why should we be "routinely using t instead of normal errors just about everywhere"?

Best Answer

Because assuming normal errors is effectively the same as assuming that large errors do not occur! The normal distribution has such light tails that errors beyond $\pm 3$ standard deviations have very low probability (about $0.27\%$), and errors beyond $\pm 6$ standard deviations are effectively impossible (probability about $2 \times 10^{-9}$). In practice, that assumption is seldom true. When analyzing small, tidy datasets from well-designed experiments, this might not matter much, provided we do a good analysis of residuals. With data of lesser quality, it matters much more.
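
To put numbers on those tails, here is a minimal sketch using scipy (an assumed dependency) that compares the two-sided tail mass of the standard normal with that of $t$-distributions at a few degrees of freedom:

```python
from scipy import stats

# Two-sided tail probability P(|X| > k) for the standard normal
# and for t-distributions with a few choices of degrees of freedom.
for k in (3, 6):
    print(f"normal: P(|X| > {k}) = {2 * stats.norm.sf(k):.2e}")
    for df in (2, 5, 30):
        print(f"t({df}):   P(|X| > {k}) = {2 * stats.t.sf(k, df):.2e}")
```

The normal puts about $2.7 \times 10^{-3}$ of its mass beyond $\pm 3$ and about $2 \times 10^{-9}$ beyond $\pm 6$; a $t$ with 2 degrees of freedom keeps roughly $9.5\%$ of its mass beyond $\pm 3$ and still about $2.7\%$ beyond $\pm 6$.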

When using likelihood-based (or Bayesian) methods, the effect of this normality assumption (as said above, effectively a "no large errors" assumption!) is to make the inference non-robust: the results of the analysis are too heavily influenced by the large errors. This must be so, since assuming "no large errors" forces our methods to interpret the large errors as small errors, and that can only happen by moving the mean-value parameter toward the outliers, making all the errors smaller. One way to avoid this is to use so-called "robust methods"; see http://web.archive.org/web/20160611192739/http://www.stats.ox.ac.uk/pub/StatMeth/Robust.pdf

But Andrew Gelman will not go for this, since robust methods are usually presented in a highly non-Bayesian way. Using $t$-distributed errors in likelihood/Bayesian models is a different way to obtain robust methods: the $t$-distribution has heavier tails than the normal, so it allows for a larger proportion of large errors. The degrees-of-freedom parameter $\nu$ should be fixed in advance, not estimated from the data, since such estimation will destroy the robustness properties of the method (*). (It is also a very difficult problem: the likelihood function for $\nu$ can be unbounded, leading to very inefficient, even inconsistent, estimators.)
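
To see this robustness in action, here is a minimal sketch (the simulated data, the single outlier at 60, and the fixed $\nu = 2$ are all illustrative assumptions, not part of the original answer): under normal errors the ML location estimate is the sample mean, while under $t$ errors with fixed $\nu$ it must be found numerically:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
y = rng.normal(loc=10.0, scale=1.0, size=30)
y[0] = 60.0  # one gross outlier

# Under normal errors, the ML location estimate is the sample mean,
# which the outlier drags upward.
mean_normal = y.mean()

# Under t errors with nu fixed in advance, maximize the likelihood
# over location mu and scale sigma.
nu = 2.0
def neg_loglik(params):
    mu, log_sigma = params
    return -np.sum(stats.t.logpdf(y, df=nu, loc=mu, scale=np.exp(log_sigma)))

res = optimize.minimize(neg_loglik, x0=[np.median(y), 0.0])
mu_t = res.x[0]

print(f"normal-error ML estimate (sample mean): {mean_normal:.2f}")
print(f"t-error ML estimate (nu fixed at 2):    {mu_t:.2f}")
```

The outlier drags the sample mean well above the true location of 10, while the $t$-based estimate stays close to it: the heavy tails let the likelihood classify the outlier as a large error rather than shift $\mu$.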

If, for instance, you think (are afraid) that as many as 1 in 10 observations might be "large errors" (beyond 3 scale units; note that for $\nu \le 2$ the $t$-distribution does not even have a finite standard deviation), then you could use a $t$-distribution with 2 degrees of freedom, since $P(|T_2| > 3) \approx 0.095$, increasing the degrees of freedom if the proportion of large errors is believed to be smaller.
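
As a small worked check (the cutoff of 3 and the candidate values of $\nu$ are just illustrative choices), one can tabulate the tail mass $P(|T_\nu| > 3)$ and pick the degrees of freedom whose tail mass matches the proportion of large errors one fears:

```python
from scipy import stats

# Proportion of "large errors" (beyond 3 scale units) implied by
# each choice of degrees of freedom nu.
for nu in (1, 2, 3, 5, 10, 30):
    print(f"nu = {nu:2d}: P(|T| > 3) = {2 * stats.t.sf(3, df=nu):.3f}")
```

With $\nu = 2$ the tail mass is about $0.095$, matching the "1 in 10" scenario above; as $\nu$ grows, the tail mass shrinks toward the normal's $0.0027$.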

I should note that what I have said above is for models with independent $t$-distributed errors. There have also been proposals to use a multivariate $t$-distribution (under which the errors are not independent) as the error distribution. That proposal is heavily criticized in the paper "The emperor's new clothes: a critique of the multivariate $t$ regression model" by T. S. Breusch, J. C. Robertson and A. H. Welsh, Statistica Neerlandica (1997), Vol. 51, No. 3, pp. 269-286, where they show that the multivariate $t$ error distribution is empirically indistinguishable from the normal. But that criticism does not affect the independent $t$ model.

(*) One reference stating this is Venables & Ripley's Modern Applied Statistics with S (p. 110 in the 4th edition).