Solved – Are regressions with student-t errors useless

Tags: mathematical-statistics, modeling, regression, robust

Please see edit.

When you have data with heavy tails, doing a regression with Student-t errors seems like an intuitive thing to do. While exploring this possibility, I ran into this paper:

Breusch, T. S., Robertson, J. C., & Welsh, A. H. (November 01, 1997). The emperor's new clothes: a critique of the multivariate t regression model. Statistica Neerlandica, 51(3).

It argues that the scale parameter and the degrees-of-freedom parameter are, in some sense, not identifiable with respect to each other, and that because of this a regression with t errors does nothing beyond what a standard linear regression does. The abstract reads:

Zellner (1976) proposed a regression model in which the data vector
(or the error vector) is represented as a realization from the
multivariate Student t distribution. This model has attracted
considerable attention because it seems to broaden the usual Gaussian
assumption to allow for heavier-tailed error distributions. A number
of results in the literature indicate that the standard inference
procedures for the Gaussian model remain appropriate under the broader
distributional assumption, leading to claims of robustness of the
standard methods. We show that, although mathematically the two models
are different, for purposes of statistical inference they are
indistinguishable. The empirical implications of the multivariate t
model are precisely the same as those of the Gaussian model. Hence the
suggestion of a broader distributional representation of the data is
spurious, and the claims of robustness are misleading. These
conclusions are reached from both frequentist and Bayesian
perspectives.

This surprises me.

I don't have the mathematical sophistication to evaluate their arguments well, so I have a couple of questions: Is it true that doing regressions with t errors is not generally useful? If they are sometimes useful, have I misunderstood the paper, or is it misleading? If they are not useful, is this a well-known fact? Are there other ways to account for data with heavy tails?

Edit: Upon closer reading of paragraph 3 and section 4, it looks like the paper is not talking about what I was thinking of as a Student-t regression (errors that are independent univariate t distributions). The whole error vector is instead drawn from a single multivariate t distribution, so the errors are not independent. If I understand correctly, this lack of independence is precisely what explains why you cannot estimate the scale and the degrees of freedom independently.
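A small simulation may make the distinction concrete. The sketch below assumes the usual scale-mixture construction of the t distribution (a Gaussian draw divided by the square root of a chi-square over its degrees of freedom); the function names `r_joint_t`, `r_indep_t` and `excess_kurtosis` and the parameter values are made up for this illustration. In the multivariate case one mixing variable is shared by the whole error vector, so a single data set looks like Gaussian noise with a rescaled sigma. With independent t errors each observation gets its own mixing variable, and the heavy tails show up within the sample.

```r
set.seed(1)
n     <- 5000
nu    <- 3   # degrees of freedom (illustrative value)
sigma <- 1   # scale (illustrative value)

# Multivariate t error vector: ONE chi-square draw is shared by all n errors
r_joint_t <- function(n, nu, sigma) {
  w <- rchisq(1, df = nu) / nu
  sigma * rnorm(n) / sqrt(w)
}

# Independent univariate t errors: a separate chi-square draw per error
r_indep_t <- function(n, nu, sigma) {
  w <- rchisq(n, df = nu) / nu
  sigma * rnorm(n) / sqrt(w)
}

# Sample excess kurtosis: about 0 for a Gaussian sample, large for heavy tails
excess_kurtosis <- function(e) {
  d <- e - mean(e)
  mean(d^4) / mean(d^2)^2 - 3
}

e_joint <- r_joint_t(n, nu, sigma)
e_indep <- r_indep_t(n, nu, sigma)

k_joint <- excess_kurtosis(e_joint)
k_indep <- excess_kurtosis(e_indep)
print(c(joint = k_joint, independent = k_indep))
```

Within one realization the jointly drawn errors carry no visible trace of the degrees of freedom: the shared mixing variable only rescales sigma, which is the confounding of scale and degrees of freedom described above.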

I guess this paper provides a list of papers to avoid reading.

Best Answer

Your edit is correct. The results presented in the paper apply only to multivariate-t errors. If you are using independent t errors, then you are safe.
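As a rough check that independent t errors do let you estimate both parameters, here is a minimal sketch that fits a linear regression with independent t errors by maximum likelihood in base R. The data are simulated, the true values (intercept 1, slope 2, scale 0.5, 4 degrees of freedom) are arbitrary choices for the illustration, and `negloglik` is a hypothetical helper written here rather than a function from any package.

```r
set.seed(42)
n <- 2000
x <- runif(n)
# Simulated data: intercept 1, slope 2, independent t errors with df = 4, scale = 0.5
y <- 1 + 2 * x + 0.5 * rt(n, df = 4)

# Negative log-likelihood; scale and df are optimized on the log scale to stay positive
negloglik <- function(par) {
  mu    <- par[1] + par[2] * x
  sigma <- exp(par[3])
  nu    <- exp(par[4])
  -sum(dt((y - mu) / sigma, df = nu, log = TRUE) - log(sigma))
}

# OLS coefficients, the sample sd and df = 5 serve as starting values
start <- c(unname(coef(lm(y ~ x))), log(sd(y)), log(5))
fit   <- optim(start, negloglik, method = "BFGS")

est <- c(intercept = fit$par[1], slope = fit$par[2],
         scale = exp(fit$par[3]), df = exp(fit$par[4]))
print(round(est, 2))
```

With a sample of this size the estimated scale and degrees of freedom should both land near the simulated values, which is what separates this model from the multivariate t case.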

I do not think the paper is well known, but I think it is correct.

The statistical literature is full of "generalizations" that in many cases are merely reparameterizations or one-to-one transformations, or are useless because they do not meaningfully generalize any property of the model in question.

Related Solutions

Solved – Simultaneous heteroscedasticity and heavy tails in a regression model

Heteroscedasticity and leptokurtosis are easily conflated in data analysis. Take a data model that generates a Cauchy error term. This meets the criteria for homoscedasticity, even though the Cauchy distribution has no finite variance. A Cauchy error is a simulator's way of including an outlier-sampling process.

With these heavy-tailed errors, even when you fit the correct mean model, the outlier leads to a large residual. A test of heteroscedasticity has greatly inflated type I error under this model. A Cauchy distribution also has a scale parameter. Generating error terms with a linearly increasing scale produces heteroscedastic data, but the power to detect such effects is practically nil, so the type II error is inflated as well.
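The type I error claim can be checked with a short simulation. The sketch below is an illustration under stated assumptions: it uses the classical (non-studentized) Breusch-Pagan score test, which assumes Gaussian errors, coded by hand in the hypothetical helper `bp_pvalue`. The answer does not say which test it has in mind, and a studentized variant may behave differently.

```r
set.seed(2024)
n_sim <- 500
n     <- 100
x     <- sort(rexp(n))   # fixed design, reused in every replicate

# Classical Breusch-Pagan score test against a variance that is linear in x
bp_pvalue <- function(y, x) {
  e2  <- residuals(lm(y ~ x))^2
  g   <- e2 / mean(e2)                     # squared residuals scaled by the ML variance estimate
  aux <- lm(g ~ x)                         # auxiliary regression
  ess <- sum((fitted(aux) - mean(g))^2)    # explained sum of squares
  pchisq(ess / 2, df = 1, lower.tail = FALSE)
}

# Both data sets are homoscedastic, so every rejection is a false positive
p_cauchy <- replicate(n_sim, bp_pvalue(10 * x + rcauchy(n), x))
p_normal <- replicate(n_sim, bp_pvalue(10 * x + rnorm(n), x))

rej <- c(cauchy = mean(p_cauchy < 0.05), normal = mean(p_normal < 0.05))
print(rej)
```

The `normal` entry is the reference: it should sit near the nominal 5% level, while the `cauchy` entry shows how often constant-scale Cauchy errors are flagged as heteroscedastic.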

Let me suggest, then, that the proper data-analytic approach isn't to become mired in tests. Statistical tests are primarily misleading. Nowhere is this more obvious than in tests intended to verify secondary modeling assumptions. They are no substitute for common sense. For your data, you can plainly see two large residuals. Their effect on the trend is minimal, as few if any residuals are offset in a linear departure from the 0 line in the plot of residuals vs. fitted. That is all you need to know.

What is desired, then, is a means of estimating a flexible variance model that will allow you to create prediction intervals over a range of fitted responses. Interestingly, this approach is capable of handling most sane forms of both heteroscedasticity and kurtosis. Why not, then, use a smoothing spline to estimate the mean squared error?

Take the following example:

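set.seed(123)
x <- sort(rexp(100))
y <- rcauchy(100, 10*x)   # Cauchy errors around the mean 10*x

f <- lm(y ~ x)
p <- predict(f)           # fitted values
r <- residuals(f)^2       # squared residuals

# Smooth the squared residuals against the fitted values to estimate the variance
s <- smooth.spline(x=p, y=r)
v <- pmax(predict(s, p)$y, 0)   # evaluate at each fitted value; clip negative estimates to 0

phi <- p + 1.96*sqrt(v)
plo <- p - 1.96*sqrt(v)

par(mfrow=c(2,1))
plot(p, r, xlab='Fitted', ylab='Squared-residuals')
lines(s, col='red')
legend('topleft', lty=1, col='red', legend="predicted variance")

plot(x, y, ylim=range(c(plo, phi), na.rm=TRUE))
abline(f, col='red')      # drawn only after the scatterplot exists
lines(x, plo, col='red', lty=2)
lines(x, phi, col='red', lty=2)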

This gives a prediction interval that "widens up" to accommodate the outlier. It is still a consistent estimator of the variance and usefully tells people, "Hey, there's this big, wonky observation around X=4 and we can't predict values very usefully there."

Solved – When to use robust standard errors in Poisson regression

In general if you have any suspicion that your errors are heteroskedastic, you should use robust standard errors. The fact that your estimates become non-significant when you don't use robust SEs suggests (but does not prove) the need for robust SEs! These SEs are "robust" to the bias that heteroskedasticity can cause in a generalized linear model.

This situation is a little different, though, in that you're layering them on top of Poisson regression.

Poisson regression has the well-known property that it forces the variance to equal the mean, whether or not the data support that. Before considering robust standard errors, I would try a negative binomial regression, which does not suffer from this problem. There is a test (see the comment) to help determine whether the resultant change in standard errors is significant.

I do not know for sure whether the change you're seeing (moving to robust SEs narrows the CI) implies under-dispersion, but it seems likely. Take a look at the appropriate model (I think negative binomial, but a quick googling also suggests quasi-Poisson for under-dispersion?) and see what you get in that setting.
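To see how these options compare on the standard errors, here is a minimal sketch in base R. The data are simulated and over-dispersed (negative binomial counts with an arbitrary `size = 2`), and the robust standard errors come from a hand-rolled HC0 sandwich formula for the log-link Poisson model rather than from any package, so treat it as an illustration of the idea and not as a recipe for your data.

```r
set.seed(7)
n  <- 1000
x  <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)
y  <- rnbinom(n, mu = mu, size = 2)   # over-dispersed counts: variance = mu + mu^2 / 2

pois  <- glm(y ~ x, family = poisson)
quasi <- glm(y ~ x, family = quasipoisson)

# Pearson dispersion estimate: near 1 if the Poisson variance assumption holds
disp <- sum(residuals(pois, type = "pearson")^2) / df.residual(pois)

# HC0 sandwich covariance for the Poisson fit with log link
X <- model.matrix(pois)
m <- fitted(pois)
bread <- solve(crossprod(X, X * m))   # (X' W X)^-1 with W = diag(fitted means)
meat  <- crossprod(X * (y - m))       # X' diag((y - fitted)^2) X
se_robust <- sqrt(diag(bread %*% meat %*% bread))

se_table <- cbind(poisson = sqrt(diag(vcov(pois))),
                  quasi   = sqrt(diag(vcov(quasi))),
                  robust  = se_robust)
print(disp)
print(se_table)
```

A dispersion estimate well above 1 points to over-dispersion, in which case the quasi-Poisson and robust standard errors come out wider than the plain Poisson ones; an estimate below 1 would point the other way.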
