Solved – Robust regression – a better understanding

Tags: heteroscedasticity, regression, regression coefficients, robust

I looked at robust regression for the first time today and I am a bit confused. Comparing it to something like ordinary least squares (OLS), I am not sure if I am on the right track.

I read a few articles and they say that with robust regression you don't need to worry as much about outliers and heteroscedasticity, and that the normality assumptions on the residuals are not as important as with OLS. When I do robust regression in R, however, I don't get any significance indicators for the coefficients (I am using the rlm() function). So when a robust method like M- or MM-estimation is applied in a regression model, is the significance of the coefficients still important, or is the idea of robust regression just to fit the best possible plane through the data points and find the coefficients?
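For illustration, here is a stripped-down sketch on simulated data of the kind of thing I am running. summary() on an rlm fit reports t values but no p-values, so below I approximate two-sided p-values with a t approximation (which, as I understand it, is only an approximation, not an exact test):

library(MASS)  # provides rlm()

set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)
y[1] <- y[1] + 10                   # inject a single outlier

fit <- rlm(y ~ x)                   # M-estimation by default; method = "MM" is also available
ctab <- summary(fit)$coefficients   # columns: Value, Std. Error, t value

# Approximate two-sided p-values from the t statistics
df <- length(y) - length(coef(fit))
pvals <- 2 * pt(abs(ctab[, "t value"]), df = df, lower.tail = FALSE)
cbind(ctab, "p value" = pvals)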

Best Answer

To be somewhat nitpicky, I would not quite say that outliers, heteroscedasticity, and non-normality don't matter with robust regression methods. Rather, robust methods are less likely to be impaired by those conditions than OLS is, but they can still have a negative effect.

Whether the significance of the coefficients or the accuracy of their estimates is what matters is really a separate issue from robust regression. Which of those is more important to you depends on the questions you are trying to answer, not on the tools you use to answer them.

For example, consider a case where you want to test the hypothesis that a given variable is unrelated to the response. You wouldn't want the answer to that question (either yes or no) to be driven by an outlier, so you would use robust methods to help ensure that your answer reflects the bulk of your data. Likewise, consider a case where you want to estimate the slope of the relationship between a predictor and the response as accurately as possible. You wouldn't want that estimate to have been driven by an outlier either, so again you would use robust regression to protect against that possibility. In short, robust methods diminish the extent to which your results might be influenced by violations of the classical assumptions.
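To make that concrete, here is a small sketch on simulated data (using MASS::rlm with MM-estimation as one example of a robust fitter): a single high-leverage outlier pulls the OLS slope well away from the true value of 2, while the robust slope stays close to it.

library(MASS)

set.seed(42)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)                 # true slope is 2
y[which.max(x)] <- y[which.max(x)] - 30   # one high-leverage outlier

coef(lm(y ~ x))                  # OLS slope is dragged down by the outlier
coef(rlm(y ~ x, method = "MM"))  # robust (MM) slope stays near 2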

I recognize your frustration that you did not get any significant results when you used these methods. There are a couple of possibilities here. It may be that what appeared to be significant before you used robust regression (perhaps the results from a prior OLS analysis) was driven by violations of the OLS assumptions, and that the null hypothesis is actually true. The other possibility is simply that, when the OLS assumptions do hold, standard methods have more power than robust methods.
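As a rough illustration of that last point, here is a small simulation sketch (entirely simulated data with clean normal errors, so the OLS assumptions hold): the robust standard error for the slope comes out slightly larger on average than the OLS one, which is the efficiency you give up in exchange for protection you did not need.

library(MASS)

set.seed(7)
sim_once <- function(n = 40, beta = 0.3) {
  x <- rnorm(n)
  y <- beta * x + rnorm(n)   # clean data: OLS assumptions hold
  c(ols = summary(lm(y ~ x))$coefficients["x", "Std. Error"],
    rob = summary(rlm(y ~ x))$coefficients["x", "Std. Error"])
}
colMeans(t(replicate(500, sim_once())))  # average slope SE: OLS a bit smaller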
