Solved – Quantile regression vs. Li’s regression: which should I use, and when

outliers, quantile regression, regression, robust

Is there a general rule of thumb about when robust regression or quantile regression is preferred in the presence of outliers?

For example, I have a dataset where the DV exhibits extreme positive skewness. However, the large cases are actually some of the most interesting observations. When I run OLS, I find a positive relation between the DV and the IV of interest. When I estimate quantile regressions, I find that the positive relation between the DV and IV of interest is strongest in the 85th, 90th, and 95th percentiles (which is where one might expect). It is insignificant and sometimes negative for the rest of the percentiles. However, when I run rreg in Stata, it basically gives no weight to the large positive outliers, leading to no relation between the DV and the IV of interest. Which approach (OLS, Quantile, rreg) should be reported? Which is most appropriate?

Best Answer

Is there a general rule of thumb about when robust regression or quantile regression is preferred in the presence of outliers?

Yes. So long as we're comparing regression-equivariant approaches, the various robust regression estimators can be ranked by their capacity to find outliers.

The algorithm behind rreg is described here:

"rreg first performs an initial screening based on Cook’s distance $>1$ to eliminate gross outliers before calculating starting values and then performs Huber iterations followed by biweight iterations, as suggested by Li."
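The "Huber iterations followed by biweight iterations" part can be sketched as iteratively reweighted least squares. This is only an illustration, not Stata's implementation: it skips the Cook's distance screening, handles a single predictor, and assumes the usual tuning constants (1.345 for Huber, 4.685 for the biweight) and a MAD-based residual scale. All function names and the toy data are made up for the example.

```python
import statistics

def wls_line(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def mad_scale(res):
    """Residual scale: median absolute deviation from the median, divided by 0.6745."""
    med = statistics.median(res)
    return statistics.median([abs(r - med) for r in res]) / 0.6745

def huber_weight(u, c=1.345):
    """Huber weight: 1 inside [-c, c], decaying like c/|u| outside (assumed constant)."""
    return 1.0 if abs(u) <= c else c / abs(u)

def biweight(u, c=4.685):
    """Tukey biweight: smooth downweighting, exactly 0 beyond c (assumed constant)."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) < c else 0.0

def irls(x, y, weight_fn, a, b, max_iter=50, tol=1e-8):
    """Iteratively reweighted least squares starting from the line (a, b)."""
    w = [1.0] * len(x)
    for _ in range(max_iter):
        res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        s = mad_scale(res)
        if s == 0:
            break
        w = [weight_fn(r / s) for r in res]
        a_new, b_new = wls_line(x, y, w)
        done = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    return a, b, w

# Toy data: y = 1 + 2x plus small deterministic noise, with three large positive outliers.
x = list(range(20))
y = [1.0 + 2.0 * xi + 0.1 * ((2 * xi) % 5 - 2) for xi in x]
outliers = (5, 10, 15)
for i in outliers:
    y[i] += 50.0

a_ols, b_ols = wls_line(x, y, [1.0] * len(x))          # OLS starting values
a_h, b_h, _ = irls(x, y, huber_weight, a_ols, b_ols)   # Huber stage
a_r, b_r, w_r = irls(x, y, biweight, a_h, b_h)         # biweight stage

print("OLS:     ", a_ols, b_ols)
print("Huber:   ", a_h, b_h)
print("Biweight:", a_r, b_r)
print("Final weights of the shifted points:", [w_r[i] for i in outliers])
```

In this toy setting the biweight stage gives the three shifted points zero weight, which mirrors what the question describes: rreg effectively ignores the large positive cases.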

The Li estimate of regression is in a sense similar to an S-estimator but with a single starting point. This estimator is rarely used and has received little study. I would advise you to use the FastS algorithm of Salibian-Barrera and Yohai [2], about which much more is known.

For more background on why the S-estimator, a robust estimator with a re-descending $\rho$ function, is more reliable than quantile regression, check this answer. S-estimates of regression are implemented in Stata; see the Verardi and Croux (2008) Stata package and companion paper [1].
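For reference, the standard textbook definition of a regression S-estimator can be written as follows (this formulation is not taken from the answer linked above, and the tuning values are the commonly quoted ones, given here as an assumption). The estimate minimizes a robust scale of the residuals:

$$\hat\beta_S = \arg\min_{\beta}\, \hat\sigma(\beta), \qquad \text{where } \hat\sigma(\beta) \text{ solves } \frac{1}{n}\sum_{i=1}^{n} \rho\!\left(\frac{y_i - x_i^\top \beta}{\hat\sigma(\beta)}\right) = b.$$

A typical choice is the Tukey biweight, normalized so that $\rho$ is bounded by 1:

$$\rho(u) = \begin{cases} 1 - \left(1 - (u/c)^2\right)^3 & |u| \le c, \\ 1 & |u| > c. \end{cases}$$

Because $\rho$ is constant beyond $c$, its derivative vanishes there, so sufficiently large residuals have no influence on the fit. With $b = 0.5$ and $c \approx 1.547$ the estimator attains a breakdown point of about 50%.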

For the second part of your question: the breakdown point of quantile regression depends on the quantile you estimate. Against outliers in the response it is at most about $\min(\tau, 1-\tau)$, so the $\tau=0.9$ quantile regression is much less able to withstand outliers than the $\tau=0.5$ (median) regression, and is generally not considered robust.
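The contrast between the median and an upper quantile is easiest to see in the intercept-only case, where the quantile regression fit reduces to a sample quantile. The sketch below is a simplification (no covariates, hypothetical helper names, made-up data): it replaces part of a clean sample with an arbitrarily large value and compares how the $\tau=0.5$ and $\tau=0.9$ quantiles react.

```python
import math

def sample_quantile(values, tau):
    """Smallest data value with at least a fraction tau of the sample at or below it."""
    ordered = sorted(values)
    k = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[k]

def contaminate(values, n_bad, bad_value):
    """Replace the first n_bad observations with an arbitrarily large value."""
    return [bad_value] * n_bad + list(values[n_bad:])

clean = list(range(1, 101))          # 100 well-behaved observations
BAD = 1e6                            # stand-in for an arbitrarily large outlier
light = contaminate(clean, 5, BAD)   # 5% contamination
heavy = contaminate(clean, 15, BAD)  # 15% contamination

for name, data in (("clean", clean), ("5% bad", light), ("15% bad", heavy)):
    print(name, "median:", sample_quantile(data, 0.5),
          "0.9 quantile:", sample_quantile(data, 0.9))
```

With 15% contamination in the upper tail the median shifts but stays inside the range of the clean data, while the 0.9 quantile is carried all the way to the outlying value, because more than $1-\tau = 10\%$ of the sample has been replaced.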

By the way, the fact that an observation is flagged as an outlier does not imply anything about the quality, validity or reliability of the corresponding measurement. It simply means that the flagged observation is inconsistent with the multivariate pattern fitting the bulk of the data. Indeed, in many fields (micro-array analysis, fraud identification) revealing such data points is often the primary objective of the study.

[1] Verardi, V., Croux, C. (2008). Robust regression in Stata. The Stata Journal, 9(3), 439-453.
[2] Salibian-Barrera, M., Yohai, V.J. (2006). A Fast Algorithm for S-Regression Estimates. Journal of Computational and Graphical Statistics, 15, 414-427.
