Quantile regression versus OLS with dummies

econometrics, panel data, quantile regression

I want to regress a variable Y on another variable X (with appropriate control variables and fixed effects) in a panel data setting. Two approaches come to mind:

  1. Use quantile regression;

  2. Use OLS to regress Y on X interacted with indicators for the quartiles of X, that is, multiply X by an indicator variable that takes value 1 if the observation belongs to a given quartile of X. So basically we would have y = intercept + D0.5*X + D0.75*X + D1.0*X + controls, where D0.5 is the indicator variable for the second quartile, D0.75 is the indicator variable for the third quartile, and D1.0 is the indicator variable for the fourth quartile.

What is the difference between the two approaches and in which cases would one be more appropriate than the other?


To answer the comments:

  1. I am trying to see how X impacts Y for given quartiles of X. I expect that the impact of X on Y varies significantly across the quartiles of X. This is basically the hypothesis.

  2. The observations are country-years. I expect X to have an important impact on Y only for high values of X (and for another variable, say X', I expect the opposite to hold). The idea is to check this hypothesis. What would you recommend?

  3. Maybe it helps if I am more specific. X is an input factor in a country-year (hence the panel data specification) and X' is another input factor. One theory suggests that X and X' should both have a statistically significant and positive impact on Y (the dependent variable). Another suggests that X should have a negative impact (becoming more negative for larger values of Y) and that X' should have a positive impact that grows as Y increases. The idea is to see how both these variables affect Y along the quantiles of Y and to test both theories. Both theories hold that the direction of causality is from X to Y and not from Y to X.

Best Answer

Neither method is appropriate because what you want to do is not appropriate.

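Quantile regression is about estimating conditional quantiles of the dependent variable given the regressors - that is, it looks at quantiles of Y instead of the mean of Y that OLS estimates. It says nothing special about quantiles of X.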
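To make the "quantiles instead of the mean" point concrete, here is a minimal sketch in plain Python. It is only an illustration with no regressors and made-up data: the constant that minimizes the check (pinball) loss at level tau is a sample quantile of Y, whereas squared-error loss is minimized by the mean. The function names and the data are hypothetical and not from the question.

```python
import statistics


def pinball_loss(residual, tau):
    """Check (pinball) loss of one residual at quantile level tau."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def best_constant(y, tau):
    """Return the observed value of y with the smallest total pinball loss.

    A minimizer of the pinball loss can always be found among the data
    points, so searching over the observed values is enough for this toy case.
    """
    return min(sorted(y), key=lambda c: sum(pinball_loss(v - c, tau) for v in y))


# Hypothetical outcome values with one large outlier.
y = [1, 2, 3, 4, 100]

median_fit = best_constant(y, 0.5)   # targets the median of y
lower_fit = best_constant(y, 0.25)   # targets the lower quartile of y
mean_fit = statistics.mean(y)        # what squared-error loss (OLS) targets

print(median_fit, lower_fit, mean_fit)
```

Quantile regression does the same thing with a linear function of the regressors in place of the constant, so the quantile level refers to the distribution of Y, never to bins of X.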

Using dummy variables for different quartiles of an independent variable is binning. Given that you think the effect of X on Y will be different at different levels of X, you have several options. You can use splines of various kinds. If you know the exact point at which you think the effect of X on Y changes, you can implement a hockey stick (piecewise linear) model. I think it more likely that you would want to estimate where and how that change occurs, so restricted cubic splines might be good.
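As a rough sketch of the hockey stick idea with a known change point, the following plain-Python example fits y = b0 + b1*x + b2*max(x - knot, 0) by ordinary least squares through the normal equations. Everything here is an assumption for illustration: the helper names, the knot at 5, and the noise-free toy data. It also ignores the controls and fixed effects from the question, and in practice you would use a regression library rather than a hand-written solver.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_hockey_stick(x, y, knot):
    """OLS fit of y = b0 + b1*x + b2*max(x - knot, 0)."""
    design = [[1.0, xi, max(xi - knot, 0.0)] for xi in x]
    k = 3
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy data: X has no effect on Y up to 5, then a slope of 3 above it.
x = [float(v) for v in range(11)]
y = [2.0 if xi <= 5 else 2.0 + 3.0 * (xi - 5) for xi in x]

b0, b1, b2 = fit_hockey_stick(x, y, knot=5.0)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Here b1 is the slope below the knot and b1 + b2 is the slope above it, so a hypothesis such as "X only matters for high values of X" becomes a statement about b1 being near zero and b2 being nonzero. Unlike the binned interaction model, the fitted line stays continuous at the change point.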

However, if you have a bunch of independent variables that might interact, then MARS (multivariate adaptive regression splines) could be a good method.
