Understanding and interpreting quantile regression

Tags: interpretation, quantile-regression

I am trying to better understand what quantile regression is and how to interpret it. I know that quantile regression is used to model the conditional $\tau$-quantile of a response variable $y$ given covariates $X$, as opposed to the conditional mean of $y$ that OLS models. I have been studying the standard definitions that are easily found online and in textbooks. However, I am specifically wondering why the approach I outline below is not correct.
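The "conditional quantile" idea rests on the check (pinball) loss $\rho_\tau(u) = u(\tau - \mathbb{1}\{u<0\})$: the single constant that minimizes the summed check loss over a sample is that sample's $\tau$-quantile. A minimal base-R sketch of this fact; the helper names `check_loss` and `total_loss` are illustrative, not from any package, and the data are simulated:

```r
# Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})
check_loss <- function(u, tau) u * (tau - (u < 0))

set.seed(1)
y   <- rnorm(1001)
tau <- 0.25

# Total check loss when every y is "predicted" by the same constant q
total_loss <- function(q) sum(check_loss(y - q, tau))

# The minimizer is attained at a data point, so searching over the sample suffices
losses  <- vapply(y, total_loss, numeric(1))
q_check <- y[which.min(losses)]

# Empirical tau-quantile (type = 1 inverts the empirical CDF)
q_emp <- unname(quantile(y, tau, type = 1))
c(q_check = q_check, q_emp = q_emp)
```

Quantile regression replaces the constant `q` by a linear predictor in $X$ and minimizes the same loss over all observations.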

I think of quantile regression as estimating a relationship between $y$ and $X$ that varies over the distribution of $y$. More formally, I have been thinking of a hypothetical quantile regression data generating process (DGP) of the form

$y_{\tau} = X_{\tau}\beta_{\tau} + \epsilon_{\tau}$

where $y_{\tau}$ is an estimate of $y$ in quantile $\tau$ and $\epsilon_{\tau}$ is an error term. The main point is that the coefficient $\beta_{\tau}$ varies over the quantiles of $y$. Put simply, each quantile $\tau$ has its own coefficient $\beta_{\tau}$.
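For comparison, the conventional linear quantile regression model attaches $\tau$ to the conditional quantile function of $y$ given $X$ rather than to individual observations, and estimates each $\beta_\tau$ from the whole sample by minimizing the check loss $\rho_\tau$:

$$
Q_{y}(\tau \mid X = x) = x^{\top}\beta_{\tau}, \qquad
\hat{\beta}_{\tau} = \arg\min_{b} \sum_{i=1}^{n} \rho_{\tau}\!\left(y_i - x_i^{\top} b\right), \qquad
\rho_{\tau}(u) = u\left(\tau - \mathbb{1}\{u < 0\}\right).
$$

For $\tau = 0.5$, $\rho_{0.5}(u) = |u|/2$, so the estimator reduces to least-absolute-deviations (median) regression.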

I implemented the DGP above and simulated 100 data points (representing 100 percentiles), increasing $\beta_\tau$ gradually. Then I estimated a quantile regression for all percentiles of $y$. This exercise essentially returns the OLS coefficient for every percentile. I expected a lot of noise, since there is only one observation per percentile, but I also expected some evidence pointing toward the true values of $\beta_\tau$. Now I am wondering where I went wrong and how the setup above differs from traditional quantile regression.

Edit:

To clarify after David's comment: the interpretation of quantile regression seems to be "a unit change in $X$ changes quantile $\tau$ of $y$ by $\beta_{\tau}$", where a different $\beta_{\tau}$ is estimated for each quantile. If that is not the correct interpretation, this alone would answer my question. If it is correct, I do not understand why the DGP outlined above, interpreted as "each quantile $\tau$ has its own coefficient $\beta_{\tau}$", does not give similar results.

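My code (in R):

```r
set.seed(1)                           # make the simulation reproducible
X    <- rnorm(100)
beta <- seq(0, 5, length.out = 100)   # coefficient increases with the row index
# equivalent to looping y[tt] = X[tt] * beta[tt] + rnorm(1) over tt = 1..100
y    <- X * beta + rnorm(100)

plot(X, y)                            # covariate on the x-axis, response on the y-axis

olsres <- summary(lm(y ~ X - 1))

# the 99 interior percentiles (seq(0, 1, length.out = 100) is not a percentile
# grid and also includes the extreme levels 0 and 1)
qrres <- quantreg::rq(y ~ X - 1, tau = seq(0.01, 0.99, by = 0.01))
coef(qrres)
```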
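One way to see why the estimates come out flat in this simulation: `beta[tt]` is attached to the row index, not to an observation's rank in the conditional distribution of $y$ given $X$ (for $X < 0$ a larger coefficient pushes $y$ down, not up). Moreover, because $X$ and the noise are both symmetric about zero and independent of `beta`, the residual $u = y - bX$ is symmetric about zero for every slope $b$. Since $\rho_\tau(-u) = \rho_{1-\tau}(u)$ and $\rho_\tau(u) + \rho_{1-\tau}(u) = |u|$, the expected check loss $E[\rho_\tau(y - bX)]$ equals $\tfrac{1}{2}E|y - bX|$ for every $\tau$, so every quantile level targets the same median-regression slope, which by the symmetry of `beta` around 2.5 coincides with the OLS slope of 2.5. A base-R sketch of this, assuming a much larger n (20000) than in the question so that sampling noise is small; `fit_slope` is a hypothetical helper that fits `y ~ X - 1` by minimizing the check loss with `optimize()`:

```r
check_loss <- function(u, tau) u * (tau - (u < 0))

# Hypothetical helper: slope of the no-intercept quantile fit y ~ X - 1,
# found by minimizing the convex, piecewise-linear check loss over b
fit_slope <- function(y, X, tau, interval = c(-10, 10)) {
  optimize(function(b) sum(check_loss(y - b * X, tau)), interval)$minimum
}

set.seed(123)
n    <- 20000
X    <- rnorm(n)
beta <- seq(0, 5, length.out = n)   # coefficient tied to the row index, as in the question
y    <- X * beta + rnorm(n)

taus   <- c(0.1, 0.25, 0.5, 0.75, 0.9)
slopes <- sapply(taus, function(tau) fit_slope(y, X, tau))
round(setNames(slopes, taus), 2)    # all near 2.5 rather than increasing with tau
coef(lm(y ~ X - 1))                 # OLS slope, also near 2.5
```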

Best Answer

I suspect you are mistaking quantile regression for some sort of piecewise linear regression, in which an ordinary OLS line is fitted to subsets of the observation space. (If you think about it, deciding how to subset the data in the multivariate case would be quite complicated when all you have is a single parameter $\tau$.)

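Quantile regression is something different: it estimates the conditional median (for $\tau = 0.5$) or, more generally, the conditional $\tau$-quantile of $y$ given $X$ at any other level of interest. Which quantile depends on the value of $\tau$ you specify; with your grid of $\tau$ values you are estimating a separate conditional quantile at each level, not the conditional median every time. It is often applied where the conditional mean alone is not very informative, for example when the data have a multimodal structure, or when the covariates affect the spread or the tails of $y$ differently from its center. It does not re-estimate the OLS line on bins of your independent variable (as you seem to suspect). Note also that if it did, you would have a single observation in every bin, which is far too little to estimate a slope $\hat{\beta}$ meaningfully (and with an intercept it would not be possible at all). In other words, quantile regression does not subset the observations: every fit uses all of the data and changes only how positive and negative residuals are weighted. So the interpretation in your edit is essentially right, provided "quantile $\tau$ of $y$" means the conditional $\tau$-quantile of $y$ given $X$, not the observations that happen to lie in the $\tau$-th percentile of $y$.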
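To make the interpretation concrete, here is a DGP in which the conditional $\tau$-quantile really is $X\beta(\tau)$. It is a random-coefficient model chosen for illustration (an assumption, not the DGP in the question): each observation draws a latent rank $U \sim \text{Uniform}(0,1)$ independent of $X$, sets $y = X\beta(U)$ with $\beta(u) = 5u$, and keeps $X > 0$ so that $y$ is increasing in $U$. Then $Q_y(\tau \mid X) = 5\tau X$, and a check-loss fit that uses all observations recovers a different slope at each $\tau$. `fit_slope` is a hypothetical helper that fits `y ~ X - 1` by minimizing the check loss with `optimize()`:

```r
check_loss <- function(u, tau) u * (tau - (u < 0))

# Hypothetical helper: slope of the no-intercept quantile fit y ~ X - 1
fit_slope <- function(y, X, tau, interval = c(-10, 10)) {
  optimize(function(b) sum(check_loss(y - b * X, tau)), interval)$minimum
}

set.seed(42)
n <- 10000
X <- runif(n, 1, 3)    # strictly positive covariate (assumption)
U <- runif(n)          # latent rank of each observation, independent of X
y <- X * (5 * U)       # beta(u) = 5u, so Q_y(tau | X) = 5 * tau * X

taus   <- c(0.1, 0.25, 0.5, 0.75, 0.9)
slopes <- sapply(taus, function(tau) fit_slope(y, X, tau))   # every fit uses all n rows
rbind(estimated = round(slopes, 2), true = 5 * taus)
```

Only the asymmetric weighting of residuals changes with $\tau$; the data are never split. If quantreg is installed, `quantreg::rq(y ~ X - 1, tau = taus)` should give essentially the same slopes.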
