Why use quantile regression instead of splitting the data in quantiles and calculating multiple linear regressions

least squares, quantile regression, regression

Why use quantile regression instead of splitting the data in quantiles and calculating multiple linear regressions?

What are the advantages and disadvantages of these methods?

As far as I understand, quantile regression is based on the median and is therefore more resistant to outliers. However, couldn't I also split the data into quantiles and fit a median regression to each one?

Best Answer

You need to look at the difference between conditional and unconditional quantiles.

Your approach analyzes unconditional quantiles of $y$, and how they depend on $x$. That may be a worthwhile question to ask, but it is not the question that quantile regression discusses.

Quantile regression analyzes quantiles of $y$ conditional on $x$. That is: given a value of $x$, what is a particular quantile (say, the 80% quantile) of the conditional distribution of $y$ at exactly this $x$?
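In symbols, the distinction can be sketched as follows. The notation is an assumption introduced here, not part of the original answer: $\tau$ is the quantile level, and $F_y$ and $F_{y \mid x}$ are the unconditional and conditional distribution functions of $y$.

$$Q_\tau(y) = \inf\{q : F_y(q) \ge \tau\}, \qquad Q_\tau(y \mid x) = \inf\{q : F_{y \mid x}(q) \ge \tau\}$$

Linear quantile regression models the second quantity, for example as $Q_\tau(y \mid x) = \beta_0(\tau) + \beta_1(\tau)\,x$. Splitting the data by quantiles of $y$ uses only the first quantity, which is a single number that does not vary with $x$.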

Let's simulate a little data.

[Figure qr_1: scatterplot of the simulated data, $y$ against $x$]

Quantile regression fits a line (in the simplest case a straight line, i.e., a linear relationship with $x$) such that, at each value of $x$, we expect a fixed percentage of the data to lie below the line. Here, I am working with the 80% quantile, so about 80% of the points should lie below the line and about 20% above it, whatever the value of $x$:

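[Figure qr_2: the same scatterplot in light gray, with the fitted 80% quantile regression line in red and the points on or above the line in black]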
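To make the "same share below the line at every $x$" property concrete, here is a minimal base-R sketch. It does not use `quantreg`. Instead it minimizes the check (pinball) loss that defines quantile regression with a general-purpose optimizer, which is a swapped-in illustration and not the algorithm `rq` uses. The names `check_loss`, `beta_hat`, `x_bin` and the choice of three bins are assumptions made for this example.

```r
# Simulate the same kind of data as in the answer: y = x + standard normal noise
set.seed(1)
n_points <- 2000
xx <- rnorm(n_points)
yy <- xx + rnorm(n_points)
tau <- 0.8

# Check (pinball) loss for a line with intercept beta[1] and slope beta[2]:
# positive residuals are weighted by tau, negative ones by 1 - tau
check_loss <- function(beta) {
  res <- yy - (beta[1] + beta[2] * xx)
  sum(res * (tau - (res < 0)))
}

# Start from the least-squares fit and minimize numerically (Nelder-Mead)
fit <- optim(coef(lm(yy ~ xx)), check_loss)
beta_hat <- unname(fit$par)

# Share of points on or below the fitted line, overall and within thirds of x
below <- yy <= beta_hat[1] + beta_hat[2] * xx
overall_share <- mean(below)
x_bin <- cut(xx, quantile(xx, c(0, 1/3, 2/3, 1)), include.lowest = TRUE)
share_by_bin <- tapply(below, x_bin, mean)

print(beta_hat)
print(overall_share)
print(share_by_bin)
```

If the fit behaves as expected, the share of points below the line should be close to `tau` in every bin of `xx`, not only overall. That is what "conditional on $x$" means in practice.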

The approach you propose amounts to cutting off the top 20% of the $y$ without regard to $x$. Graphically, that amounts to putting a horizontal line through the point cloud and then looking at the points above this line:

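[Figure qr_3: the same scatterplot in light gray, with the points at or above the unconditional 80% quantile of $y$ in black]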

An analysis of these points may be useful. But it will simply be a different analysis than quantile regression. You may be able to say something about the distribution of $x$ among your top 20% of $y$. But you will not be able to say anything about the conditional quantile of $y$ for any given $x$.
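The contrast can also be checked numerically. The following base-R sketch, under the same simulation assumptions as the answer's plots, selects the top 20% of $y$ without regard to $x$ and then looks at how that subset relates to $x$. The names `top`, `x_bin`, `share_top_by_bin` and the three-bin split are illustrative choices, not part of the original answer.

```r
# Simulate data: y = x + standard normal noise, so the simulated slope is 1
set.seed(1)
n_points <- 2000
xx <- rnorm(n_points)
yy <- xx + rnorm(n_points)
tau <- 0.8

# Unconditional split: keep the points at or above the 80% quantile of y
cutoff <- quantile(yy, tau)
top <- yy >= cutoff

# Share of each third of x that ends up in the "top 20% of y" group
x_bin <- cut(xx, quantile(xx, c(0, 1/3, 2/3, 1)), include.lowest = TRUE,
             labels = c("low x", "mid x", "high x"))
share_top_by_bin <- tapply(top, x_bin, mean)

# Least-squares slope on all data versus on the selected subset only
slope_all <- unname(coef(lm(yy ~ xx))[2])
slope_top <- unname(coef(lm(yy[top] ~ xx[top]))[2])

print(share_top_by_bin)
print(c(all_data = slope_all, top_subset = slope_top))
```

The horizontal cut takes very different shares of the data at low and high $x$, so the selected subset is not "the top 20% at each $x$". A regression fitted inside that subset is also distorted by the selection on $y$, which is one reason it cannot stand in for a conditional quantile fit.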

R code for the plots:

# Simulate data: y depends linearly on x plus standard normal noise
n_points <- 2000
set.seed(1)
xx <- rnorm(n_points)
yy <- xx+rnorm(n_points)

# Quantile level used throughout (80% quantile)
qq <- 0.8

# Plot size in pixels
width <- 400
height <- 400

# Plot 1: the raw point cloud
png("qr_1.png",width=width,height=height)
    par(mai=c(.8,.8,.1,.1),las=1)
    plot(xx,yy,pch=19,cex=0.6)
dev.off()

# Plot 2: 80% quantile regression line, points on or above the line in black
library(quantreg)
model <- rq(yy~xx,tau=qq)
png("qr_2.png",width=width,height=height)
    par(mai=c(.8,.8,.1,.1),las=1)
    plot(xx,yy,pch=19,cex=0.6,col="lightgray")
    abline(model,lwd=1.5,col="red")
    index <- yy>=predict(model)
    points(xx[index],yy[index],pch=19,cex=0.6)
dev.off()

# Plot 3: points at or above the unconditional 80% quantile of y in black
png("qr_3.png",width=width,height=height)
    par(mai=c(.8,.8,.1,.1),las=1)
    plot(xx,yy,pch=19,cex=0.6,col="lightgray")
    index <- yy>=quantile(yy,qq)
    points(xx[index],yy[index],pch=19,cex=0.6)
dev.off()