Quantile Regression – Understanding the Loss Function in Quantile Regression

loss-functions, quantile-regression, quantiles

I am trying to understand quantile regression, but one thing I struggle with is the choice of the loss function:

$\rho_\tau(u) = u(\tau-1_{\{u<0\}})$

I know that the minimizer over $u$ of the expectation of $\rho_\tau(y-u)$ is the $\tau$-quantile, but what is the intuitive reason to start off with this function? I don't see the relation between minimizing this function and the quantile.
Can somebody explain it to me?

Best Answer

I understand this question as asking for insight into how one could come up with any loss function that produces a given quantile as a loss minimizer no matter what the underlying distribution might be. It would be unsatisfactory, then, just to repeat the analysis in Wikipedia or elsewhere that shows this particular loss function works.

Let's begin with something familiar and simple.

What you're talking about is finding a "location" $x^{*}$ relative to a distribution or set of data $F$. It is well known, for instance, that the mean $\bar x$ minimizes the expected squared residual; that is, it is a value for which

$$\mathcal{L}_F(\bar x)=\int_{\mathbb{R}} (x - \bar x)^2 dF(x)$$

is as small as possible. I have used this notation to remind us that $\mathcal{L}$ is derived from a loss, that it is determined by $F$, but most importantly it depends on the number $\bar x$.
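This familiar fact is easy to check numerically. Here is a small sketch (using NumPy; the skewed exponential sample and the grid bounds are arbitrary choices for illustration) that evaluates the empirical squared-error loss at a grid of candidate locations and confirms the minimizer sits at the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=2_000)          # a deliberately skewed sample

grid = np.linspace(0.0, 5.0, 1001)       # candidate locations c
# Empirical version of L_F(c) = E[(X - c)^2], one value per grid point
sq_loss = ((x[:, None] - grid) ** 2).mean(axis=0)

c_star = grid[sq_loss.argmin()]
print(c_star, x.mean())                  # the two nearly coincide
```

The same grid-search template works for any candidate loss, which is what makes it handy for the derivation below.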

The standard way to show that $x^{*}$ minimizes a function begins by demonstrating that the function's value does not decrease when $x^{*}$ is changed by a little bit. Such a value is called a critical point of the function.

What kind of loss function $\Lambda$ would result in a percentile $F^{-1}(\alpha)$ being a critical point? The loss for that value would be

$$\mathcal{L}_F(F^{-1}(\alpha)) = \int_{\mathbb{R}} \Lambda(x-F^{-1}(\alpha))dF(x)=\int_0^1\Lambda\left(F^{-1}(u)-F^{-1}(\alpha)\right)du.$$

For this to be a critical point, its derivative must be zero. Since we're just trying to find some solution, we won't pause to see whether the manipulations are legitimate: we'll plan to check technical details (such as whether we really can differentiate $\Lambda$, etc.) at the end. Thus

$$\eqalign{0 &=\mathcal{L}_F^\prime(x^{*})= \mathcal{L}_F^\prime(F^{-1}(\alpha))= -\int_0^1 \Lambda^\prime\left(F^{-1}(u)-F^{-1}(\alpha)\right)du \\ &= -\int_0^{\alpha} \Lambda^\prime\left(F^{-1}(u)-F^{-1}(\alpha)\right)du -\int_{\alpha}^1 \Lambda^\prime\left(F^{-1}(u)-F^{-1}(\alpha)\right)du.\tag{1} }$$

In the first integral, the argument of $\Lambda^\prime$ is negative, whereas in the second it is positive. Other than that, we have little control over the values of these integrals, because $F$ could be any distribution function. Consequently our only hope is to make $\Lambda^\prime$ depend only on the sign of its argument; otherwise it must be constant.

This implies $\Lambda$ will be piecewise linear, potentially with different slopes to the left and right of zero. Clearly it should be decreasing as zero is approached--it is, after all, a loss and not a gain. Moreover, rescaling $\Lambda$ by a positive constant will not change its properties, so we may feel free to set the left hand slope to $-1$. Let $\tau \gt 0$ be the right hand slope. Then $-\Lambda^\prime$ equals $1$ on the first integral's domain, which has length $\alpha$, and $-\tau$ on the second, which has length $1-\alpha$, so $(1)$ simplifies to

$$0 = \alpha - \tau (1 - \alpha),$$

whence the unique solution is, up to a positive multiple,

$$\Lambda(x) = \cases{-x, \ x \le 0 \\ \frac{\alpha}{1-\alpha}x, \ x \ge 0.}$$

Multiplying this (natural) solution by $1-\alpha$ to clear the denominator, and writing $\tau$ in place of $\alpha$, produces the loss function presented in the question.

Clearly all our manipulations are mathematically legitimate when $\Lambda$ has this form.
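As a numerical sanity check (a sketch with NumPy; the normal sample, the grid, and the choice $\tau = 0.75$ are arbitrary), we can minimize the empirical expectation of the question's loss $\rho_\tau$ over a grid of candidate values and compare the minimizer with the sample quantile:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=2_000)
tau = 0.75

def pinball(u, tau):
    # rho_tau(u) = u * (tau - 1{u < 0})
    return u * (tau - (u < 0))

grid = np.linspace(-3.0, 3.0, 1201)      # candidate quantiles q
# Empirical version of E[rho_tau(Y - q)], one value per grid point
loss = pinball(y[:, None] - grid, tau).mean(axis=0)

q_star = grid[loss.argmin()]
print(q_star, np.quantile(y, tau))       # the two agree closely
```

Changing `tau` moves the minimizer to the corresponding sample quantile, exactly as the argument above predicts, regardless of the distribution generating `y`.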