Quantile regression – “check function”

least squares, quantile regression

The "check function" in quantile regression is defined as

$\rho_\tau(u) = u(\tau-1_{\{u<0\}})$

I understand the basic principle of quantile regression, and I am now trying to dig a bit deeper into the algebra behind it. My, probably very trivial, question concerns the function mentioned above:

I read the original paper by Koenker, and $u$ is not formally defined there. The same goes for a few other papers I have read. The only comment I found was: "We should think of $u$ as an individual error $u=y-r$ and $\rho_\tau(u)$ as the loss associated with $u$." (Arellano). What does this mean? What is $u$? Is it simply the residual?

I am aware that my question might be rather trivial, but I tried my best to find a proper explanation and I just don't get it.

Best Answer

The check function arises from viewing the $\tau$-th sample quantile of a sample $\{Y_1, \ldots, Y_n\}$ as the solution of an optimization problem.

Conventionally, given an observed sample $Y_1, \ldots, Y_n$, the $\tau$-th sample quantile $\hat{Q}_Y(\tau)$ is defined by ranking, i.e., $\hat{Q}_Y(\tau)$ is the $\lceil n\tau \rceil$-th order statistic of $(Y_1, \ldots, Y_n)$. From a completely different point of view, it can be shown that it is also a solution of the following optimization problem (the solution is unique when $n\tau$ is not an integer): $$\hat{Q}_Y(\tau) = \text{argmin}_{\xi} \sum_{i = 1}^n \rho_\tau(Y_i - \xi). \tag{1}$$
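To see $(1)$ at work, here is a minimal sketch in plain Python. The function names (`check_loss`, `total_loss`, `quantile_by_minimization`) and the toy data are made up for illustration. It uses the piecewise form of the check function, $\rho_\tau(u) = \tau u$ for $u \ge 0$ and $(\tau - 1)u$ for $u < 0$, and the fact that the objective in $(1)$ is convex and piecewise linear in $\xi$ with kinks only at the observations, so it suffices to search over the observed values.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    indicator = 1.0 if u < 0 else 0.0
    return u * (tau - indicator)


def total_loss(xi, ys, tau):
    """Objective of (1): sum of check losses of the deviations Y_i - xi."""
    return sum(check_loss(y - xi, tau) for y in ys)


def quantile_by_minimization(ys, tau):
    """Return an observation that minimizes the total check loss.

    The objective is convex and piecewise linear with kinks at the data
    points, so a minimizer can always be found among the observations.
    """
    return min(sorted(ys), key=lambda xi: total_loss(xi, ys, tau))


# Hypothetical toy sample, chosen so that n * tau is not an integer.
ys = [2.0, 7.0, 1.0, 9.0, 4.0]
for tau in (0.25, 0.5, 0.9):
    print(tau, quantile_by_minimization(ys, tau))
```

For this sample the minimizers are the 2nd, 3rd, and 5th smallest values for $\tau = 0.25, 0.5, 0.9$ respectively, with no sorting-based quantile rule used inside the search itself.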

An intuitive proof of this fact can be found in Quantile Regression (2005), Section 1.3, by Roger Koenker. In view of $(1)$, the $\tau$-th sample quantile receives a new interpretation as the minimizer of a loss function determined by the check function $\rho_\tau(\cdot)$. This agrees with the more familiar results for the least-squares estimate and the least-absolute-deviation estimate, as the following chart shows:

[Figure: chart comparing the least-squares, least-absolute-deviation, and quantile estimates]
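In formulas, the sample mean, the sample median, and the $\tau$-th sample quantile each minimize a sum of losses over a candidate location $\xi$. The display below is a sketch of these standard facts in the notation of $(1)$, not a transcription of the chart, and the minimizer need not be unique in the last two cases:

$$\begin{aligned}
\bar{Y} &= \text{argmin}_{\xi} \sum_{i = 1}^n (Y_i - \xi)^2 && \text{(least squares)}, \\
\text{med}(Y) &= \text{argmin}_{\xi} \sum_{i = 1}^n |Y_i - \xi| && \text{(least absolute deviation)}, \\
\hat{Q}_Y(\tau) &= \text{argmin}_{\xi} \sum_{i = 1}^n \rho_\tau(Y_i - \xi) && \text{(check loss)}.
\end{aligned}$$

Since $\rho_{1/2}(u) = |u|/2$, the median is the special case $\tau = 1/2$ of the last line.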

The extension from the one-sample problem above to the regression setting is straightforward: simply replace the $\xi$ in $(1)$ by the regression function $x'b$. So yes, $u$ is the residual, and the aim is to minimize the total "loss" of the residuals, where "loss" is defined by $\rho_\tau(\cdot)$:

$$\hat{\beta}(\tau) = \text{argmin}_{b \in \mathbb{R}^p} \sum_{i = 1}^n \rho_\tau(Y_i - x_i'b).$$
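As a rough illustration of this minimization, the sketch below fits a straight line $y = b_0 + b_1 x$ by brute force in plain Python. It is not how quantile regression is computed in practice (that is done by linear programming). It relies on the linear-programming fact that, for a full-rank design with $p$ coefficients, some minimizer passes exactly through $p$ observations, so for $p = 2$ it is enough to try every line through two data points. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def regression_loss(b0, b1, xs, ys, tau):
    """Total check loss of the residuals Y_i - (b0 + b1 * x_i)."""
    return sum(check_loss(y - (b0 + b1 * x), tau) for x, y in zip(xs, ys))


def fit_line_quantile(xs, ys, tau):
    """Brute-force tau-th regression quantile for a line with intercept.

    Assumption: xs contains at least two distinct values. Every line
    through two observations with different x is a candidate; the one
    with the smallest total check loss is returned as (b0, b1).
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical pair does not define a line y = b0 + b1*x
        b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        loss = regression_loss(b0, b1, xs, ys, tau)
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    return best[1], best[2]


# Hypothetical toy data: four points on y = x plus one large outlier.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 10.0]
print(fit_line_quantile(xs, ys, 0.5))
```

With $\tau = 0.5$ the fit is the line $y = x$: the outlier at $x = 4$ contributes a loss proportional to its residual rather than its square, so it does not pull the line away from the other four points.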

$\hat{\beta}(\tau)$ is referred to as the $\tau$-th regression quantile. By virtue of the properties of $\rho_\tau(\cdot)$, it also bears an interesting ordering interpretation relative to the fitted regression quantile surface $y = x'\hat{\beta}(\tau)$; for details, see the remark on page 40 of Regression Quantiles (1978) by Koenker and Bassett.
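One commonly cited form of this ordering property, for a model with an intercept, is $N \le n\tau \le N + Z$, where $N$ and $Z$ are the numbers of negative and zero residuals at the fit. This form is an assumption here, not a quotation of the 1978 remark. The sketch below checks it on hypothetical toy data, taking the line $y = x$ as the median-regression fit (this can be verified by comparing the loss of every line through two data points). The function names are made up for illustration.

```python
def residual_sign_counts(xs, ys, b0, b1, tol=1e-9):
    """Count negative, zero, and positive residuals of y = b0 + b1 * x."""
    neg = zero = pos = 0
    for x, y in zip(xs, ys):
        r = y - (b0 + b1 * x)
        if abs(r) <= tol:
            zero += 1
        elif r < 0:
            neg += 1
        else:
            pos += 1
    return neg, zero, pos


def ordering_holds(neg, zero, n, tau):
    """Check N <= n * tau <= N + Z (assumed form of the ordering property)."""
    return neg <= n * tau <= neg + zero


# Hypothetical toy data: four points on y = x plus one large outlier.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 10.0]
tau = 0.5

# Line y = x, taken here as the tau = 0.5 fit for this data.
neg, zero, pos = residual_sign_counts(xs, ys, 0.0, 1.0)
print(neg, zero, pos, ordering_holds(neg, zero, len(xs), tau))

# A line that is not a minimizer, y = 2.5 * x, for comparison.
neg2, zero2, pos2 = residual_sign_counts(xs, ys, 0.0, 2.5)
print(neg2, zero2, pos2, ordering_holds(neg2, zero2, len(xs), tau))
```

At the fitted line, roughly a fraction $\tau$ of the observations lies below the surface, which is the regression analogue of ranking in the one-sample case. The second line leaves three of five points strictly below it and fails the check.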

In summary, the check function is a loss function that retrieves the $\tau$-th sample quantile, and more importantly, that makes the generalization from the one-sample problem (where ordering is possible) to the regression problem (where ordering is rather awkward) practical.