Quantile Regression – Understanding Quantile Regression Estimator Formula


I have seen two different representations of the quantile regression estimator, which are

$$Q(\beta_{q}) = \sum^{n}_{i:y_{i}\geq x'_{i}\beta_q} q\mid y_i - x'_i \beta_q \mid + \sum^{n}_{i:y_{i}< x'_{i}\beta_q} (1-q)\mid y_i - x'_i \beta_q \mid$$

and
$$Q(\beta_q) = \sum^{n}_{i=1} \rho_q (y_i - x'_i \beta_q), \hspace{1cm} \rho_q(u_i) = u_i(q - 1(u_i < 0 ))$$

where $u_i = y_i - x'_i \beta_q$. Can somebody tell me how to show the equivalence of these two expressions? Here is what I tried so far, starting from the second expression.

$$
\begin{align}
Q(\beta_q) &= \sum^{n}_{i=1} u_i\,(q - 1(u_i < 0 )) \newline
&= \sum^{n}_{i=1}(y_i - x'_i \beta_q)\,(q - 1(y_i - x'_i \beta_q < 0 )) \newline
&= \sum^{n}_{i:y_{i}\geq x'_{i}\beta_q} q\,(y_i - x'_i\beta_q) + \sum^{n}_{i:y_{i}< x'_{i}\beta_q} \left[ q\,(y_i - x'_i\beta_q) - (y_i - x'_i\beta_q) \right]
\end{align}
$$
But from this point I got stuck on how to proceed. Please note that this is not a homework or assignment question. Many thanks.
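For what it is worth, a quick numerical comparison (a sketch, not a proof; the names `loss_split` and `loss_check` are just illustrative) suggests the two objectives always take the same value:

```python
import numpy as np

rng = np.random.default_rng(0)

n, k, q = 200, 3, 0.25
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
beta = rng.normal(size=k)           # an arbitrary candidate coefficient vector
u = y - X @ beta                    # residuals u_i = y_i - x_i' beta

# First representation: split the sample by the sign of the residual
loss_split = q * np.abs(u[u >= 0]).sum() + (1 - q) * np.abs(u[u < 0]).sum()

# Second representation: check function rho_q(u) = u * (q - 1(u < 0))
loss_check = (u * (q - (u < 0))).sum()

print(loss_split, loss_check)       # equal up to floating-point error
```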

Best Answer

If you remember, OLS minimizes the sum of squared residuals $\sum_i u_{i}^{2}$, whereas median regression minimizes the sum of absolute residuals $\sum_i \mid u_i \mid$. The median or least absolute deviations (LAD) estimator is the special case of quantile regression with $q = .5$. In quantile regression we minimize a sum of absolute residuals that receives asymmetric weights: $(1-q)$ for overprediction and $q$ for underprediction. So you can start from the LAD representation, split the sum into the observations weighted by $q$ and those weighted by $(1-q)$ according to the sign of $u_i$, and work on it as follows:

$$ \begin{align} \rho_q(u_i) &= 1(u_i>0) \, q\mid u_i\mid + 1(u_i\leq 0) \, (1-q)\mid u_i \mid \newline &= 1(y_i - x'_i \beta_q > 0) \, q\mid y_i - x'_i \beta_q \mid + 1(y_i - x'_i\beta_q \leq 0) \, (1-q)\mid y_i - x'_i \beta_q \mid \end{align} $$

This just uses the fact that $u_i = y_i - x'_i \beta_q$. Summing over $i$ and re-writing the indicator functions as sums over the observations that satisfy the corresponding conditions gives the first expression you wrote down for the quantile regression estimator (the boundary case $y_i = x'_i\beta_q$ contributes zero to either sum, so it does not matter on which side of the split it falls):

$$ \begin{align} &= \sum^{n}_{i:y_i>x'_i\beta_q}q\mid y_i - x'_i\beta_q \mid + \sum^{n}_{i:y_i\leq x'_i\beta_q} (1-q) \mid y_i - x'_i\beta_q \mid \newline &= q \sum^{n}_{i:y_i>x'_i\beta_q} \mid y_i - x'_i\beta_q \mid + (1-q)\sum^{n}_{i:y_i\leq x'_i\beta_q} \mid y_i - x'_i\beta_q \mid \newline &= q \sum^{n}_{i:y_i>x'_i\beta_q} (y_i - x'_i\beta_q) - (1-q)\sum^{n}_{i:y_i\leq x'_i\beta_q} ( y_i - x'_i\beta_q ) \newline &= q \sum^{n}_{i:y_i>x'_i\beta_q} (y_i - x'_i\beta_q) - \sum^{n}_{i:y_i\leq x'_i\beta_q} (y_i - x'_i\beta_q) + q \sum^{n}_{i:y_i\leq x'_i\beta_q} (y_i - x'_i\beta_q) \newline &= q \sum^{n}_{i=1} (y_i - x'_i \beta_q) - \sum^{n}_{i=1}1(y_i - x'_i\beta_q\leq 0)(y_i - x'_i\beta_q) \newline &= \sum^{n}_{i=1}(q - 1(u_i \leq 0))u_i \end{align} $$

The second line takes the weights out of the summations. The third line gets rid of the absolute values: by definition $y_i - x'_i\beta_q$ is non-positive whenever $y_i \leq x'_i\beta_q$, hence the sign change on the second sum. The fourth line multiplies out $(1-q)$. You then realize that $$q\sum^{n}_{i:y_i>x'_i\beta_q}(y_i - x'_i\beta_q) + q\sum^{n}_{i:y_i \leq x'_i\beta_q}(y_i - x'_i\beta_q) = q\sum^{n}_{i=1}(y_i - x'_i\beta_q),$$ and replacing the middle summation in the fourth line by the corresponding indicator brings you to the fifth line. Factoring out $y_i - x'_i\beta_q$ and writing it as $u_i$ yields the second expression of your estimator. The only cosmetic difference is that the indicator here is $1(u_i \leq 0)$ rather than $1(u_i < 0)$; the two versions differ only for observations with $u_i = 0$, which contribute zero to the sum either way. This shows that the two expressions are equivalent.
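To connect this back to the LAD remark at the top: for an intercept-only model, minimizing the check-function objective numerically should recover the ordinary sample $q$-quantile. Here is a minimal sketch, assuming `scipy` is available (`check_loss` is just an illustrative name):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def check_loss(b, y, q):
    """Sum of rho_q(y_i - b) = (y_i - b) * (q - 1(y_i - b < 0))."""
    u = y - b
    return np.sum(u * (q - (u < 0)))

rng = np.random.default_rng(1)
y = rng.gamma(shape=2.0, scale=3.0, size=500)    # skewed toy data

for q in (0.25, 0.5, 0.9):
    res = minimize_scalar(check_loss, args=(y, q),
                          bounds=(y.min(), y.max()), method="bounded")
    print(q, res.x, np.quantile(y, q))            # minimizer is close to the sample q-quantile
```

For $q = .5$ this objective is (up to a factor of two) exactly the LAD/median regression criterion mentioned above.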
