Quantile Regression – How to Formulate Quantile Regression as a Linear Programming Problem

linear programmingquantile regressionregression

How do I formulate quantile regression as a Linear Programming problem?
When looking at the median quantile problem I know it is

\begin{align}
\text{minimize } & \sum_{i=1}^n |\beta_0 + X_i \beta_1-Y_i|\\
\text{transforms into } & \\
\text{minimize } & \sum_{i=1}^n e_i\\
\text{s.t.} & \\
& e_i\geq \beta_0 + X_i\beta_{1}-Y_i\\
& e_i\geq -(\beta_0 + X_i\beta_{1}-Y_i)
\end{align}
but how do I transform the minimization for other quantiles?

Best Answer

You use the quantile regression estimator

$$\hat \beta(\tau) := \arg \min_{\theta \in \mathbb R^K} \sum_{i=1}^N \rho_\tau(y_i - \mathbf x_i^\top \theta).$$

where $\tau \in (0,1)$ is constant chosen according to which quantile needs to be estimated and the function $\rho_\tau(.)$ is defined as

$$\rho_\tau(r) = r(\tau - I(r<0)).$$

In order to see the purpose of the $\rho_\tau(.)$ consider first that it takes the residuals as arguments, when these are defined as $\epsilon_i =y_i - \mathbf x_i^\top \theta$. The sum in the minimization problem can therefore be rewritten as

$$\sum_{i=1}^N \rho_\tau(\epsilon_i) =\sum_{i=1}^N \tau \lvert \epsilon_i \lvert I[\epsilon_i \geq 0] + (1-\tau) \lvert \epsilon_i \lvert I[\epsilon_i < 0]$$

such that positive residuals associated with observation $y_i$ above the suggested quantile regression line $\mathbf x_i^\top \theta$ are given the weight of $\tau$ while negative residuals associated with observations $y_i$ below the suggested quantile regression line $\mathbf x_i^\top \theta$ are weighted with $(1-\tau)$.

Intuitively:

With $\tau=0.5$ positive and negative residuals are "punished" with the same weight and an equal number of observation are above and below the "line" in optimum so the line $\mathbf x_i^\top \hat \beta$ is the median regression "line".

When $\tau=0.9$ each positive residual is weighted 9 times that of a negative residual with weight $1-\tau= 0.1$ and so in optimum for every observation above the "line" $\mathbf x_i^\top \hat \beta$ approximately 9 will be placed below the line. Hence the "line" represent the 0.9-quantile. (For an exact statement of this see THM. 2.2 and Corollary 2.1 in Koenker (2005) "Quantile Regression")

The two cases are illustrated in these plots. Left panel $\tau=0.5$ and right panel $\tau=0.9$.

Linear programs are predominantly analyzed and solved using the standard form

$$(1) \ \ \min_z \ \ c^\top z \ \ \mbox{subject to } A z = b , z \geq 0$$

To arrive at a linear program on standard form the first problem is that in such a program (1) all variables over which minimization is performed $z$ should be positive. To achieve this residuals are decomposed into positive and negative part using slack variables:

$$\epsilon_i = u_i - v_i$$

where positive part $u_i = \max(0,\epsilon_i) = \lvert \epsilon_i \lvert I[\epsilon_i \geq 0]$ and $v_i = \max(0,-\epsilon_i) =\lvert \epsilon_i \lvert I[\epsilon_i < 0]$ is the negative part. The sum of residuals assigned weights by the check function is then seen to be

$$\sum_{i=1}^N \rho_\tau(\epsilon_i) = \sum_{i=1}^N \tau u_i + (1-\tau) v_i = \tau \mathbf 1_N^\top u + (1-\tau)\mathbf 1_N^\top v,$$

where $u = (u_1,...,u_N)^\top$ and $v=(v_1,...,v_N)^\top$ and $\mathbf 1_N$ is vector $N \times 1$ all coordinates equal to $1$.

The residuals must satisfy the $N$ constraints that

$$y_i - \mathbf x_i^\top\theta = \epsilon_i = u_i - v_i$$

This results in the formulation as a linear program

$$\min_{\theta \in \mathbb R^K,u\in \mathbb R_+^N,v\in \mathbb R_+^N}\{ \tau \mathbf 1_N^\top u + (1-\tau)\mathbf 1_N^\top v\lvert y_i= \mathbf x_i\theta + u_i - v_i, i=1,...,N\},$$

as stated in Koenker (2005) "Quantile Regression" page 10 equation (1.20).

However it is noticeable that $\theta\in \mathbb R$ is still not restricted to be positive as required in the linear program on standard form (1). Hence again decomposition into positive and negative part is used

$$\theta = \theta^+ - \theta^- $$

where again $\theta^+=max(0,\theta)$ is positive part and $\theta^- = \max(0,-\theta)$ is negative part. The $N$ constraints can then be written as

$$\mathbf y:= \begin{bmatrix} y_1 \\ \vdots \\ y_N\end{bmatrix} = \begin{bmatrix} \mathbf x_1^\top \\ \vdots \\ \mathbf x_N^\top \end{bmatrix}(\theta^+ - \theta^-) + \mathbf I_Nu - \mathbf I_Nv ,$$

where $\mathbf I_N = diag\{\mathbf 1_N\}$.

Next define $b:=\mathbf y$ and the design matrix $\mathbf X$ storing data on independent variables as

$$ \mathbf X := \begin{bmatrix} \mathbf x_1^\top \\ \vdots \\ \mathbf x_N^\top \end{bmatrix} $$

To rewrite constraint:

$$b= \mathbf X(\theta^+ - \theta^-) + \mathbf I_N u- \mathbf I_N v= [\mathbf X , -\mathbf X , \mathbf I_N ,\mathbf - \mathbf I_N] \begin{bmatrix} \theta^+ \\ \theta^- \\ u \\ v\end{bmatrix}$$

Define the $(N \times 2K + 2N )$ matrix

$$A := [\mathbf X , -\mathbf X , \mathbf I_N ,\mathbf - \mathbf I_N]$$ and introduce $\theta^+$ and $\theta^-$ as variables over which to minimize so they are part of $z$ to get

$$b = A \begin{bmatrix} \theta^+ \\ \theta^- \\ u \\ v\end{bmatrix} = Az$$

Because $\theta^+$ and $\theta^-$ only affect the minimization problem through the constraint a $\mathbf 0$ of dimension $2K\times 1$ must be introduced as part of the coeffient vector $c$ which can the appropriately be defined as

$$ c = \begin{bmatrix}\mathbf 0 \\ \tau \mathbf 1_N \\ (1-\tau) \mathbf 1_N \end{bmatrix},$$

thus ensuring that $c^\top z = \underbrace{\mathbf 0^\top(\theta^+ - \theta^-)}_{=0}+\tau \mathbf 1_N^\top u + (1-\tau)\mathbf 1_N^\top v = \sum_{i=1}^N \rho_\tau(\epsilon_i).$

Hence $c,A$ and $b$ are then defined and the program as given in $(1)$ completely specified.

This is probably best digested using an example. To solve this in R use the package quantreg by Roger Koenker. Here is also illustration of how to set up the linear program and solve with a solver for linear programs:

base=read.table("http://freakonometrics.free.fr/rent98_00.txt",header=TRUE)
attach(base)
library(quantreg)
library(lpSolve)
tau <- 0.3


# Problem (1) only one covariate
X <- cbind(1,base$area)
K <- ncol(X)
N <- nrow(X)

A <- cbind(X,-X,diag(N),-diag(N))
c <- c(rep(0,2*ncol(X)),tau*rep(1,N),(1-tau)*rep(1,N))
b <- base$rent_euro
const_type <- rep("=",N)

linprog <- lp("min",c,A,const_type,b)
beta <- linprog$sol[1:K] -  linprog$sol[(1:K+K)]
beta
rq(rent_euro~area, tau=tau, data=base)


# Problem (2) with 2 covariates
X <- cbind(1,base$area,base$yearc)
K <- ncol(X)
N <- nrow(X)

A <- cbind(X,-X,diag(N),-diag(N))
c <- c(rep(0,2*ncol(X)),tau*rep(1,N),(1-tau)*rep(1,N))
b <- base$rent_euro
const_type <- rep("=",N)

linprog <- lp("min",c,A,const_type,b)
beta <- linprog$sol[1:K] -  linprog$sol[(1:K+K)]
beta
rq(rent_euro~ area + yearc, tau=tau, data=base)

Related Solutions

Quantile Regression – Understanding Quantile Regression Estimator Formula

If you remember, OLS minimizes the sum of the squared residuals $\sum_i u_{i}^{2}$ whereas median regression minimizes the sum of absolute residuals $\sum_i \mid u_i \mid$. The median or least absolute deviations (LAD) estimator is a special case of quantile regression in which you have $q = .5$. In quantile regression we minimize a sum of absolute errors that receives asymmetric weights for overprediction $(1-q)$ and $q$ for underprediction. You can start from the LAD representation and extend this as the sum of the fraction of the data which are weighted by $q$ and $(1-q)$ given their value of $u_i$, and work on it as follows:

$$ \begin{align} \rho_q(u) &= 1(u_i>0) \, q\mid u_i\mid + 1(u_i\leq 0) \, (1-q)\mid u_i \mid \newline &= 1(y_i - x'_i \beta_q > 0) \, q\mid y_i - x'_i \beta_q \mid + 1(y_i - x'_i\beta_q \leq 0) \, (1-q)\mid y_i - x'_i \beta_q \mid \end{align} $$ This just uses the fact that $u_i = y_i - x'_i \beta_q$ and then you can re-write the indicator function as sums of the observations that satisfy the conditions of the indicators. This will give the first expression you wrote down for the quantile regression estimator.

$$ \begin{align} &= \sum^{n}_{i:y_i>x'_i\beta_q}q\mid y_i - x'_i\beta_q \mid + \sum^{n}_{i:y_i\leq x'_i\beta_q} (1-q) \mid y_i - x'_i\beta_q \mid \newline &= q \sum^{n}_{i:y_i>x'_i\beta_q} \mid y_i - x'_i\beta_q \mid + (1-q)\sum^{n}_{i:y_i\leq x'_i\beta_q} \mid y_i - x'_i\beta_q \mid \newline &= q \sum^{n}_{i:y_i>x'_i\beta_q} (y_i - x'_i\beta_q) - (1-q)\sum^{n}_{i:y_i\leq x'_i\beta_q} ( y_i - x'_i\beta_q ) \newline &= q \sum^{n}_{i:y_i>x'_i\beta_q} (y_i - x'_i\beta_q) - \sum^{n}_{i:y_i\leq x'_i\beta_q} (y_i - x'_i\beta_q) + q \sum^{n}_{i:y_i\leq x'_i\beta_q} (y_i - x'_i\beta_q) \newline &= q \sum^{n}_{i=1} (y_i - x'_i \beta_q) - \sum^{n}_{i=1}1(y_i - x'_i\beta_q\leq 0)(y_i - x'_i\beta_q) \newline &= \sum^{n}_{i=1}(q - 1(u_i \leq 0))u_i \end{align} $$

The second line takes out the weights from the summations. The third line gets rid of the absolute values and replaces them by the actual values. By definition $y_i - x'_i\beta_q$ is negative whenever $y_i < x'_i\beta_q$, hence the sign change in this line. The fourth line multiplies out $(1-q)$. You then realize that $$q\sum^{n}_{i:y_i>x'_i\beta_q}(y_i - x'_i\beta_q) + q\sum^{n}_{i:y_i \leq x'_i\beta_q}(y_i - x'_i\beta_q) = \sum^{n}_{i=1}(y_i - x'_i\beta_q)$$ and replacing the summation of the middle term in the fourth line by the corresponding indicator you arrive at the fifth line. Factorizing and then replacing $y_i - x'_i\beta_q$ with $u_i$ yields the second expression of your estimator.
This shows how the two expressions are equivalent.

Solved – Quantile Regression vs OLS for homoscedasticity

Will the estimated slope coefficient $\beta_1$ always be the same for OLS and for QR for different quantiles?

No, of course not, because the empirical loss function being minimized differs in these different cases (OLS vs. QR for different quantiles).

I am well aware that in the presence of homoscedasticity all the slope parameters for different quantile regressions will be the same and that the QR models will differ only in the intercept.

No, not in finite samples. Here is an example taken from the help files of the quantreg package in R:

    library(quantreg)
    data(stackloss)
    rq(stack.loss ~ stack.x,tau=0.50) #median (l1) regression fit 
                                      # for the stackloss data.
    rq(stack.loss ~ stack.x,tau=0.25) #the 1st quartile

However, asymptotically they will all converge to the same true value.

But in the case of homoscedasticity, shouldn't outliers cancel each other out because positive errors are as likely as negative ones, rendering OLS and median QR slope coefficient equivalent?

No. First, perfect symmetry of errors is not guaranteed in any finite sample. Second, minimizing the sum of squares vs. absolute values will in general lead to different values even for symmetric errors.

Best Answer

Related Solutions

Quantile Regression – Understanding Quantile Regression Estimator Formula

Solved – Quantile Regression vs OLS for homoscedasticity

Related Question