Optimization – Linear Regression for Minimizing Maximum Residuals


We know that simple linear regression solves the following problem:

Suppose there are $n$ data points $\{y_i,x_i\}$, where $i=1,2,\dots,n$. The goal is to find the equation of the straight line

$y=\alpha+\beta x$

which provides a best fit for the data points. Here "best" is understood in the least-squares sense: the line that minimizes the sum of squared residuals of the linear regression model. In other words, the numbers $\alpha$ and $\beta$ solve the following minimization problem:

Find $\underset{{\alpha,\beta}}{\arg\min}\;Q(\alpha,\beta)$, where $Q(\alpha,\beta)=\sum\limits_{i=1}^n(y_i-\alpha-\beta x_i)^2$
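For comparison, this least-squares problem has the well-known closed-form solution (stated here for reference; it is not part of the original question):

$$\hat\beta=\frac{\sum_{i=1}^n (x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^n (x_i-\bar x)^2},\qquad \hat\alpha=\bar y-\hat\beta\,\bar x,$$

where $\bar x$ and $\bar y$ are the sample means of the $x_i$ and $y_i$.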

My question is: if I instead want to minimize the following function, how do I find $\alpha$ and $\beta$?

$\underset{{\alpha,\beta}}{\arg\min}\;P(\alpha,\beta)$, where $P(\alpha,\beta)=\max\limits_{1\leq i\leq n} |y_i-\alpha-\beta x_i|$

Best Answer

You're asking about doing linear regression with the $L_{\infty}$ norm or, equivalently, the Chebyshev approximation criterion, rather than the usual $L_2$ norm that minimizes the sum of squared residuals.

There isn't a nice formula that will give you $\alpha$ and $\beta$. Instead, the standard approach is to obtain $\alpha$ and $\beta$ as the solution to a linear programming problem. The formulation is

$$\text{Minimize } r$$

subject to $$r - (y_i - \alpha - \beta x_i ) \geq 0, \text{ for each } i,$$ $$r + (y_i - \alpha - \beta x_i ) \geq 0, \text{ for each } i.$$ The variables are $r$ (the maximum residual), $\alpha$, and $\beta$, and the $(x_i, y_i)$ are the data values that become parameters in the LP formulation.
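As a concrete illustration (not part of the original answer), here is a minimal sketch of this LP solved numerically with `scipy.optimize.linprog`. The variable ordering $(\alpha, \beta, r)$, the helper name `chebyshev_fit`, and the sample data are my own choices for the example.

```python
import numpy as np
from scipy.optimize import linprog

def chebyshev_fit(x, y):
    """Fit y ~ alpha + beta*x by minimizing the maximum absolute residual."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    # Decision variables, in order: [alpha, beta, r].
    # Objective: minimize r; alpha and beta carry zero cost.
    c = [0.0, 0.0, 1.0]

    # Rewrite the two constraint families in linprog's A_ub @ v <= b_ub form:
    #   r - (y_i - alpha - beta*x_i) >= 0  ->  -alpha - beta*x_i - r <= -y_i
    #   r + (y_i - alpha - beta*x_i) >= 0  ->   alpha + beta*x_i - r <=  y_i
    A_ub = np.vstack([
        np.column_stack([-np.ones(n), -x, -np.ones(n)]),
        np.column_stack([ np.ones(n),  x, -np.ones(n)]),
    ])
    b_ub = np.concatenate([-y, y])

    # alpha and beta are free; r (the maximum residual) is nonnegative.
    bounds = [(None, None), (None, None), (0, None)]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    alpha, beta, r = res.x
    return alpha, beta, r

# Example usage with made-up data points.
alpha, beta, max_resid = chebyshev_fit([0, 1, 2, 3], [0.1, 0.9, 2.2, 2.8])
print(alpha, beta, max_resid)
```

At the optimum, $r$ equals the minimized maximum absolute residual $P(\alpha,\beta)$.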

Here's an example, although the author assumes a model with $\alpha = 0$.
