The quick answer would be: because Newton's method is a higher-order method, and thus builds a better approximation of your function. But that is not all.
Newton's method exactly minimizes the second-order approximation of a function $f$ at each step. That is, it iteratively sets
$$ x \gets x - \left[ \nabla^2 f(x) \right]^{-1} \nabla f(x). $$
Gradient descent only has access to the first-order approximation, and makes the update
$$ x \gets x - h \nabla f(x), $$
for some step-size $h$.
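To make the two update rules concrete, here is a minimal sketch on a toy quadratic $f(x) = \tfrac{1}{2} x^\top A x - b^\top x$; the particular matrix, step size, and iteration counts are illustrative assumptions, not anything prescribed by the answer.

```python
import numpy as np

# Toy smooth objective: f(x) = 0.5 x^T A x - b^T x with A symmetric positive definite,
# chosen only to illustrate the two update rules.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(x):
    return A @ x - b      # first-order information

def hess(x):
    return A              # second-order information (constant for a quadratic)

# Gradient descent: x <- x - h * grad f(x), with a hand-picked step size h.
x_gd = np.zeros(2)
h = 0.1
for _ in range(100):
    x_gd = x_gd - h * grad(x_gd)

# Newton's method: x <- x - [hess f(x)]^{-1} grad f(x).
# Solving the linear system instead of forming the inverse is the usual practice.
x_nt = np.zeros(2)
for _ in range(5):
    x_nt = x_nt - np.linalg.solve(hess(x_nt), grad(x_nt))

print(x_gd, x_nt, np.linalg.solve(A, b))  # both approach the exact minimizer A^{-1} b
```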
The practical difference is that Newton's method assumes you have much more information available, makes much better updates, and thus converges in fewer iterations.
If you don't have any further information about your function, and you are able to use Newton's method, just use it.
But the number of iterations needed is not all you want to know. The Newton update scales poorly with problem size. If $x \in \mathbb{R}^d$, then computing $\left[ \nabla^2 f(x) \right]^{-1}$ takes $\mathcal{O}(d^3)$ operations, whereas the cost of a gradient descent update is only linear in $d$.
In many large-scale applications, which arise very often in machine learning for example, $d$ is so large (it can be in the billions) that you are way beyond being able to make even a single Newton update.
$\ell_1$ optimization is a huge field, with both direct methods (simplex, interior point) and iterative methods. I have used iteratively reweighted least squares (IRLS) with conjugate gradient and approximate inner iterations. In practice, the fact that the absolute value is not differentiable at the origin is not really a problem. For example, minimizing $\sum_i |x - x_i|$ over $x$ just says that $x$ is the median of the $x_i$, since the derivatives are all $\pm 1$ (assuming $x$ is not one of the $x_i$).

Here is a reference: J. A. Scales and A. Gersztenkorn, "Robust methods in inverse theory," Inverse Problems 4(4), 1071. There is a PDF on Google Scholar. The algorithm described there can be used for $L_p$-norm optimization for any $p \geq 1$. These problems are all nonlinear except for $p = 2$, but we found that using the approximate inner iterations made the $\ell_1$ problem almost as fast as the least-squares problem.

Added: an interesting historical note is that $\ell_1$ optimization was invented by Boscovich before Gauss invented least squares.
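As a rough illustration of the reweighting idea (not the exact algorithm from the reference, which solves the inner problem approximately with conjugate gradient), here is a small sketch; the smoothing parameter `eps`, the iteration count, and the toy data are assumptions made for the example.

```python
import numpy as np

def irls_l1(A, b, n_iter=50, eps=1e-6):
    """Iteratively reweighted least squares sketch for min_x ||Ax - b||_1.

    Each outer iteration solves a weighted least-squares problem with weights
    1 / max(|residual|, eps); eps guards against the non-differentiability of
    |r| at r = 0. A dense solve is used here for brevity.
    """
    x = np.linalg.lstsq(A, b, rcond=None)[0]    # least-squares starting point
    for _ in range(n_iter):
        r = A @ x - b
        w = 1.0 / np.maximum(np.abs(r), eps)    # reweighting for p = 1
        W = np.diag(w)
        x = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
    return x

# Toy usage: fit a line through data containing one gross outlier.
t = np.linspace(0.0, 1.0, 20)
A = np.column_stack([t, np.ones_like(t)])
b = 2.0 * t + 1.0
b[3] += 10.0                                    # the outlier
print(irls_l1(A, b))                            # close to slope 2, intercept 1
```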
Best Answer
The traditional approach would be to reformulate the problem to make it smooth. In this case, it's easy:
$$ \min_x \ \|Ax - b\|_{\infty} $$
is equivalent to
$$ \min_{x,t} \ t \quad \text{subject to} \ -te \leq Ax - b \leq te, $$
where $e = (1, 1, \dots, 1)$. This is a linear program that you can solve with a standard LP solver.

You can play the same trick with the $\ell_1$ norm, which is often used in curve fitting to mitigate the influence of outliers:
$$ \min_x \ \|Ax - b\|_1 $$
is equivalent to
$$ \min_{x,s,t} \ \sum_i s_i + t_i \quad \text{subject to} \ Ax - b = s - t, \ (s, t) \geq 0. $$
Again, this is a linear program.
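For concreteness, here is one way both reformulations could be set up with `scipy.optimize.linprog`; the random toy data and the choice of solver are assumptions of the example, not part of the answer.

```python
import numpy as np
from scipy.optimize import linprog

# Toy overdetermined system; any A, b of compatible shapes would do.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 3))
b = rng.standard_normal(30)
m, n = A.shape
e = np.ones(m)

# min ||Ax - b||_inf as an LP in (x, t):
#   minimize t  subject to  Ax - t e <= b  and  -Ax - t e <= -b.
c = np.concatenate([np.zeros(n), [1.0]])
A_ub = np.block([[A, -e[:, None]],
                 [-A, -e[:, None]]])
b_ub = np.concatenate([b, -b])
bounds = [(None, None)] * n + [(0, None)]          # x free, t >= 0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
x_inf = res.x[:n]
print("L_inf residual:", np.max(np.abs(A @ x_inf - b)))

# min ||Ax - b||_1 as an LP in (x, s, t):
#   minimize sum(s + t)  subject to  Ax - b = s - t,  s >= 0,  t >= 0.
c = np.concatenate([np.zeros(n), np.ones(2 * m)])
A_eq = np.hstack([A, -np.eye(m), np.eye(m)])
bounds = [(None, None)] * n + [(0, None)] * (2 * m)
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=bounds)
x_l1 = res.x[:n]
print("L_1 residual:", np.sum(np.abs(A @ x_l1 - b)))
```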
If you want to avoid constraints, you could apply a subgradient method to the nonsmooth objective $\|Ax-b\|_{\infty}$. It's almost identical to applying the gradient method except that instead of a gradient, you compute a subgradient. See http://www.seas.ucla.edu/~vandenbe/236C/lectures/subgradients.pdf.
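A minimal sketch of such a subgradient method might look like the following; the diminishing step size $1/(k+1)$ and the bookkeeping of the best iterate seen so far are standard but unoptimized choices.

```python
import numpy as np

def subgradient_linf(A, b, n_iter=5000):
    """Subgradient method sketch for min_x ||Ax - b||_inf.

    A subgradient of the objective at x is sign(r_i) * A[i], where i is an
    index at which |r_i| attains the maximum of the residual r = Ax - b.
    """
    x = np.zeros(A.shape[1])
    best_x, best_val = x.copy(), np.max(np.abs(A @ x - b))
    for k in range(n_iter):
        r = A @ x - b
        i = np.argmax(np.abs(r))
        g = np.sign(r[i]) * A[i]   # a subgradient of ||Ax - b||_inf at x
        x = x - g / (k + 1)        # diminishing step size
        val = np.max(np.abs(A @ x - b))
        if val < best_val:         # the objective is not monotone, so track the best iterate
            best_x, best_val = x.copy(), val
    return best_x

# Toy usage with random data.
rng = np.random.default_rng(2)
A = rng.standard_normal((40, 4))
b = rng.standard_normal(40)
print(np.max(np.abs(A @ subgradient_linf(A, b) - b)))
```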