Solved – Gradient descent oscillating a lot. Have I chosen the step direction incorrectly?

loss-functions, optimization

I'm trying to run a basic gradient descent algorithm with an absolute loss function. I can get it to converge to a good solution, but it requires a much lower step size and more iterations than if I had used squared loss. Is this normal? Should I expect absolute loss to take longer to reach a good solution, or to oscillate around a solution more than, say, squared loss would?
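For concreteness, here is a minimal sketch of the sort of thing I'm running (a synthetic one-parameter linear model and plain (sub)gradient descent; the data and names are only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

def grad_squared(w):
    # gradient of the mean squared loss: shrinks as the fit improves
    r = X[:, 0] * w - y
    return np.mean(2.0 * r * X[:, 0])

def grad_absolute(w):
    # subgradient of the mean absolute loss: its magnitude does not shrink near the optimum
    r = X[:, 0] * w - y
    return np.mean(np.sign(r) * X[:, 0])

for grad, step in [(grad_squared, 0.1), (grad_absolute, 0.1)]:
    w = 0.0
    for _ in range(500):
        w -= step * grad(w)
    print(grad.__name__, w)
```

With the same fixed step size, the squared-loss run settles on a value close to the true slope, while the absolute-loss run ends up bouncing around it.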

Best Answer

When you say 'an absolute loss function', do you mean you're using least absolute deviations (LAD) instead of the more usual ordinary least squares (OLS)? As that Wikipedia article says, although LAD is more robust to outliers than OLS, it can be unstable and can even have multiple solutions, so it isn't surprising that the minimum of the objective is harder to find even when it is unique. There is also a more mechanical reason for the oscillation: the gradient of the absolute loss has constant magnitude (it is just the sign of the residual), so it does not shrink as you approach the minimum the way the squared-loss gradient does. With a fixed step size the iterates keep overshooting and bouncing around the optimum, which is why you need a smaller (or decaying) step size and more iterations.
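If you do stick with LAD and plain (sub)gradient descent, a decaying step size is the standard remedy. A rough sketch of what I mean, using the same kind of synthetic setup as in the question (details are just assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

def subgrad_abs(w):
    # subgradient of the mean absolute loss for a single slope parameter
    return np.mean(np.sign(X[:, 0] * w - y) * X[:, 0])

w = 0.0
for t in range(1, 2001):
    step = 0.5 / np.sqrt(t)        # decaying step size, as is usual for subgradient methods
    w -= step * subgrad_abs(w)
print(w)                           # settles near the LAD slope instead of bouncing around it
```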

If you're trying this because you're after some sort of robust regression, I think there are several alternatives more attractive than LAD.
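For instance, the Huber loss is quadratic for small residuals and linear for large ones, so it keeps much of LAD's robustness to outliers while remaining smooth and easy to optimise. A minimal sketch using scikit-learn (the particular estimator, parameters, and data here are just one option, not the only way to do this):

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
y[:10] += 30.0                      # contaminate a few observations with gross outliers

huber = HuberRegressor().fit(X, y)  # large residuals get a linear (absolute-style) penalty
ols = LinearRegression().fit(X, y)  # large residuals get a squared penalty
print("Huber slope:", huber.coef_[0], "OLS slope:", ols.coef_[0])
```

Comparing the two fits on contaminated data like this shows how much less the Huber fit is influenced by the outliers than OLS, without the optimisation headaches of a non-smooth objective.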
