Solved – BFGS & LBFGS for linear regression (overkill or compatibility issue)

linear, logistic, optimization, regression

The BFGS and LBFGS algorithms are often used as optimization methods for non-linear machine learning problems, such as training neural networks via backpropagation and fitting logistic regression.

My question is: why aren't they implemented for everything that gradient descent is even remotely related to, LINEAR regression for example?

Is it simply a matter of overkill, an unnecessary piece of machinery in this case (even though it would still theoretically improve training time)? Or is there an actual compatibility issue, where BFGS and LBFGS need non-linearity in order to work?

Best Answer

With linear regression, BFGS and LBFGS would be a major step backwards. That's because the solution can be directly written as

$\hat \beta = (X^T X)^{-1} X^T Y$
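To make this concrete, here is a minimal sketch (assuming NumPy and SciPy are available; the data is synthetic and purely illustrative) showing that BFGS only recovers, iteratively and at greater cost, what the closed form gives in a single linear solve:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# Closed form: one linear solve of the normal equations.
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# The same problem posed to BFGS as an iterative optimization.
loss = lambda b: np.sum((X @ b - y) ** 2)
grad = lambda b: 2 * X.T @ (X @ b - y)
beta_bfgs = minimize(loss, x0=np.zeros(3), jac=grad, method="BFGS").x

print(np.allclose(beta_closed, beta_bfgs, atol=1e-6))  # True, but after many iterations
```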

It's worth noting that directly using the above equation to calculate $\hat \beta$ (i.e. inverting $X^T X$ and then multiplying by $X^T Y$) is itself a poor way to compute $\hat \beta$: forming $X^T X$ squares the condition number of $X$, so in practice one solves the least-squares problem with a QR or Cholesky factorization instead.
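As a hedged sketch of the preferred route (again assuming NumPy; `np.linalg.lstsq` is SVD-based and works on $X$ directly rather than on $X^T X$):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# Poor: forming X^T X squares the condition number of X before inverting.
beta_naive = np.linalg.inv(X.T @ X) @ (X.T @ y)

# Better: a dedicated least-squares solver factorizes X itself.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_naive, beta_lstsq))  # agree here, but lstsq stays stable when X is ill-conditioned
```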

Also, gradient descent is only recommended for linear regression in extremely special cases (e.g. when the data is so large that even forming $X^T X$ is infeasible), so I wouldn't say gradient descent is "related" to linear regression in practice.
