[Math] How to derive the Levenberg–Marquardt algorithm with matrix calculus

calculus, linear algebra, numerical methods, optimization

According to the Wikipedia article http://en.wikipedia.org/wiki/Levenberg_Marquardt:

$S(\boldsymbol\beta+\boldsymbol\delta) \approx \|\mathbf{y} - \mathbf{f}(\boldsymbol\beta) - \mathbf{J}\boldsymbol\delta\|^2$

Taking the derivative with respect to $\boldsymbol\delta$ and setting the result to zero gives:

$(\mathbf{J}^T\mathbf{J})\boldsymbol\delta = \mathbf{J}^T[\mathbf{y} - \mathbf{f}(\boldsymbol\beta)]$

My attempt to derive the equation:

$\|\mathbf{y} - \mathbf{f}(\boldsymbol\beta) - \mathbf{J}\boldsymbol\delta\|^2
= (\mathbf{y} - \mathbf{f}(\boldsymbol\beta) - \mathbf{J}\boldsymbol\delta)^T(\mathbf{y} - \mathbf{f}(\boldsymbol\beta) - \mathbf{J}\boldsymbol\delta)$

Using the product rule:

$\frac{\partial \|\mathbf{y} - \mathbf{f}(\boldsymbol\beta) - \mathbf{J}\boldsymbol\delta\|^2}{\partial \boldsymbol\delta} = (-\mathbf{J}^T)(\mathbf{y} - \mathbf{f}(\boldsymbol\beta) - \mathbf{J}\boldsymbol\delta) + (\mathbf{y} - \mathbf{f}(\boldsymbol\beta) - \mathbf{J}\boldsymbol\delta)^T(-\mathbf{J})$

The dimensions of the two terms don't match, so I believe something is wrong with my differentiation. A transpose seems to be missing, but I'm not sure what would introduce a transpose in the differentiation.

Best Answer

I wrote this tutorial article about linear and nonlinear least-squares methods. It explains the problem in matrix and vector terms, and I tried to make it easy to learn from.
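In outline, the standard matrix-calculus derivation goes as follows. Write $\mathbf{e} = \mathbf{y} - \mathbf{f}(\boldsymbol\beta)$, expand the square, and use the fact that $\mathbf{e}^T\mathbf{J}\boldsymbol\delta$ is a scalar and therefore equals its own transpose $\boldsymbol\delta^T\mathbf{J}^T\mathbf{e}$:

$\|\mathbf{e} - \mathbf{J}\boldsymbol\delta\|^2 = \mathbf{e}^T\mathbf{e} - 2\mathbf{e}^T\mathbf{J}\boldsymbol\delta + \boldsymbol\delta^T\mathbf{J}^T\mathbf{J}\boldsymbol\delta$

In the denominator-layout convention, where $\frac{\partial (\mathbf{a}^T\boldsymbol\delta)}{\partial \boldsymbol\delta} = \mathbf{a}$ and $\frac{\partial (\boldsymbol\delta^T\mathbf{A}\boldsymbol\delta)}{\partial \boldsymbol\delta} = 2\mathbf{A}\boldsymbol\delta$ for symmetric $\mathbf{A}$, this gives

$\frac{\partial \|\mathbf{e} - \mathbf{J}\boldsymbol\delta\|^2}{\partial \boldsymbol\delta} = -2\mathbf{J}^T\mathbf{e} + 2\mathbf{J}^T\mathbf{J}\boldsymbol\delta$

and setting this to zero yields $(\mathbf{J}^T\mathbf{J})\boldsymbol\delta = \mathbf{J}^T\mathbf{e}$. This also explains the missing transpose in your attempt: the two product-rule terms are the same scalar derivative written in two different layouts. Since a scalar equals its own transpose, $(\mathbf{e} - \mathbf{J}\boldsymbol\delta)^T(-\mathbf{J})$ is just the transpose of $(-\mathbf{J}^T)(\mathbf{e} - \mathbf{J}\boldsymbol\delta)$; transposing it before adding makes the dimensions agree and gives $-2\mathbf{J}^T(\mathbf{e} - \mathbf{J}\boldsymbol\delta)$.

As a quick numerical sanity check, here is a minimal sketch in Python with NumPy (the random test problem, the variable names, and the damping value $\lambda = 10^{-3}$ are illustrative assumptions, not from the question). It compares the analytic gradient $-2\mathbf{J}^T(\mathbf{e} - \mathbf{J}\boldsymbol\delta)$ against a central finite-difference gradient, then solves the damped normal equations $(\mathbf{J}^T\mathbf{J} + \lambda\mathbf{I})\boldsymbol\delta = \mathbf{J}^T\mathbf{e}$ for a single Levenberg–Marquardt step:

```python
import numpy as np

# Illustrative random test problem (not from the question):
# m residuals, n parameters, e = y - f(beta), J = Jacobian of f at beta.
rng = np.random.default_rng(0)
m, n = 6, 3
J = rng.normal(size=(m, n))
e = rng.normal(size=m)
delta = rng.normal(size=n)

def S(d):
    """Objective S(delta) = ||e - J @ delta||^2."""
    r = e - J @ d
    return r @ r

# Analytic gradient from the derivation above: dS/ddelta = -2 J^T (e - J delta)
grad_analytic = -2.0 * J.T @ (e - J @ delta)

# Central finite-difference gradient for comparison
eps = 1e-6
I = np.eye(n)
grad_fd = np.array([(S(delta + eps * I[i]) - S(delta - eps * I[i])) / (2 * eps)
                    for i in range(n)])
print(np.allclose(grad_analytic, grad_fd, atol=1e-5))  # True

# One damped (Levenberg-Marquardt) step: (J^T J + lam I) delta = J^T e;
# lam -> 0 recovers the Gauss-Newton normal equations derived above.
lam = 1e-3
step = np.linalg.solve(J.T @ J + lam * I, J.T @ e)
print(step)
```

With $\lambda = 0$ this reduces to the Gauss–Newton step $(\mathbf{J}^T\mathbf{J})\boldsymbol\delta = \mathbf{J}^T\mathbf{e}$; Levenberg–Marquardt adds the $\lambda\mathbf{I}$ (or $\lambda\,\mathrm{diag}(\mathbf{J}^T\mathbf{J})$) damping term to interpolate between Gauss–Newton and gradient descent.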