Iteratively Reweighted Least Squares

least squares, linear algebra, regression

I'm trying to implement iteratively reweighted least squares.

Looking at the Wikipedia article, I don't understand the following line:

$\boldsymbol\beta^{(t+1)}
=
\underset{\boldsymbol\beta}{ \operatorname{arg\,min} }
\sum_{i=1}^n w_i^{(t)} \left| y_i - X_i \boldsymbol\beta \right|^2
=
(X^{\rm T} W^{(t)} X)^{-1} X^{\rm T} W^{(t)} \mathbf{y}$

Shouldn't weighted linear least squares rather look like this:

$\boldsymbol\beta^{(t+1)}
=
\underset{\boldsymbol\beta}{ \operatorname{arg\,min} }
\sum_{i=1}^n w_i^{(t)} \left| y_i - X_i \boldsymbol\beta \right|^2
=
((W^{(t)} X)^{\rm T} (W^{(t)} X))^{-1} (W^{(t)} X)^{\rm T} W^{(t)} \mathbf{y}$

with $W^{(t)}$ containing the square roots of the weights $w_i^{(t)}$?

Or are these equivalent?
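
A quick way to test this numerically would be something like the following NumPy sketch (made-up data; the problem size and the weight values are arbitrary), comparing the Wikipedia closed form against ordinary least squares on the square-root-weighted data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                       # arbitrary problem size
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
w = rng.uniform(0.1, 2.0, size=n)  # arbitrary positive weights w_i

W = np.diag(w)                     # weights as a diagonal matrix
Wsqrt = np.diag(np.sqrt(w))        # square roots of the weights

# Closed form quoted from the Wikipedia article: (X^T W X)^{-1} X^T W y
beta_wiki = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Ordinary least squares on the scaled data (W^{1/2} X, W^{1/2} y)
beta_sqrt = np.linalg.lstsq(Wsqrt @ X, Wsqrt @ y, rcond=None)[0]

# Compare the two solutions
print(np.allclose(beta_wiki, beta_sqrt))
```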

Best Answer

If we define the following vector/matrix variables
$$\eqalign{
r &= X\beta - y \qquad&({\rm residual/error\,vector}) \\
R &= {\rm Diag}(r) \qquad&({\rm as\,a\,diagonal\,matrix}) \\
W &= {\rm Diag}(w) \qquad&({\rm weights\,as\,a\,matrix}) \\
}$$
then the objective function can be written without explicit summations as
$$\phi = W:R^2$$
where the colon is a convenient product notation for the trace, i.e.
$$A:B = {\rm Tr}(A^TB) = {\rm Tr}(B^TA) = B:A$$

Calculate the gradient of the function:
$$\eqalign{
d\phi &= W:(2R\,dR) \\
 &= 2WR:dR \\
 &= 2\,{\rm diag}(WR):{\rm diag}(dR) \\
 &= 2Wr:dr \\
 &= 2W(X\beta - y):(X\,d\beta) \\
 &= 2X^TW(X\beta - y):d\beta \\
\frac{\partial\phi}{\partial\beta} &= 2X^TW(X\beta - y) \\
}$$

Set the gradient to zero and solve for the optimal $\beta$:
$$\eqalign{
X^TWX\beta &= X^TWy \\
\beta &= (X^TWX)^{-1}X^TWy \\
}$$

This confirms the Wikipedia result.
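
Putting the update into code, a minimal IRLS sketch might look like the following (NumPy; the weight function $w_i = 1/\max(|r_i|,\delta)$, which targets an $\ell_1$-type loss, and all parameter values are illustrative assumptions rather than something fixed by the derivation above):

```python
import numpy as np

def irls(X, y, n_iter=50, delta=1e-6):
    """IRLS with an L1-style reweighting w_i = 1 / max(|r_i|, delta).

    Other loss functions lead to other weight formulas; the update
    beta = (X^T W X)^{-1} X^T W y is the same in every case.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from ordinary LS
    for _ in range(n_iter):
        r = y - X @ beta                          # current residuals
        w = 1.0 / np.maximum(np.abs(r), delta)    # weights w_i^{(t)}
        XtW = X.T * w                             # X^T W without building Diag(w)
        beta = np.linalg.solve(XtW @ X, XtW @ y)  # (X^T W X)^{-1} X^T W y
    return beta

# Example with made-up data: intercept 2, slope -1, heavy-tailed noise
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([2.0, -1.0]) + rng.laplace(scale=0.5, size=200)
print(irls(X, y))
```

Each pass simply re-solves the weighted normal equations $X^TWX\beta = X^TWy$ with the weights recomputed from the current residuals.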
