Rewrite the numerator
\begin{equation}
\sum_i(y_i-\bar y)(x_i-\bar x)
=
\sum_i(y_i-\frac{1}{n}\sum_j y_j)(x_i-\frac{1}{n}\sum_j x_j)
\end{equation}
that is
\begin{equation}
\sum_i(y_i-\bar y)(x_i-\bar x)
=
\sum_i\Bigl(\frac{1}{n}\sum_j (y_i-y_j)\Bigr)\Bigl(\frac{1}{n}\sum_j (x_i-x_j)\Bigr).
\end{equation}
The product of the two inner sums is itself a double sum, so it is easier to expand the symmetric pairwise sum directly:
\begin{equation}
\sum_{i,\,j}(y_i-y_j)(x_i-x_j)
=
2n\sum_i x_i y_i-2\Bigl(\sum_i x_i\Bigr)\Bigl(\sum_j y_j\Bigr)
=
2n\sum_i(y_i-\bar y)(x_i-\bar x),
\end{equation}
or
\begin{equation}
\sum_i(y_i-\bar y)(x_i-\bar x)
=
\frac{1}{2n}
\sum_{i,\,j}(y_i-y_j)(x_i-x_j)
\end{equation}
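As a quick numerical sanity check of the $\frac{1}{2n}$ factor, here is a minimal sketch (assuming NumPy; the data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 7
x = rng.normal(size=n)
y = rng.normal(size=n)

# Left side: sum of centered cross-products.
lhs = np.sum((y - y.mean()) * (x - x.mean()))

# Right side: 1/(2n) times the double sum over all pairs (i, j).
rhs = np.sum((y[:, None] - y[None, :]) * (x[:, None] - x[None, :])) / (2 * n)

assert np.isclose(lhs, rhs)
```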
Rewrite the denominator
\begin{equation}
\sum_i(x_i-\bar x)^2
=
\sum_i(x_i-\bar x)(x_i-\bar x)
=
\sum_i(x_i-\frac{1}{n}\sum_j x_j )(x_i-\frac{1}{n}\sum_j x_j )
\end{equation}
that is, by the same expansion with $y_i$ replaced by $x_i$,
\begin{equation}
\sum_i(x_i-\bar x)^2
=
\frac{1}{2n}
\sum_{i,\,j}(x_i-x_j)^2
\end{equation}
Substituting both identities into the ratio, the factors of $\frac{1}{2n}$ cancel:
\begin{equation}
\frac{\sum_i(y_i-\bar y)(x_i-\bar x)}{\sum_i(x_i-\bar x)^2}
=
\frac{\frac{1}{2n}
\sum_{i,\,j}(y_i-y_j)(x_i-x_j)}{\frac{1}{2n}
\sum_{i,\,j}(x_i-x_j)^2}
=
\frac{\sum_{i,\,j}(y_i-y_j)(x_i-x_j)}{\sum_{i,\,j}(x_i-x_j)^2}
\end{equation}
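The cancellation can likewise be checked numerically; this sketch (again assuming NumPy) compares the classical slope estimate with the pairwise form:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 2.5 * x + rng.normal(size=n)

# Classical form: centered cross-products over centered squares.
slope_classical = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)

# Pairwise form: double sums over all (i, j); the 1/(2n) factors cancel.
dy = y[:, None] - y[None, :]
dx = x[:, None] - x[None, :]
slope_pairwise = np.sum(dy * dx) / np.sum(dx ** 2)

assert np.isclose(slope_classical, slope_pairwise)
```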
In what follows we use matrix-vector products, i.e. column vectors multiplied on the left by matrices (a convention that differs from the reference).
Let $S = \{ (x_{i}, y_{i})\}_{i=1}^{n}$ denote the data, where $x_{i} \in \mathbb{R}^{p}$ and $y_{i} \in \mathbb{R}$ for $i=1,\dots,n.$ In the first case, we consider a mean-centered linear model:
$$y_{i} = a_{0} + \sum_{j=1}^{p} a_{j} (x_{ij} - \bar{x}_{j}) + \epsilon_{i} \qquad \text{for} \quad i=1,\dots,n,$$
where $\bar{x}_{j} = \frac{\sum_{i=1}^{n} x_{ij}}{n}$ is the center of mass for the $j$th feature.
Now, we express it in matrix notation (for convenience). Namely, let
$$X-\bar{X} =
\begin{pmatrix}
1 & x_{11} & \cdots & x_{1p} \\
1 & x_{21} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & \cdots & x_{np} \\
\end{pmatrix}
-
\begin{pmatrix}
0 & \bar{x}_{1} & \cdots & \bar{x}_{p} \\
0 & \bar{x}_{1} & \cdots & \bar{x}_{p} \\
\vdots & \vdots & \ddots & \vdots \\
0 & \bar{x}_{1} & \cdots & \bar{x}_{p} \\
\end{pmatrix},
$$
$\epsilon =
\begin{pmatrix}
\epsilon_{1} & \epsilon_{2} & \cdots & \epsilon_{n}
\end{pmatrix}^{t},
$
and
$Y =
\begin{pmatrix}
y_{1} & y_{2} & \cdots & y_{n}
\end{pmatrix}^{t}.
$
Then, we have $Y = (X - \bar{X})a + \epsilon,$ where $a =
\begin{pmatrix}
a_{0} & a_{1} & \cdots & a_{p}
\end{pmatrix}^{t},
$
and the right hand side expands to $Xa - \bar{X}a + \epsilon.$
In the second case, we consider a linear model:
$$y_{i} = b_{0} + \sum_{j=1}^{p} b_{j} x_{ij} + \epsilon_{i} \qquad \text{for} \quad i=1,\dots,n.$$
Subsequently, we express it in matrix notation such that $Y = Xb + \epsilon,$ where $X$ and $\epsilon$ are as above and $b$ must be deduced. To find the elements of $b,$ we rearrange the right hand side of the mean-centered linear model to obtain:
$$y_{i} = a_{0} - \sum_{j=1}^{p} a_{j} \bar{x}_{j} + \sum_{j=1}^{p}a_{j}x_{ij} + \epsilon_{i} \qquad \text{for} \quad i=1,\dots,n.$$
This suggests $b_{0} = a_{0} - \sum_{j=1}^{p} a_{j} \bar{x}_{j}$ and $b_{j} = a_{j}$ for $j=1,\dots,p,$ so we have found all of the elements of $b.$ Hence the two linear models are identical, and the estimates can be derived from either parameterization.
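A brief numerical sketch of this equivalence (assuming NumPy; the coefficients and sizes are illustrative): fitting both parameterizations by least squares gives $b_{j} = a_{j}$ for $j \geq 1$ and $b_{0} = a_{0} - \sum_{j} a_{j}\bar{x}_{j}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
feats = rng.normal(size=(n, p))
y = 1.0 + feats @ np.array([0.5, -2.0, 3.0]) + 0.1 * rng.normal(size=n)

ones = np.ones((n, 1))
xbar = feats.mean(axis=0)

# Uncentered design [1 | X] and mean-centered design [1 | X - xbar].
X_unc = np.hstack([ones, feats])
X_cen = np.hstack([ones, feats - xbar])

b = np.linalg.lstsq(X_unc, y, rcond=None)[0]  # (b_0, b_1, ..., b_p)
a = np.linalg.lstsq(X_cen, y, rcond=None)[0]  # (a_0, a_1, ..., a_p)

# Slopes agree; intercepts differ by the centering offset.
assert np.allclose(b[1:], a[1:])
assert np.isclose(b[0], a[0] - a[1:] @ xbar)
```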
A note of caution: in (1.3) of the reference, they do not apply the inverse, whereas your expression contains it. You have implicitly assumed that $X^{t}X$ is invertible, which is equivalent to the columns of $X$ being linearly independent.
Best Answer
If we define the following vector/matrix variables
$$\eqalign{
r &= X\beta - y \qquad&({\rm residual/error\,vector}) \\
R &= {\rm Diag}(r) \qquad&({\rm as\,a\,diagonal\,matrix}) \\
W &= {\rm Diag}(w) \qquad&({\rm weights\,as\,a\,matrix}) \\
}$$
then the objective function can be written without explicit summations as
$$\phi = W:R^2,$$
where the colon is a convenient product notation for the trace, i.e.
$$A:B = {\rm Tr}(A^TB) = {\rm Tr}(B^TA) = B:A.$$
Calculate the gradient of the function:
$$\eqalign{
d\phi &= W:(2R\,dR) \\
&= 2WR:dR \\
&= 2\,{\rm diag}(WR):{\rm diag}(dR) \\
&= 2Wr:dr \\
&= 2W(X\beta - y):(X\,d\beta) \\
&= 2X^TW(X\beta - y):d\beta \\
\frac{\partial\phi}{\partial\beta} &= 2X^TW(X\beta - y) \\
}$$
Set the gradient to zero and solve for the optimal $\beta$:
$$\eqalign{
X^TWX\beta &= X^TWy \\
\beta &= (X^TWX)^{-1}X^TWy \\
}$$
This confirms the Wikipedia result.
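As a final sanity check (a sketch assuming NumPy; data and weights are arbitrary), the closed-form solution does zero the gradient $2X^TW(X\beta-y)$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 40, 4
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)  # positive weights
W = np.diag(w)

# Normal equations: X^T W X beta = X^T W y.
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# The gradient 2 X^T W (X beta - y) vanishes at the optimum.
grad = 2 * X.T @ W @ (X @ beta - y)
assert np.allclose(grad, 0)
```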