Gradient in linear regression with weights

linear algebra, linear regression

From Exercise 3.3 in Pattern Recognition and Machine Learning, I am asked to obtain the weights for a regression with a weighted squared-loss function.

That is, $E(w,x) = \sum_{j=1}^n r_j(y_j - x_j^Tw)^2$

where $r_j$ is the weight for example $j$. I'm trying to formulate this as a vector problem and take its gradient. If we let

$R = \begin{bmatrix}
r_1 & 0 & \dots & 0 \\
0 & r_2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & r_n \end{bmatrix}$

be a diagonal matrix with weights on the diagonal, we can rewrite

$E(w,x) = (y-Xw)^TR(y-Xw)$.
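As a quick numerical sanity check of this identity, here is a small sketch (with hypothetical random data) confirming that the matrix form $(y-Xw)^TR(y-Xw)$ agrees with the original sum $\sum_j r_j(y_j - x_j^Tw)^2$:

```python
import numpy as np

# Hypothetical example data: n observations, p features
rng = np.random.default_rng(0)
n, p = 6, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
w = rng.normal(size=p)
r = rng.uniform(0.5, 2.0, size=n)  # positive per-example weights r_j
R = np.diag(r)                     # diagonal weight matrix

loss_sum = np.sum(r * (y - X @ w) ** 2)       # sum_j r_j (y_j - x_j^T w)^2
loss_mat = (y - X @ w) @ R @ (y - X @ w)      # (y - Xw)^T R (y - Xw)
assert np.isclose(loss_sum, loss_mat)
```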

Then $\nabla_w E(w,x) = 2RX^T(y-Xw)$

Solving for $w$, I don't get the desired answer of $X(X^TX)^{-1}\sqrt{R}y$

Where am I wrong?

Best Answer

The gradient is $-2X^\top R (y - Xw)$. (Note that $RX^\top$ does not make sense since $R$ is $n \times n$ and $X^\top$ is $p \times n$.)

Setting this equal to zero yields $\hat{w} = (X^\top R X)^{-1} X^\top R y$, which is the standard weighted least squares solution. I'm not sure where your "desired solution" comes from; note that it is not even dimensionally consistent, since $X(X^\top X)^{-1}$ has shape $n \times p$ while $\sqrt{R}\,y$ is a vector of length $n$.
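A small numerical sketch (hypothetical random data) verifying this: the closed-form $\hat{w} = (X^\top R X)^{-1} X^\top R y$ zeroes the gradient $-2X^\top R(y - Xw)$, and it matches ordinary least squares applied after rescaling each row of $X$ and entry of $y$ by $\sqrt{r_j}$:

```python
import numpy as np

# Hypothetical example data
rng = np.random.default_rng(1)
n, p = 8, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
r = rng.uniform(0.5, 2.0, size=n)  # positive example weights
R = np.diag(r)

# Weighted least squares solution: (X^T R X)^{-1} X^T R y
w_hat = np.linalg.solve(X.T @ R @ X, X.T @ R @ y)

# The gradient -2 X^T R (y - Xw) vanishes at w_hat
grad = -2 * X.T @ R @ (y - X @ w_hat)
assert np.allclose(grad, 0)

# Equivalent view: OLS on data scaled by sqrt(r_j)
s = np.sqrt(r)
w_ols, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)
assert np.allclose(w_hat, w_ols)
```

The rescaling trick is likely where the $\sqrt{R}$ in the question's "desired answer" comes from, but it enters by transforming the data, not the final estimator.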
