Bishop – Pattern Recognition & Machine Learning, Exercise 1.2

linear-algebra, machine-learning, transformation

I'm working on Exercise 1.2 (the curve fitting problem) in Bishop's Pattern Recognition and Machine Learning.

The task is to write down the set of coupled linear equations, satisfied by the coefficients $w_i$, that minimize the regularized sum-of-squares error function $\tilde E(w) = \frac{1}{2}\sum_{n = 1}^N (y(x_n, w) - t_n)^2 + \frac{\lambda}{2}\|w\|^2$ with $y(x, w) = \sum_{j = 0}^M w_jx^j$ and $\|w\|^2 = w^Tw$ for given data $(x_n, t_n)$.

As in exercise 1.1, I started with the partial derivative with respect to the weight $w_i$, using $A_{ij} = \sum_{n = 1}^N (x_n)^{i + j}$ and $T_i = \sum_{n = 1}^N (x_n)^it_n$ from the first exercise:
$$
\begin{aligned}
\frac{\partial \tilde E}{\partial w_i}
&= \frac{\partial}{\partial w_i}\left(\frac{1}{2} \sum_{n = 1}^N (y(x_n, w) - t_n)^2\right) + \frac{\partial}{\partial w_i}\left(\frac{\lambda}{2} \|w\|^2\right) \\
&= \sum_{j = 0}^M A_{ij}w_j - T_i + \frac{\lambda}{2} \frac{\partial}{\partial w_i}\|w\|^2 \\
&= \sum_{j = 0}^M A_{ij}w_j - T_i + \frac{\lambda}{2} \cdot 2w_i
= \sum_{j = 0}^M A_{ij}w_j - T_i + \lambda w_i
\end{aligned}
$$
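For reference, the first term expands exactly as in exercise 1.1, via the chain rule:
$$
\frac{\partial}{\partial w_i} \frac{1}{2} \sum_{n = 1}^N \Bigl(\sum_{j = 0}^M w_j x_n^j - t_n\Bigr)^2
= \sum_{n = 1}^N \Bigl(\sum_{j = 0}^M w_j x_n^j - t_n\Bigr) x_n^i
= \sum_{j = 0}^M A_{ij} w_j - T_i.
$$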

Now I could set this derivative to zero and get
$$
\sum_{j = 0}^M (A_{ij}w_j) + \lambda w_i = T_i
$$

At this point I have no idea how to bring this equation into the form of a linear system so that I can apply, for example, Gaussian elimination.

Thanks for any help.

Best Answer

I think you are there. Maybe slightly rewriting your last equation helps:

$$ A_{i0} w_0 + A_{i1} w_1 + \dots + (A_{ii} + \lambda) w_i + \dots + A_{iM} w_M = T_i $$

and you have $M + 1$ of these equations, one for each $i = 0, \dots, M$.
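In matrix form this is the linear system

$$ (A + \lambda I)\,w = T, $$

where $A$ is the $(M+1) \times (M+1)$ matrix with entries $A_{ij}$, $I$ is the identity matrix, and $T = (T_0, \dots, T_M)^T$. That is exactly the shape you can hand to Gaussian elimination or any standard linear solver.

If it helps to see it numerically, here is a minimal sketch assuming NumPy; the function name `fit_regularized_poly` and the toy data are illustrative, not from the book:

```python
import numpy as np

def fit_regularized_poly(x, t, M, lam):
    """Solve the regularized normal equations (A + lambda*I) w = T,
    where A_ij = sum_n x_n^(i+j) and T_i = sum_n x_n^i * t_n, i, j = 0..M."""
    powers = np.arange(M + 1)
    X = x[:, None] ** powers          # N x (M+1) design matrix, X_nj = x_n^j
    A = X.T @ X                       # A_ij = sum_n x_n^(i+j)
    T = X.T @ t                       # T_i  = sum_n x_n^i * t_n
    return np.linalg.solve(A + lam * np.eye(M + 1), T)

# Toy usage: noisy samples of sin(2*pi*x), cubic fit with small regularization.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)
w = fit_regularized_poly(x, t, M=3, lam=1e-3)
print(w)
```

Note that as $\lambda \to 0$ this reduces to the unregularized system $Aw = T$ from exercise 1.1; the $\lambda$ on the diagonal is what keeps the system well conditioned for larger $M$.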
