Linear Algebra – Deriving the Normal Equation for Least Squares Problems


I am studying for my linear algebra final, and it was suggested that I learn how to derive the normal equation used in solving least squares problems. I have been looking in various places online without any luck, probably because I don't know the topic well enough to match what I find to the notation my professor uses. Note: the notation below comes directly from my class; I don't have a textbook to reference because my professor doesn't use one.

Here is an example from my class. I'm trying to use it to build a general setup from which I can derive the normal equation $A^TA\vec x=A^T\vec b$.

Given the model equation $N=a_0+a_1t$ and a graph with 4 collected data points of the form $(t_i,n_i)$, I created this error function:

$$E_2\begin{bmatrix}a_0\\a_1\end{bmatrix}= (a_0+a_1t_1-n_1)^2+(a_0+a_1t_2-n_2)^2+(a_0+a_1t_3-n_3)^2+(a_0+a_1t_4-n_4)^2=e_1^2+e_2^2+e_3^2+e_4^2=\Vert e\Vert^2= \Vert A\vec x-\vec b\Vert^2$$

The summation I have is:
$$\sum_{i=1}^4 e_i^2 = \sum_{i=1}^4 (a_0+a_1t_i-n_i)^2$$
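If I stack the four residuals, I think the matrix form (before squaring) is

$$\underbrace{\begin{bmatrix}1&t_1\\1&t_2\\1&t_3\\1&t_4\end{bmatrix}}_{A}\underbrace{\begin{bmatrix}a_0\\a_1\end{bmatrix}}_{\vec x}-\underbrace{\begin{bmatrix}n_1\\n_2\\n_3\\n_4\end{bmatrix}}_{\vec b}=\begin{bmatrix}e_1\\e_2\\e_3\\e_4\end{bmatrix},$$

so that $E_2=\Vert A\vec x-\vec b\Vert^2$.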

Since this is a specific case, I know I need a general error function to work with in order to derive the normal equation, so I created this one:

$$E_2\begin{bmatrix}a_0\\a_1\\\vdots\\a_n\end{bmatrix}=\sum_{i=1}^m e_i^2 = \sum_{i=1}^m (a_0+a_1t_i+a_2t_i^2+\cdots+a_nt_i^n-n_i)^2$$
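(Here I'm using $m$ for the number of data points and $n$ for the degree of the fitting polynomial.) If my specific case above is right, I think the general matrix form is

$$A=\begin{bmatrix}1&t_1&t_1^2&\cdots&t_1^n\\1&t_2&t_2^2&\cdots&t_2^n\\\vdots&\vdots&\vdots& &\vdots\\1&t_m&t_m^2&\cdots&t_m^n\end{bmatrix},\qquad \vec x=\begin{bmatrix}a_0\\a_1\\\vdots\\a_n\end{bmatrix},\qquad \vec b=\begin{bmatrix}n_1\\n_2\\\vdots\\n_m\end{bmatrix},$$

so again $E_2(\vec x)=\Vert A\vec x-\vec b\Vert^2$.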

I'm not even sure that I need this information to do my derivation. What do I need to do?

Best Answer

@Claude Leibovici leads us to the normal equations using the calculus of minimization. Another approach based on linear algebra follows.

We are given the system matrix $A\in\mathcal{C}^{m\times n}$ and a data vector $b\in\mathcal{C}^{m}$. The goal is to find the solution vector $x\in\mathcal{C}^{n}$. If the vector $b$ is not in the image (column space) of $A$, there is no exact solution to $Ax=b$. In other words, in the decomposition $\mathcal{C}^{m}=\mathcal{R}(A)\oplus\mathcal{N}(A^{*})$, the vector $b$ has a nonzero null space component.
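Explicitly, splitting $b$ along this orthogonal decomposition,

$$ b = \underbrace{b_{\mathcal{R}}}_{\in\,\mathcal{R}(A)} + \underbrace{b_{\mathcal{N}}}_{\in\,\mathcal{N}(A^{*})}, \qquad b_{\mathcal{N}} \neq 0 \quad\Longleftrightarrow\quad b \notin \mathcal{R}(A), $$

and no choice of $x$ can make $Ax$ reach the component $b_{\mathcal{N}}$.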

We can transform the problem to cure this malady. Multiplying both sides of the equation by $A^{*}$ forms the normal equations $$ A^{*} A x = A^{*}b. $$ Define the new data vector $\beta = A^{*}b\in\mathcal{C}^{n}$ and the new unknown $\eta=Ax\in\mathcal{C}^{m}$. The new system is $$ A^{*} \eta = \beta. $$
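A quick dimension check keeps the bookkeeping straight:

$$ \underbrace{A^{*}}_{n\times m}\,\underbrace{\eta}_{m\times 1} = \underbrace{\beta}_{n\times 1}, \qquad \underbrace{A^{*}A}_{n\times n}\,\underbrace{x}_{n\times 1} = \underbrace{A^{*}b}_{n\times 1}. $$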

By this deliberate construction, the data vector $\beta$ is in the image of $A^{*}$. That is, we can express $\beta$ as a combination of the columns of $A^{*}$. The prescription is given in terms of $A^{*}_{k}$, the column vectors of $A^{*}$: $$ \beta = \eta_{1} A^{*}_{1} + \eta_{2} A^{*}_{2} + \cdots + \eta_{m} A^{*}_{m}. $$

Now we demonstrate that the normal equations solution is also the least squares solution. The idea is to show that the normal equations solution minimizes the sum of squared residuals $$ r^{2} = \min_{x\in\mathcal{C}^{n}}\lVert Ax - b \rVert_{2}^{2}. $$

Let $x_{*}$ be a solution of the normal equations: $$ x_{*} \in \{x\in\mathcal{C}^{n} \mid A^{*}A x - A^{*} b = 0 \}. $$

The task is to evaluate the residuals, and the trick is to fashion a useful zero vector inside the norm:
$$
\begin{align}
\lVert A x - b \rVert_{2}^{2}
& = \lVert A x + \underbrace{Ax_{*} - Ax_{*}}_{0} - b \rVert_{2}^{2} \\
& = \lVert (A x_{*} - b) + A(x - x_{*}) \rVert_{2}^{2} \\
& = \lVert Ax_{*} - b \rVert_{2}^{2} + \lVert A(x - x_{*}) \rVert_{2}^{2} + 2\operatorname{Re}\left[(x - x_{*})^{*} \underbrace{(A^{*} A x_{*} - A^{*} b)}_{0}\right] \\
& = \lVert Ax_{*} - b \rVert_{2}^{2} + \lVert A(x - x_{*}) \rVert_{2}^{2}
\end{align}
$$
The conclusion is that the sum of the squares of the residuals is minimized when $x = x_{*}$. Therefore, the solution to the normal equations is also the least squares solution.
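For a quick numerical sanity check (a minimal NumPy sketch with made-up data, not part of the derivation), one can verify that solving the normal equations gives the same coefficients as a library least squares routine:

```python
import numpy as np

# Hypothetical data: four (t_i, n_i) points, as in the question above.
t = np.array([0.0, 1.0, 2.0, 3.0])
n = np.array([1.1, 1.9, 3.2, 3.9])

# Design matrix for the line N = a_0 + a_1 t: a column of ones and a column of t.
A = np.column_stack([np.ones_like(t), t])

# Solve the normal equations A^T A x = A^T b directly ...
x_normal = np.linalg.solve(A.T @ A, A.T @ n)

# ... and compare with NumPy's built-in least squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, n, rcond=None)

print(x_normal)  # [a_0, a_1] from the normal equations
print(x_lstsq)   # agrees with x_normal up to rounding
```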
