I don't understand why a small error in $A^TA$ can lead to a large error in the coefficient matrix. Since $A=QR$, there should be no difference between using $A$ or $QR$ anyway. Could someone give an example? Thank you very much.
[Math] QR factorization for least squares
least squares, linear algebra, projective space
Related Solutions
It seems that the mistake is to forget to invert the matrix.
To make it understandable, a more general example is shown below. Then, it is shown for the case of a quadratic function.
Note that $[M]^{-1}$ means the inverse of the matrix, not $\frac{1}{[M]}$.
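The worked example referenced above is not reproduced here, but a minimal NumPy sketch of the same idea — fitting a quadratic by the normal equations, with an explicit inverse of $[M] = A^TA$ — might look like this (the data points are made up purely for illustration):

```python
# Sketch: least-squares fit of y ~ c0 + c1*x + c2*x^2 via the normal equations,
# using an explicit inverse of [M] = A^T A as emphasized above.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])    # illustrative abscissae (made up)
y = np.array([1.1, 2.9, 9.2, 18.8, 33.1])  # illustrative ordinates (made up)

A = np.vander(x, 3, increasing=True)       # design matrix with columns 1, x, x^2
M = A.T @ A                                # the matrix [M] = A^T A
c = np.linalg.inv(M) @ (A.T @ y)           # coefficients via [M]^{-1} A^T y
print(c)                                   # intercept, linear, quadratic coefficients
```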
@Claude Leibovici leads us to the normal equations using the calculus of minimization. Another approach based on linear algebra follows.
We are given the system matrix $A\in\mathcal{C}^{m\times n}$ and a data vector $b\in\mathcal{C}^{m}$. The goal is to find the solution vector $x\in\mathcal{C}^{n}$. If the vector $b$ is not in the image space of $A$, there is no exact solution to $Ax=b$. In other words, $b\in\mathcal{R}(A)\oplus\mathcal{N}(A^{*})$ has a null space component.
We can transform the problem to cure this malady. Multiplying both sides of the equation by $A^{*}$ forms the normal equations $$ A^{*} A x = A^{*}b. $$ The new data vector is $\beta = A^{*}b\in\mathcal{C}^{n}$, and the new solution vector is $\eta=Ax\in\mathcal{C}^{m}$. The new system is $$ A^{*} \eta = \beta. $$
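A minimal numerical sketch of this construction (assuming NumPy and a random overdetermined system as a stand-in for $A$ and $b$):

```python
# Sketch: form the normal equations A^* A x = A^* b and solve them.
# Here A is real, so A^* = A^T.
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))          # system matrix; b is generally not in R(A)
b = rng.standard_normal(m)

beta = A.T @ b                           # new data vector  beta = A^* b
x_star = np.linalg.solve(A.T @ A, beta)  # solve  A^* A x = beta
eta = A @ x_star                         # eta = A x_star, the part of b reachable from R(A)
print(x_star)
```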
By this deliberate construction, the data vector $\beta$ is in the image of $A^{*}$. That is, we can express $\beta$ as a combination of the columns of $A^{*}$. The prescription is given in terms of $A^{*}_{k}$, the column vectors of $A^{*}$: $$ \beta = \eta_{1} A^{*}_{1} + \eta_{2} A^{*}_{2} + \cdots + \eta_{m} A^{*}_{m}. $$
Now we demonstrate that the normal equations solution is also the least squares solution. The idea is to show that the normal equations solution minimizes the sum of the squares of the residuals, $$ r^{2} = \min_{x\in\mathcal{C}^{n}}\lVert Ax - b \rVert_{2}^{2}. $$
Let $x_{*}$ be a solution to the normal equations: $$ x_{*} \in \{x\in\mathcal{C}^{n} \mid A^{*}A x - A^{*} b = 0 \}. $$
The task is to evaluate the residuals, and the trick is to fashion a useful zero vector inside the norm: $$ \begin{align} \lVert A x - b \rVert_{2}^{2} & = \lVert A x + \underbrace{Ax_{*} - Ax_{*}}_{0} - b \rVert_{2}^{2} \\ & = \lVert (A x_{*} - b) + A(x - x_{*}) \rVert_{2}^{2} \\ & = \lVert Ax_{*} - b \rVert_{2}^{2} + \lVert A(x - x_{*}) \rVert_{2}^{2} + 2\,\Re\!\left[(x - x_{*})^{*} \underbrace{(A^{*} A x_{*} - A^{*} b)}_{0}\right] \\ & = \lVert Ax_{*} - b \rVert_{2}^{2} + \lVert A(x - x_{*}) \rVert_{2}^{2}. \end{align} $$ The conclusion is that the sum of the squares of the residuals is minimized when $x = x_{*}$. Therefore, the solution to the normal equations is also the least squares solution.
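This claim is easy to check numerically. A sketch (assuming NumPy and a random overdetermined system): the normal-equations solution agrees with a library least-squares solver, and perturbing it only increases the residual.

```python
# Sketch: verify that the normal-equations solution is the least-squares solution.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

x_star = np.linalg.solve(A.T @ A, A.T @ b)    # normal-equations solution
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)  # reference least-squares solution
print(np.allclose(x_star, x_ls))              # True

r0 = np.linalg.norm(A @ x_star - b) ** 2
for _ in range(5):
    x = x_star + 1e-3 * rng.standard_normal(3)     # random perturbation of x_star
    assert np.linalg.norm(A @ x - b) ** 2 >= r0    # residual only grows, as derived above
```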
Best Answer
A typical example uses the matrix $$ \mathbf{A} = \left( \begin{array}{cc} 1 & 1 \\ 0 & \epsilon \\ \end{array} \right) $$
Consider the linear system $$ \mathbf{A} x = b. $$ The solution via the normal equations is $$ x_{LS} = \left( \mathbf{A}^{*}\mathbf{A} \right)^{-1}\mathbf{A}^{*}b = \left( \begin{array}{c} b_{1}-\dfrac{b_{2}}{\epsilon } \\[1ex] \dfrac{b_{2}}{\epsilon } \end{array} \right). $$
Minute changes in $\epsilon$, for example $0.001\to0.00001$, create large changes in the solution: $$ \epsilon = 0.001: \quad x_{LS} = \left( \begin{array}{c} b_{1}-1000\,b_{2} \\ 1000\, b_{2} \end{array} \right), \qquad \epsilon = 0.00001: \quad x_{LS} = \left( \begin{array}{c} b_{1}-100000\,b_{2} \\ 100000\, b_{2} \end{array} \right). $$
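A short numerical sketch of this sensitivity (assuming NumPy and an illustrative data vector $b = (1,1)^{T}$, which is not specified above):

```python
# Sketch: the exact least-squares solution of this 2x2 system blows up as epsilon shrinks.
import numpy as np

b = np.array([1.0, 1.0])                   # illustrative data vector
for eps in (1e-3, 1e-5):
    A = np.array([[1.0, 1.0],
                  [0.0, eps]])
    x = np.linalg.solve(A.T @ A, A.T @ b)  # normal-equations solution
    print(eps, x)                          # components (b1 - b2/eps, b2/eps) grow like 1/eps
```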
The $\mathbf{QR}$ decomposition is $$ \mathbf{A} = \mathbf{QR} = \left( \begin{array}{cc} 1 & 0 \\ 0 & \frac{\epsilon }{\left| \epsilon \right| } \\ \end{array} \right) \left( \begin{array}{cc} 1 & 1 \\ 0 & \left| \epsilon \right| \\ \end{array} \right). $$
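As a numerical illustration of why the two routes can differ in floating point (a sketch, again with an assumed $b=(1,1)^{T}$): forming $\mathbf{A}^{*}\mathbf{A}$ roughly squares the condition number, so for $\epsilon$ near the square root of machine epsilon the computed $\mathbf{A}^{*}\mathbf{A}$ becomes exactly singular, while QR applied to $\mathbf{A}$ itself still recovers the solution.

```python
# Sketch: with eps ~ 1e-8, A^T A = [[1, 1], [1, 1 + eps^2]] rounds to a singular matrix,
# so the normal equations lose information that QR on A retains.
import numpy as np

eps = 1e-8
A = np.array([[1.0, 1.0],
              [0.0, eps]])
b = np.array([1.0, 1.0])                   # exact solution: (1 - 1/eps, 1/eps)

# Normal equations: the small pivot is destroyed when A^T A is formed.
try:
    x_ne = np.linalg.solve(A.T @ A, A.T @ b)
except np.linalg.LinAlgError:
    x_ne = None                            # A^T A is singular in double precision
print("normal equations:", x_ne)

# QR: solve R x = Q^T b without ever forming A^T A.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)
print("QR:", x_qr)                         # close to (1 - 1e8, 1e8)
```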