First, correct a simple error:
$$
\mathbf{A}^{*}\mathbf{A} =
\left[
\begin{array}{rr}
9 & -9 \\
-9 & 9 \\
\end{array}
\right].
$$
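As a quick numerical check, here is a minimal R sketch; $\mathbf{A}$ is the $3\times 2$ system matrix decomposed below:

```r
# Check the corrected Gram matrix A* A
A <- matrix(c( 1, -1,
              -2,  2,
               2, -2), nrow = 3, byrow = TRUE)

crossprod(A)   # t(A) %*% A: gives the 2x2 matrix [9 -9; -9 9]
```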
This system matrix $\mathbf{A}$ is a rank-one matrix (column 2 $= -$column 1) with singular value decomposition
$$
\begin{align}
\mathbf{A}
&=
\mathbf{U}\, \Sigma \, \mathbf{V}^{*} \\
%
\left[
\begin{array}{rr}
1 & -1 \\
-2 & 2 \\
2 & -2 \\
\end{array}
\right]
&=
% U
\left[
\begin{array}{ccc}
\frac{1}{3}
\color{blue}{\left[
\begin{array}{r}
-1 \\
2 \\
-2
\end{array}
\right]}
&
\frac{1}{\sqrt{5}}
\color{red}{\left[
\begin{array}{r}
-2 \\
0 \\
1
\end{array}
\right]}
&
\frac{1}{3\sqrt{5}}
\color{red}{\left[
\begin{array}{r}
2 \\
5 \\
4
\end{array}
\right]}
%
\end{array}
\right]
% sigma
\left[
\begin{array}{cc}
3 \sqrt{2} & 0 \\
0 & 0 \\
0 & 0 \\
\end{array}
\right]
% V
\frac{1}{\sqrt{2}}
\left[
\begin{array}{rc}
\color{blue}{-1} & \color{blue}{1} \\
\color{red}{1} & \color{red}{1} \\
\end{array}
\right].
\end{align}
$$
Blue vectors are in range spaces, red vectors in null spaces.
The thin SVD uses the range space components only:
$$
\mathbf{A} =
% U
\frac{1}{3}
\color{blue}{\left[
\begin{array}{r}
-1 \\
2 \\
-2
\end{array}
\right]}
% S
\left( 3\sqrt{2} \right)
% V
\frac{1}{\sqrt{2}}
\left[
\begin{array}{rc}
\color{blue}{-1} & \color{blue}{1}
\end{array}
\right].
$$
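A short R sketch confirms the single nonzero singular value $3\sqrt{2}$ and the rank-one (thin) reconstruction; individual singular vectors may come out with opposite signs, which is the usual SVD ambiguity:

```r
A <- matrix(c( 1, -1,
              -2,  2,
               2, -2), nrow = 3, byrow = TRUE)

s <- svd(A)                               # thin SVD: u is 3x2, v is 2x2
s$d                                       # approx c(4.2426, 0), i.e. 3*sqrt(2) and 0
A1 <- s$d[1] * s$u[, 1] %*% t(s$v[, 1])   # rank-one reconstruction from the leading triplet
all.equal(A1, A)                          # TRUE up to floating point
```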
You may benefit from this example:
SVD and the columns — I did this wrong but it seems that it still works, why?
Compute the singular value decomposition of a matrix $\mathbf{A}\in\mathbb{C}^{m\times n}_{\rho}$
$$
\mathbf{A} =
\mathbf{U} \, \Sigma \, \mathbf{V}^{*}
=
% U
\left[ \begin{array}{cc}
\color{blue}{\mathbf{U}_{\mathcal{R}}} & \color{red}{\mathbf{U}_{\mathcal{N}}}
\end{array} \right]
% Sigma
\left[ \begin{array}{cc}
\mathbf{S}_{\rho\times \rho} & \mathbf{0} \\
\mathbf{0} & \mathbf{0}
\end{array} \right]
% V
\left[ \begin{array}{c}
\color{blue}{\mathbf{V}_{\mathcal{R}}}^{*} \\
\color{red}{\mathbf{V}_{\mathcal{N}}}^{*}
\end{array} \right]
\tag{1}
$$
The beauty of the SVD is that it provides orthonormal bases for the four fundamental subspaces of a matrix $\mathbf{A}\in\mathbb{C}^{m\times n}$:
$$
\begin{align}
%
\mathbb{C}^{n} =
\color{blue}{\mathcal{R} \left( \mathbf{A}^{*} \right)} \oplus
\color{red}{\mathcal{N} \left( \mathbf{A} \right)} \\
%
\mathbb{C}^{m} =
\color{blue}{\mathcal{R} \left( \mathbf{A} \right)} \oplus
\color{red} {\mathcal{N} \left( \mathbf{A}^{*} \right)}
%
\end{align}
$$
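A small numerical illustration of this splitting, using the rank-one matrix from the first example (an R sketch; the full SVD is requested so that the null-space columns appear):

```r
A <- matrix(c(1, -1, -2, 2, 2, -2), nrow = 3, byrow = TRUE)
s <- svd(A, nu = nrow(A), nv = ncol(A))         # full U (3x3) and V (2x2)
r <- sum(s$d > 1e-10)                           # numerical rank, here 1

zapsmall(crossprod(A, s$u[, (r + 1):3]))        # ~0: trailing columns of U span N(A*)
zapsmall(A %*% s$v[, (r + 1):2, drop = FALSE])  # ~0: trailing columns of V span N(A)
```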
To compute the SVD:
- Resolve the domain by finding the eigenvalues and eigenvectors of $\mathbf{A}^{*}\mathbf{A}$. Outputs: the matrix of singular values $\mathbf{S}$ and $\color{blue}{\mathbf{V}_{\mathcal{R}}}$.
- Compute $\color{blue}{\mathbf{U}_{\mathcal{R}}}$ from $\mathbf{S}$ and $\color{blue}{\mathbf{V}_{\mathcal{R}}}$ via $\color{blue}{\mathbf{U}_{\mathcal{R}}} = \mathbf{A}\, \color{blue}{\mathbf{V}_{\mathcal{R}}}\, \mathbf{S}^{-1}$; a numerical sketch of both steps follows this list.
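A minimal R sketch of this two-step recipe, applied to the $2\times 2$ example worked below (the eigenvectors may come back with flipped signs, which does not affect the reconstruction):

```r
A <- matrix(c(2, 2,
             -1, 1), nrow = 2, byrow = TRUE)    # the example matrix used below

e  <- eigen(crossprod(A), symmetric = TRUE)     # step 1: eigen-decomposition of A* A
S  <- diag(sqrt(e$values))                      # singular values on the diagonal
VR <- e$vectors                                 # right singular vectors (up to sign)
UR <- A %*% VR %*% solve(S)                     # step 2: U_R = A V_R S^{-1}

UR %*% S %*% t(VR)                              # reconstructs A
```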
1. Resolve $\ \color{blue}{\mathcal{R} \left( \mathbf{A}^{*} \right)}$
Step 1:
Compute product matrix
$$
%
\begin{align}
%
\mathbf{W} = \mathbf{A}^{T}\mathbf{A} =
%
\left[
\begin{array}{cr}
2 & -1 \\
2 & 1 \\
\end{array}
\right]
\left[
\begin{array}{rr}
2 & 2 \\
-1 & 1 \\
\end{array}
\right]
%
=
%
\left[
\begin{array}{cc}
5 & 3 \\
3 & 5 \\
\end{array}
\right]
%
\end{align}
%
$$
Step 2:
Compute eigenvalue spectrum $\lambda \left(\mathbf{W}\right)$
$$
\det \mathbf{W} = 16, \qquad \text{trace } \mathbf{W} = 10
$$
The characteristic polynomial is
$$
p(\lambda) = \lambda^{2} - \lambda \text{ trace } \mathbf{W} + \det \mathbf{W}
= \lambda ^2-10 \lambda +16 =
\left( \lambda - 8 \right) \left( \lambda - 2 \right)
$$
The roots of the $p(\lambda)$ are the eigenvalues of $\mathbf{W}$:
$$
\lambda \left(\mathbf{W}\right) = \left\{ 8, 2 \right\}
$$
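The roots can be checked numerically, for instance with base R (polynomial coefficients in ascending order for `polyroot`):

```r
polyroot(c(16, -10, 1))                    # roots of 16 - 10*lambda + lambda^2: 2 and 8
eigen(matrix(c(5, 3, 3, 5), 2))$values     # same spectrum directly from W: 8, 2
```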
Step 3:
Compute singular value spectrum $\sigma$
To obtain the singular values, form $\tilde{\lambda}$, the list of eigenvalues arranged in decreasing order with zero values culled, and take square roots:
$$
\sigma = \sqrt{\tilde{\lambda}} = \left\{ 2\sqrt{2}, \sqrt{2} \right\}
$$
The singular values are the diagonal entries of $\mathbf{S}$:
$$
\boxed{
\mathbf{S} = \sqrt{2}\left[
\begin{array}{cc}
2 & 0 \\
0 & 1 \\
\end{array}
\right]
}
$$
Step 4:
Compute eigenvectors of $\mathbf{W}$
Fundamental tool: eigenvalue equation
$$
\mathbf{W} v_{k} = \lambda_{k} v_{k}, \qquad k = 1, 2
$$
$k=1$:
$$
%
\begin{align}
%
\mathbf{W} v_{1} &= \lambda_{1} v_{1} \\
%
\left[
\begin{array}{cc}
5 & 3 \\
3 & 5 \\
\end{array}
\right]
%
\left[
\begin{array}{c}
x \\ y \\
\end{array}
\right]
%
&=
%
8
%
\left[
\begin{array}{c}
x \\ y \\
\end{array}
\right] \\[3pt]
% % %
\left[
\begin{array}{c}
5 x + 3 y \\ 3 x + 5 y \\
\end{array}
\right]
&=
\left[
\begin{array}{c}
8x \\ 8y \\
\end{array}
\right]\\[3pt]
% % %
\left[
\begin{array}{c}
x \\ y \\
\end{array}
\right]
&=
\left[
\begin{array}{c}
1 \\ 1 \\
\end{array}
\right]
%
\end{align}
%
$$
The normalized vector is the first column vector in $\color{blue}{\mathbf{V}_{\mathcal{R}}}$.
$$
\hat{v}_{1} = \frac{1}{\sqrt{2}}
\left[
\begin{array}{r}
1 \\ 1 \\
\end{array}
\right]
$$
$k=2$:
$$
%
\begin{align}
%
\mathbf{W} v_{2} &= \lambda_{2} v_{2} \\
%
\left[
\begin{array}{cc}
5 & 3 \\
3 & 5 \\
\end{array}
\right]
%
\left[
\begin{array}{c}
x \\ y \\
\end{array}
\right]
%
&=
%
2
%
\left[
\begin{array}{c}
x \\ y \\
\end{array}
\right] \\[3pt]
% % %
\left[
\begin{array}{c}
5 x + 3 y \\ 3 x + 5 y \\
\end{array}
\right]
&=
\left[
\begin{array}{c}
2x \\ 2y \\
\end{array}
\right]\\[3pt]
% % %
\left[
\begin{array}{c}
x \\ y \\
\end{array}
\right]
&=
\left[
\begin{array}{c}
-1 \\ 1 \\
\end{array}
\right]
%
\end{align}
%
$$
The normalized vector is the second column vector in $\color{blue}{\mathbf{V}_{\mathcal{R}}}$.
$$
\hat{v}_{2} = \frac{1}{\sqrt{2}}
\left[
\begin{array}{r}
-1 \\ 1 \\
\end{array}
\right]
$$
Assemble:
$$
\boxed{
\color{blue}{\mathbf{V}_{\mathcal{R}}} = \frac{1}{\sqrt{2}}
%
\left[
\begin{array}{cr}
1 & -1 \\
1 & 1 \\
\end{array}
\right]
}
$$
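A quick check in R; `eigen` returns unit eigenvectors, possibly with opposite signs from the hand computation:

```r
W <- matrix(c(5, 3, 3, 5), nrow = 2)
eigen(W, symmetric = TRUE)$vectors   # columns ~ (1,1)/sqrt(2) and (-1,1)/sqrt(2), up to sign
```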
2. Resolve $\ \color{blue}{\mathcal{R} \left( \mathbf{A} \right)}$
Rearrange (1) to recover
$$
\color{blue}{\mathbf{U}_{\mathcal{R}}} = \mathbf{A} \color{blue}{\mathbf{V}_{\mathcal{R}}} \mathbf{S}^{-1}
$$
The power of the SVD is that it aligns the $\color{blue}{\text{range}}$
spaces and accounts for scale differences. This allows direct computation using equation (1):
$$
\begin{align}
\color{blue}{\mathbf{U}_{\mathcal{R}}} =
\mathbf{A} \color{blue}{\mathbf{V}_{\mathcal{R}}} \mathbf{S}^{-1}
%
&=
\left[
\begin{array}{rc}
2 & 2 \\
-1 & 1 \\
\end{array}
\right]
%
\frac{1}{\sqrt{2}}
\left[
\begin{array}{cr}
1 & -1 \\
1 & 1 \\
\end{array}
\right]
% Sinv
\left[
\begin{array}{cc}
\frac{1}{2 \sqrt{2}} & 0 \\
0 & \frac{1}{\sqrt{2}} \\
\end{array}
\right]
%
\end{align}
%
$$
At last,
$$
\boxed{
\color{blue}{\mathbf{U}_{\mathcal{R}}} =
\left[
\begin{array}{cc}
1 & 0 \\
0 & 1 \\
\end{array}
\right]
}
$$
Final answer
$$
\mathbf{A} =
\color{blue}{\mathbf{U}_{\mathcal{R}}}
\mathbf{S}
\color{blue}{\mathbf{V}_{\mathcal{R}}}^{*} =
%
\left[
\begin{array}{cc}
1 & 0 \\
0 & 1 \\
\end{array}
\right]
%
\sqrt{2}
\left[
\begin{array}{cc}
2 & 0 \\
0 & 1 \\
\end{array}
\right]
\frac{1}{\sqrt{2}}
%
\left[
\begin{array}{rc}
1 & 1 \\
-1 & 1 \\
\end{array}
\right].
$$
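As a final sanity check, a short R sketch that rebuilds $\mathbf{A}$ from the boxed factors and compares with the built-in `svd` (whose singular vectors may differ by signs):

```r
A  <- matrix(c(2, 2, -1, 1), nrow = 2, byrow = TRUE)
UR <- diag(2)                                       # boxed U_R
S  <- sqrt(2) * diag(c(2, 1))                       # boxed S
VR <- matrix(c(1, 1, -1, 1), nrow = 2) / sqrt(2)    # boxed V_R (columns v1, v2)

UR %*% S %*% t(VR)     # reproduces A = [2 2; -1 1]
svd(A)$d               # c(2*sqrt(2), sqrt(2)), matching diag(S)
```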
Best Answer
Here, the data points don't lie around a line passing through the origin. Had that been the case, the LS fit would reduce to solving $Ax=0$, and the fitted direction would be given by the first (right) singular vector (i.e., the dominant eigenvector of the matrix $A^TA$), the solution of the optimization problem of maximizing the projection, $\max_{\|v\|=1}\|Av\|$.
Still, the first right singular vector is the dominant eigenvector of $A^TA$, namely $[0.2557118,\ 0.9667531]$, which can be found numerically (e.g., with R) or directly by hand, as follows.

You get $A^TA = \begin{bmatrix}414 & 93\\ 93 & 741 \end{bmatrix}$ and the characteristic polynomial $\lambda^2-1155\lambda+298125=0$; solving, the largest eigenvalue is $\lambda=765.5991$. Solving the linear system $\begin{bmatrix}414 & 93\\ 93 & 741 \end{bmatrix}\begin{bmatrix}x_1 \\ x_2\end{bmatrix}=765.5991\begin{bmatrix}x_1 \\ x_2\end{bmatrix}$ gives $93x_2=351.5991x_1$ and $93x_1=24.5991x_2$, so $\frac{x_1}{x_2}\approx 0.2645$ and the corresponding eigenvector is $\begin{bmatrix}0.2645\\ 1\end{bmatrix}$. Normalizing (i.e., dividing by $\sqrt{0.2645^2+1^2}=1.034389$) gives the dominant unit eigenvector $\begin{bmatrix}\frac{0.2645}{1.034389}\\ \frac{1}{1.034389}\end{bmatrix}=\begin{bmatrix}0.25571\\ 0.96675\end{bmatrix}$, which agrees with the numerical computation sketched below.
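A minimal R sketch of that numerical check (only $A^TA$, already computed above, is needed):

```r
W <- matrix(c(414, 93, 93, 741), nrow = 2)   # A^T A from above
e <- eigen(W)
e$values[1]       # largest eigenvalue, ~765.5991
e$vectors[, 1]    # dominant unit eigenvector, ~c(0.2557, 0.9668) up to sign
```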
Let's instead consider a different set of points $(\frac{7}{10},\frac{7}{10})$, $(7,9)$, $(2,\frac{9}{2})$, so that the matrix is $A=\begin{bmatrix}7/10 & 7/10\\ 7&9 \\ 2&9/2 \end{bmatrix}$. Now, if we want the best-fit line through the origin (without an intercept), the corresponding coefficient vector lies (approximately) in the null space of $A$, i.e., it is a least-squares solution of $Ax=0$, which can be found from the SVD of $A$, as sketched below.
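A minimal R sketch of that computation: the coefficient vector $(a,b)$ of the line $ax+by=0$ is taken as the right singular vector for the smallest singular value, so the slope of the fitted line is $-a/b$.

```r
A <- matrix(c(7/10, 7/10,
                 7,    9,
                 2,  9/2), nrow = 3, byrow = TRUE)   # rows are the points (x, y)

s <- svd(A)
v <- s$v[, 2]          # right singular vector for the smallest singular value: ||Av|| is minimal
slope <- -v[1] / v[2]  # line a*x + b*y = 0  =>  y = -(a/b) * x
slope

# Ordinary least-squares slope through the origin (vertical distances), for comparison;
# it should come out close to the SVD slope above.
sum(A[, 1] * A[, 2]) / sum(A[, 1]^2)
```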
The best-fit line obtained with the SVD (figure omitted here) agrees with the linear-regression least-squares fit without an intercept.
Now, say we have the points $(17,4)$, $(-2,26)$, $(11,7)$ instead. To find the best-fit line, we need to minimize $\|Ax-b\|_2^2$, which boils down to solving $Ax=b$ with $b \neq 0$. Here we have $A=\begin{bmatrix}1 & 17 \\ 1 & -2 \\ 1 & 11 \end{bmatrix}$ and $b=\begin{bmatrix} 4 \\ 26 \\ 7 \end{bmatrix}$, and the least-squares solution is given by the normal equations (the pseudo-inverse): $\hat{\beta}=(A^TA)^{-1}A^Tb$. We can use the SVD here too: with $A=U\Sigma V^T$, we have $\hat{\beta}=V\Sigma^{-1}U^Tb$, as shown below.
It matches the LS solution obtained using the normal equations, as the sketch below verifies.
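A short R sketch under the setup above, simply transcribing the two formulas and comparing the results:

```r
A <- matrix(c(1, 17,
              1, -2,
              1, 11), nrow = 3, byrow = TRUE)
b <- c(4, 26, 7)

beta_normal <- solve(crossprod(A), crossprod(A, b))    # (A^T A)^{-1} A^T b
s <- svd(A)
beta_svd <- s$v %*% diag(1 / s$d) %*% t(s$u) %*% b     # V Sigma^{-1} U^T b

cbind(beta_normal, beta_svd)   # the two columns agree
```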