[Math] Backwards stability of QR vs SVD
Tags: linear-algebra, numerical-linear-algebra
I've been reading Trefethen & Bau's book on Numerical Linear Algebra, and they have one question whose answer does not entirely make sense to me. In particular, they imply that the SVD algorithm (the computation of the SVD itself, not the solution of $Ax = b$ by SVD) is not backwards stable. The suggestion is that this has to do with the fact that the SVD maps an $m\times n$ matrix into the space of triples of $m\times m$, $m\times n$, and $n\times n$ matrices for $U$, $\Sigma$, and $V$. They remark, with regard to the outer product computation, that since it, too, maps from a lower-dimensional space into a higher-dimensional one, the computation should not be expected to be backwards stable. At the same time, Householder triangularization ($QR$) is backwards stable, yet it also maps from a lower-dimensional space into a higher-dimensional one. Is Householder just an exceptional case, or is there more to this?
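(Added for context, not from the book: a quick NumPy check of what the computed factors actually deliver. Both routines return residuals and orthogonality defects at roundoff level; the subtlety in the backward-stability claim is whether *exactly* orthogonal factors exist that reproduce a nearby $A$, which a residual check like this cannot settle.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))

# Householder QR (LAPACK under the hood).
Q, R = np.linalg.qr(A)
print("QR  residual :", np.linalg.norm(A - Q @ R) / np.linalg.norm(A))
print("QR  orth. def:", np.linalg.norm(Q.T @ Q - np.eye(n)))

# SVD (LAPACK under the hood).
U, s, Vt = np.linalg.svd(A)
print("SVD residual :", np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))
print("SVD orth. def:", np.linalg.norm(U.T @ U - np.eye(n)))
```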
Related Solutions
OK, suppose that $A = U \Sigma V^{T}$. First of all, consider how the SVD is calculated. The construction of the SVD uses the following idea.
We get the left singular vectors $U$ by computing the covariance matrix
$$ AA^{T} = U \Sigma V^{T}( U \Sigma V^{T})^{T} = U \Sigma V^{T} (V \Sigma^{T} U^{T}) $$
now note that $V^{T}V = I$ so we have
$$ U \Sigma V^{T} (V \Sigma^{T} U^{T}) = U \Sigma \Sigma^{T} U^{T} $$
now note that $\Sigma$ is a diagonal (here square) matrix, so $\Sigma^{T} = \Sigma$ and
$$ U \Sigma \Sigma^{T} U^{T} = U \Sigma \Sigma U^{T}$$
now the singular values satisfy $\sigma_{i}^{2} = \lambda_{i}$, where the $\lambda_{i}$ are the eigenvalues of $AA^{T}$, so you can write
$$ U \Lambda^{\frac{1}{2}} \Lambda^{\frac{1}{2}} U^{T} = U \Lambda U^{T} $$
Similarly, the right singular vectors $V$ are given as
$$ A^{T}A = (U \Sigma V^{T})^T (U \Sigma V^{T}) = V \Lambda V^{T} $$
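A quick sanity check of these identities (my own snippet, not part of the original answer). Note that this eigenvalue route is the textbook construction; practical SVD algorithms avoid forming $A^{T}A$ explicitly, since doing so squares the condition number.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

# Singular values of A ...
s = np.linalg.svd(A, compute_uv=False)

# ... match the square roots of the eigenvalues of A^T A.
lam = np.linalg.eigh(A.T @ A)[0]    # ascending order
print(np.sort(s ** 2))              # squared singular values
print(lam)                          # eigenvalues of A^T A: same numbers
```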
Note that you can write this as
$$ V^{T} A^{T}A V = \begin{pmatrix} v_{1}^{T} \\ v_{2}^{T} \\ \vdots \\ v_{m}^{T} \end{pmatrix} \left( A^{T}A v_{1}, A^{T}A v_{2}, \cdots, A^{T} A v_{m} \right) = \begin{pmatrix} v_{1}^{T} \\ v_{2}^{T} \\ \vdots \\ v_{m}^{T} \end{pmatrix} \left( \lambda_{1} v_{1}, A^{T}A v_{2}, \cdots, A^{T} A v_{m} \right) = \begin{pmatrix} \lambda_{1} & z^{*} \\ 0 & B \end{pmatrix} $$
where by the induction hypothesis we have $B = \hat{V} S \hat{V}^{T}$, so
$$ V^{T} A^{T}A V = \begin{pmatrix} \lambda_{1} & z^{*} \\ 0 & \hat{V} S \hat{V}^{T}\end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & \hat{V} \end{pmatrix} \begin{pmatrix} \lambda_{1} & z^{*}\hat{V} \\ 0 & S \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \hat{V} \end{pmatrix}^{T} $$
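Here is a minimal NumPy sketch of this deflation step (my own illustration, under the assumption that the orthogonal completion is built by QR): rotate a symmetric matrix by any orthogonal $V$ whose first column is a top eigenvector, and the first row and column collapse as claimed.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 5))
M = X.T @ X                          # symmetric, stands in for A^T A

lam, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
v1 = vecs[:, -1]                     # eigenvector for the largest eigenvalue

# Orthogonal V with first column (up to sign) equal to v1,
# completed by QR of [v1 | random columns].
V, _ = np.linalg.qr(np.column_stack([v1, rng.standard_normal((5, 4))]))

T = V.T @ M @ V
print(np.round(T, 10))               # first row/column: (lambda_max, 0, ..., 0)
```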
We use the same idea for $\tilde{A} = \tilde{U}^{T} A \tilde{V}$
$$ \tilde{U}^{T} A \tilde{V} = \begin{pmatrix} \tilde{u}_{1}^{T} \\ \tilde{u}_{2}^{T} \\ \vdots \\ \tilde{u}_{m}^{T} \end{pmatrix} \left(A \tilde{v}_{1}, A \tilde{v}_{2}, \cdots, A \tilde{v}_{m} \right) = \begin{pmatrix} \tilde{u}_{1}^{T} \\ \tilde{u}_{2}^{T} \\ \vdots \\ \tilde{u}_{m}^{T} \end{pmatrix} \left(\sigma_{1} \tilde{u}_{1}, A \tilde{v}_{2}, \cdots, A \tilde{v}_{m} \right) = \begin{pmatrix} \sigma_{1} & z^{*} \\ 0 & \hat{A} \end{pmatrix} $$
(using $A \tilde{v}_{1} = \sigma_{1} \tilde{u}_{1}$)
with $\hat{A} = \hat{U} \hat{\Sigma} \hat{V}^{T}$ (which exists by the induction hypothesis) we have
$$ \begin{pmatrix} \sigma_{1} & z^{*} \\ 0 & \hat{A} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & \hat{U} \end{pmatrix} \begin{pmatrix} \sigma_{1} & z^{*}\hat{V} \\ 0 & \hat{\Sigma} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \hat{V} \end{pmatrix}^{T} $$
The point is to continue down the diagonal, and for that we must show that $z^{*} = 0$. Since the norm of a vector is at least the size of its first entry,
$$ \bigg\| \begin{bmatrix} \sigma_{1} & z^{*} \\ 0 & \hat{A} \end{bmatrix} \begin{bmatrix} \sigma_{1} \\ z \end{bmatrix} \bigg\|_{2} \geq \sigma_{1}^{2} + z^{*} z = \left( \sigma_{1}^{2} + z^{*}z\right)^{\frac{1}{2}} \bigg\| \begin{bmatrix} \sigma_{1} \\ z \end{bmatrix} \bigg\|_{2} $$
Now, orthogonal matrices preserve the $2$-norm, so $\| \tilde{A}\|_{2} = \| \tilde{U}^{T} A \tilde{V} \|_{2} = \|A\|_{2} = \sigma_{1}$, while the inequality above gives $\| \tilde{A}\|_{2} \geq \left( \sigma_{1}^{2} + z^{*}z \right)^{\frac{1}{2}}$. This forces $z = 0$.
Once this is done, we have
$$ A = \tilde{U} \tilde{A} \tilde{V}^{T} = \begin{pmatrix} 1 & 0 \\ 0 & \hat{U} \end{pmatrix} \begin{pmatrix} \sigma_{1} & 0 \\ 0 & \hat{A} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \hat{V} \end{pmatrix}^{T} $$
Since this gives the blocked form for a general matrix $A \in \mathbb{R}^{n \times m}$, with $\hat{A} \in \mathbb{R}^{(n-1) \times (m-1)}$, the induction hypothesis gives an SVD of $\hat{A}$, and stitching the blocks together yields the SVD of $A$.
I'm pretty sure I've messed up the $\hat{A}, \tilde{A}$ somewhere along here..
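One step of this construction can be played out numerically. In the sketch below (my own, with the top singular triple taken from np.linalg.svd for convenience rather than computed from scratch), $\tilde{U}^{T} A \tilde{V}$ comes out with the promised block form, the first row vanishing off $\sigma_{1}$ up to the sign conventions of QR:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 4))

U, s, Vt = np.linalg.svd(A)
u1, v1 = U[:, 0], Vt[0, :]           # top singular pair: A v1 = sigma_1 u1

# Orthogonal completions with u1, v1 as (up to sign) first columns.
Uc, _ = np.linalg.qr(np.column_stack([u1, rng.standard_normal((5, 4))]))
Vc, _ = np.linalg.qr(np.column_stack([v1, rng.standard_normal((4, 3))]))

B = Uc.T @ A @ Vc
print(np.round(B, 10))               # [[±sigma_1, 0, 0, 0], [0, A_hat]]
```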
To answer your first question, a rule of thumb for a linear system of equations is that if the unit roundoff is $10^{-a}$ and the condition number is $10^b$, then you can expect about $a-b$ digits of accuracy in your answer. This is because, for a backwards stable algorithm, the numerical errors made during the computation can be viewed as a relative perturbation of size $\approx 10^{-a}$ of the initial data. The forward error is then at most the backwards error times the condition number, so the forward relative error in $x$ is $\approx 10^{b-a}$, leaving $\approx a-b$ correct digits.
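A minimal NumPy illustration of the rule of thumb (my own example, using the notoriously ill-conditioned Hilbert matrix):

```python
import numpy as np

n = 10
# Hilbert matrix H_ij = 1/(i+j+1): condition number ~ 1e13 for n = 10.
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1)
x_true = np.ones(n)
b = H @ x_true

x = np.linalg.solve(H, b)            # backward stable in practice (LU, partial pivoting)
print("kappa(H)       :", np.linalg.cond(H))                              # ~ 1e13
print("relative error :", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
# With unit roundoff ~ 1e-16 the rule of thumb predicts ~ 16 - 13 = 3 digits.
```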
A least squares problem has additional subtleties over a pure linear system. For a formal derivation, Trefethen and Bau explain how the $(\kappa(A))^2$ appears in Eqs. (18.13-18.16) of the book you quote. Is their explanation good enough or do you have additional questions?
Regarding the $(\kappa(A))^2$ term in the error bound, note that it is tempered by two additional factors. If $\eta$ is on the order of $\kappa(A)$ (which will "usually" happen if $y$ is "randomly chosen"), then $(\kappa(A))^2 / \eta \approx \kappa(A)$. Additionally, if the angle between $y$ and $b$ is small (that is, if $Ax$ is a good approximation of $b$), then $\tan\theta \approx 0$ and $(\kappa(A))^2 \tan(\theta)/ \eta$ will be small, or at least on the order of $\kappa(A)$. Thus, the $(\kappa(A))^2$ term in the error bound is usually not as bad as it seems, and the rule of thumb from the first paragraph is usually valid for least squares problems as well.
For some vague intuition as to why we might not be surprised $(\kappa(A))^2$ pops up, remember that the least squares problem is mathematically equivalent to the normal equations $A^\top A x = A^\top b$, which has condition number $\kappa(A^\top A) = (\kappa(A))^2$. Normally, the least-squares problem is significantly better conditioned than the normal equations, but in the worst case (which as established in the previous paragraph is a somewhat special situation) the conditioning becomes the same.
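A small experiment (my own construction, not from the book) makes both points at once: for a consistent system ($\theta = 0$) a backward stable least-squares solver tracks $\kappa(A)$, while the normal equations pay the full $(\kappa(A))^{2}$:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 100, 8

# Build A with prescribed singular values so that kappa(A) = 1e6.
Q1, _ = np.linalg.qr(rng.standard_normal((m, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q1 @ np.diag(np.logspace(0, -6, n)) @ Q2.T

x_true = np.ones(n)
b = A @ x_true                                   # b in range(A): theta = 0

x_svd = np.linalg.lstsq(A, b, rcond=None)[0]     # SVD-based, backward stable
x_ne = np.linalg.solve(A.T @ A, A.T @ b)         # normal equations: kappa^2

print("lstsq error    :", np.linalg.norm(x_svd - x_true))   # ~ 1e-10
print("normal eq error:", np.linalg.norm(x_ne - x_true))    # ~ 1e-4
```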
Best Answer
Notice that there are restrictions on the output matrices. In the QR factorization of an $n \times n$ matrix, $Q$ (an orthogonal matrix) has dimension $\frac{n(n-1)}{2}$ and $R$ (an upper triangular matrix) has dimension $\frac{n(n+1)}{2}$; the total dimension is $n^2$, exactly the dimension of the space of $n \times n$ matrices.
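The count is trivial to verify, but for completeness (a throwaway check of my own, not part of the original answer):

```python
# Q: orthogonality leaves n(n-1)/2 free parameters; R (upper triangular,
# n(n+1)/2 entries) supplies the rest -- together exactly n^2.
for n in (2, 3, 10):
    q_dof = n * (n - 1) // 2
    r_dof = n * (n + 1) // 2
    print(n, q_dof, r_dof, q_dof + r_dof == n * n)   # always True
```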