SVD and best rank-$k$ approximation

linear algebra

I do not understand several parts of the proof in this answer: Proof of Eckart-Young-Mirsky theorem.

  1. How does the statement "we have that there exists an $x \in \mathcal{N}(B) \cap \mathcal{R}(V_{k+1}), \left\lVert x\right\rVert_{2} = 1$" follow from what has come before? The prior statements about the dimensions of $\mathcal{N}(B)$ and $\mathcal{R}(V_{k+1})$ do not seem to clearly indicate anything about their intersection.

  2. How do the equalities "$\left\lVert Ax\right\rVert_{2}^{2} = \sum_{i=1}^{k+1}\sigma_{i}^{2}\left\lvert v_{i}^{*}x\right\rvert^2$" and "$\sigma_{k+1}^{2}\sum_{i=1}^{k+1}\left\lvert v_{i}^{*}x\right\rvert^2 = \sigma_{k+1}^{2}$" follow from what has come before? What is $v_{i}^{*}$ and how does it become involved? And what are these sums? Why is the latter sum apparently equal to $1$?

  3. It says "from $\left\lVert A-A_k\right\rVert_{2}=\sigma_{k+1}$" – how is that result proved?

Thanks very much for assistance.

Best Answer

In general, for two subspaces $U,W$ of $\mathbb{R}^n$ we have $\dim(U+W) = \dim(U) + \dim(W) - \dim(U \cap W)$. Now $N(B)$ and $R(V_{k+1})$ are subspaces of $\mathbb{R}^n$ whose dimensions sum to at least $n+1$: since $\operatorname{rank}(B) \le k$, rank–nullity gives $\dim N(B) \ge n-k$, while $\dim R(V_{k+1}) = k+1$. So $$n = \dim(\mathbb{R}^n) \ge \dim(N(B) + R(V_{k+1})) = \dim(N(B)) + \dim(R(V_{k+1})) - \dim(N(B) \cap R(V_{k+1}))$$ implies $\dim(N(B) \cap R(V_{k+1})) \ge 1$. Any nonzero vector in this intersection can then be normalized to give the required $x$ with $\|x\|_2 = 1$.
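If it helps to see this numerically, here is a minimal NumPy sketch of the dimension count and of an explicit unit vector in the intersection. The sizes $n$, $k$ and the random $A$, $B$ below are my own illustrative choices, not anything from the linked proof:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2

# A and the first k+1 right singular vectors v_1, ..., v_{k+1}.
A = rng.standard_normal((n, n))
_, _, Vt = np.linalg.svd(A)
V_kp1 = Vt[: k + 1].T                          # n x (k+1); columns span R(V_{k+1})

# Some competitor B of rank at most k (random, just for illustration).
B = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))

# Basis for N(B) from B's SVD: right singular vectors beyond rank(B).
_, _, VtB = np.linalg.svd(B)
rank_B = np.linalg.matrix_rank(B)
null_B = VtB[rank_B:].T                        # n x (n - rank_B)

# dim(N(B) + R(V_{k+1})) <= n, so the dimension formula forces
# dim(N(B) intersect R(V_{k+1})) >= 1.
dim_sum = np.linalg.matrix_rank(np.hstack([null_B, V_kp1]))
dim_int = null_B.shape[1] + V_kp1.shape[1] - dim_sum
print(dim_int)                                 # at least 1

# Explicit unit vector in the intersection: any c with (B V_{k+1}) c = 0
# gives x = V_{k+1} c, which lies in R(V_{k+1}) and satisfies B x = 0.
_, _, VtC = np.linalg.svd(B @ V_kp1)
x = V_kp1 @ VtC[-1]
x /= np.linalg.norm(x)
print(np.allclose(B @ x, 0), np.isclose(np.linalg.norm(x), 1.0))
```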


The answerer is using the asterisk $*$ to denote conjugate transpose. If you are working only with real numbers, you can just think of it as the transpose $\top$.

It may be more helpful to just write everything out using the SVD $A = U\Sigma V^\top$.

$$\|Ax\|_2^2 = x^\top A^\top A x = x^\top V \Sigma^\top U^\top U \Sigma V^\top x = x^\top V\Sigma^\top \Sigma V^\top x.$$

$V^\top x$ is a vector whose entries are $v_i^\top x$ for $i=1,\ldots,n$. Note further that because $x \in R(V_{k+1})$, it must be orthogonal to $v_i$ when $i > k+1$, due to orthogonality of $v_1,\ldots, v_n$. So the entries of $V^\top x$ are $v_1^\top x, v_2^\top x, \ldots, v_{k+1}^\top x, 0, \ldots, 0$.

$\Sigma^\top \Sigma$ is an $n \times n$ diagonal matrix with diagonal entries $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$. Writing out the matrix product $(V^\top x)^\top (\Sigma^\top \Sigma) (V^\top x)$, you will see that this quantity equals $\sum_{i=1}^n \sigma_i^2 (v_i^\top x)^2$. Since the summands with $i > k+1$ are zero, this equals $\sum_{i=1}^{k+1} \sigma_i^2 (v_i^\top x)^2$.
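Here is a quick NumPy check of this identity for a vector $x \in R(V_{k+1})$; the random $A$, the value of $k$, and the random coefficients $c$ are only illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 2
A = rng.standard_normal((n, n))
_, s, Vt = np.linalg.svd(A)

# Unit vector x in R(V_{k+1}) = span(v_1, ..., v_{k+1}).
c = rng.standard_normal(k + 1)
c /= np.linalg.norm(c)
x = Vt[: k + 1].T @ c                # ||x||_2 = 1 because the v_i are orthonormal

lhs = np.linalg.norm(A @ x) ** 2                          # ||Ax||_2^2
rhs = np.sum(s[: k + 1] ** 2 * (Vt[: k + 1] @ x) ** 2)    # sum of sigma_i^2 (v_i^T x)^2
print(np.isclose(lhs, rhs))          # True
```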

$\sum_{i=1}^{k+1} (v_i^\top x)^2$ equals $\sum_{i=1}^n (v_i^\top x)^2$ (again, the terms with $i > k+1$ are zero), which can be written as $x^\top V V^\top x = x^\top x = \|x\|_2^2 = 1$, since $V V^\top = I$ and $x$ was chosen with $\|x\|_2 = 1$.
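Putting the pieces together, the chain of (in)equalities that the quoted proof is using is

$$\|Ax\|_2^2 = \sum_{i=1}^{k+1} \sigma_i^2 (v_i^\top x)^2 \;\ge\; \sigma_{k+1}^2 \sum_{i=1}^{k+1} (v_i^\top x)^2 = \sigma_{k+1}^2 \|x\|_2^2 = \sigma_{k+1}^2,$$

where the inequality holds because $\sigma_i \ge \sigma_{k+1}$ for $i \le k+1$. This is the lower bound the proof is driving at: since $Bx = 0$, it gives $\|A - B\|_2 \ge \|(A-B)x\|_2 = \|Ax\|_2 \ge \sigma_{k+1}$.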


By definition, $A-A_k = \sum_{i=k+1}^n \sigma_i u_iv_i^\top$, i.e. $A-A_k = U\tilde{\Sigma} V^\top$ where $\tilde{\Sigma}$ is obtained by changing the first $k$ singular values in $\Sigma$ to zero. Recalling that the operator norm $\|M\|_2$ is the largest singular value of $M$, we can simply note that the largest singular value of $A-A_k$ is $\sigma_{k+1}$ to conclude $\|A-A_k\|_2 = \sigma_{k+1}$.
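A quick NumPy check of this last claim, with a random $A$ and an arbitrary $k$ chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2
A = rng.standard_normal((n, n))
U, s, Vt = np.linalg.svd(A)

A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # rank-k truncation of the SVD
op_norm = np.linalg.norm(A - A_k, 2)       # operator norm = largest singular value
print(np.isclose(op_norm, s[k]))           # True: sigma_{k+1} is s[k] (zero-based)
```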