Could someone please explain why this Wiki page says "we know that $\exists(k+1)$ dimension space $(v_1,v_2, \dots, v_n)$"?
Linear Algebra – Proof of Eckart-Young-Mirsky Theorem
linear-algebra, proof-explanation, svd
Related Solutions
For $n=1$ there is no solution (the only matrix with minimal polynomial $1$ is the $0\times 0$ matrix). Otherwise there is a solution, but not the $T$ in your question, which has $T^{n-1}\neq0$. Instead, you need the kernel of $T$ to be $2$-dimensional, which you can achieve for instance by changing the definition to have $T(v_1)=0$. The matrix of $T$ on the basis $v_1,\ldots,v_n$ then becomes $$ \begin{pmatrix} 0&0&0&\ldots&0&0\\0&0&0&\ldots&0&0\\ 0&1&0&\ldots&0&0\\0&0&1&\ldots&0&0\\\vdots&\vdots&\ddots&\ddots&0&0\\ 0&0&0&\ldots&1&0\\ \end{pmatrix}. $$ Another (equivalent) solution is to define $T(v_{n-2})=0$ instead. In either case there is a vector $v_i$ with $T^{n-2}(v_i)\neq0$, but $T^{n-1}=0$.
To see that you must have $\dim(\ker T)=2$, consider the sequence $\dim(\ker T^i)$ for $i=0,1,\ldots,n-1$; it is weakly increasing, starts with $0$ and must end with numbers $d,n$ where $d<n$ (since $T^{n-2}\neq0$). Moreover, as a general fact, the "derived" sequence of its increments is weakly decreasing. It then follows that this derived sequence is $2,1,1,\ldots,1$ and the original sequence $0,2,3,4,\ldots,n-1,n$, in particular $\dim(\ker T^1)=2$.
The fact that the derived sequence is weakly decreasing follows because $T$ induces a map $\ker(T^{i+2})/\ker(T^{i+1})\to\ker(T^{i+1})/\ker(T^i)$ that can be checked to be always injective.
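For a concrete sanity check, the matrix above and the kernel-dimension sequence can be verified numerically for a small $n$. The sketch below (NumPy; the function name and the choice $n=6$ are mine, purely for illustration) builds the matrix with $T(v_1)=0$ and $T(v_j)=v_{j+1}$ for $2\leq j<n$:

```python
import numpy as np

def shifted_nilpotent(n):
    """Matrix of T on the basis v_1..v_n: T(v_1)=0, T(v_j)=v_{j+1} for 2<=j<n, T(v_n)=0."""
    T = np.zeros((n, n), dtype=int)
    for j in range(2, n):      # 1-based columns 2..n-1
        T[j, j - 1] = 1        # 1-based entry (j+1, j)
    return T

n = 6
T = shifted_nilpotent(n)

ker_dim = n - np.linalg.matrix_rank(T)
print(ker_dim)                                      # 2
print(bool(np.any(np.linalg.matrix_power(T, n - 2))))  # True:  T^{n-2} != 0
print(bool(np.any(np.linalg.matrix_power(T, n - 1))))  # False: T^{n-1} == 0

# the sequence dim(ker T^i) for i = 0, ..., n-1
ker_dims = [n - np.linalg.matrix_rank(np.linalg.matrix_power(T, i)) for i in range(n)]
print(ker_dims)                                     # [0, 2, 3, 4, 5, 6]
```

The printed sequence $0,2,3,4,\ldots,n$ matches the derived sequence $2,1,1,\ldots,1$ described above.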
Show that $\{v_1,v_2,\dots,v_n\}$ is a basis of a vector space iff a chain of subspaces is complete.
I am expanding on my original answer to help clarify the confusion in the discussion of the problem.
Assumptions. We have an arbitrary vector space $V$ (of unspecified dimension) over a field $F$ and a chain $\{0\}=V_{0}\subseteq V_{1}\subseteq \ldots \subseteq V_{n-1} \subseteq V_{n} = V$ of subspaces. We then pick vectors $v_{1}, \ldots, v_{n}$ such that $v_{i}\in V_{i}\setminus V_{i-1}$.
Note in particular that each $v_{i}$ is nonzero, since $0\in V_{i-1}$ while $v_i\notin V_{i-1}$.
Remark 1. $v_{1}, \ldots, v_{n}$ are linearly independent.
Proof. Suppose $\lambda_{1}v_{1}+ \ldots +\lambda_{n} v_n = 0$. If some $\lambda_{i}$ is nonzero, choose the largest such $i$. Then $v_i$ is a linear combination of $v_{1}, \ldots , v_{i-1}$, all of which lie in $V_{i-1}$, so $v_i\in V_{i-1}$, contradicting our assumption that $v_{i}$ is not in $V_{i-1}$.
Main Claim. $\{v_{1}, \ldots, v_{n}\}$ is a basis iff the chain of subspaces is complete.
Proof.
$\Leftarrow$: Assume the chain is complete. By Remark 1 we only need to prove that $\{v_{1}, \ldots, v_{n}\}$ spans $V$. Note that for each $i$, $V_{i}$ has dimension $1$ over $V_{i-1}$: if not, we could pick two vectors $v$ and $w$ in $V_{i}$ linearly independent over $V_{i-1}$, and then the space $W$ spanned by $V_{i-1}$ and $v$ would lie strictly between $V_{i-1}$ and $V_{i}$, contradicting completeness. So for each $i$, $V_{i}$ is spanned by $V_{i-1}\cup\{v_{i}\}$, since $\{v_i\}$ must be a basis for $V_{i}$ over $V_{i-1}$ by the above. By induction, $V_{n}=V$ is spanned by $\{v_{1},\ldots, v_{n}\}$. Another way to see this is to use the general formula $$ \dim(V) = \dim(V_{n}/V_{n-1}) + \dim(V_{n-1}/V_{n-2}) + \ldots + \dim(V_{1}/V_{0}). $$ We have shown that each summand on the right is $1$, so $\dim(V) = n$ and hence $\{v_{1}, \ldots , v_{n}\}$ is a basis.
$\Rightarrow$: Assume $\{v_{1}, \ldots, v_{n}\}$ is a basis, so $\dim(V)=n$. Consider the same formula for $\dim(V)$ in terms of the dimensions of the quotient spaces as above. The existence of the $v_{i}$'s ensures each $\dim(V_{i}/V_{i-1})$ is at least $1$, so each of these is exactly $1$ since $\dim(V)=n$. This forces the chain to be complete: if $W$ were strictly between $V_{i-1}$ and $V_{i}$, we would have $$ \dim(V_{i}/V_{i-1}) = \dim(V_{i}/W) + \dim(W/V_{i-1})\geq 2. $$
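In the finite-dimensional case the quotient-dimension formula is easy to illustrate numerically: realize a complete chain in $\mathbb{R}^n$ as the spans of the leading columns of an invertible matrix, with $v_i$ the $i$-th column, and compute each $\dim(V_i/V_{i-1})$ as a rank difference. The sketch below (NumPy; the random matrix and $n=5$ are arbitrary choices of mine, not part of the argument) does exactly that:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# A complete chain {0} = V_0 < V_1 < ... < V_n = R^n:
# V_i = span of the first i columns of an invertible matrix M; v_i = i-th column.
M = rng.standard_normal((n, n))
assert np.linalg.matrix_rank(M) == n  # M invertible, so the chain is strict

# dim(V_i) via column-space rank; dim(V_i / V_{i-1}) as successive differences
dims = [0] + [np.linalg.matrix_rank(M[:, :i]) for i in range(1, n + 1)]
quotients = [dims[i] - dims[i - 1] for i in range(1, n + 1)]

print(quotients)        # [1, 1, 1, 1, 1]  -- each dim(V_i / V_{i-1}) = 1
print(sum(quotients))   # 5 = dim(V)
```

Each quotient has dimension $1$, and the quotient dimensions sum to $\dim(V)=n$, as in the formula above.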
Best Answer
One needs to show that if $\mathrm{rank}(B)=k$, then $\|A-B\|_2\geq\|A-A_k\|_2$. This can be done as follows.
Since $\mathrm{rank}(B)=k$, we have $\dim\mathcal{N}(B)=n-k$, and from $$\dim\mathcal{N}(B)+\dim\mathcal{R}(V_{k+1})=n-k+k+1=n+1>n$$ (where $V_{k+1}=[v_1,\ldots,v_{k+1}]$ is the matrix of right singular vectors associated with the first $k+1$ singular values in descending order), the two subspaces must intersect nontrivially, so there exists an $$x\in\mathcal{N}(B)\cap\mathcal{R}(V_{k+1}), \quad \|x\|_2=1.$$ Hence $$ \|A-B\|_2^2\geq\|(A-B)x\|_2^2=\|Ax\|_2^2=\sum_{i=1}^{k+1}\sigma_i^2|v_i^*x|^2\geq\sigma_{k+1}^2\sum_{i=1}^{k+1}|v_i^*x|^2=\sigma_{k+1}^2. $$ Since $\|A-A_k\|_2=\sigma_{k+1}$, one hence gets $\|A-B\|_2\geq\|A-A_k\|_2$. No contradiction required, Quite Easily Done.
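The spectral-norm bound is straightforward to spot-check numerically. The sketch below (NumPy; the sizes $m,n,k$ and the random rank-$k$ matrix $B$ are arbitrary choices for illustration) builds $A_k$ from the SVD and confirms that $B$ does no better:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 8, 6, 2
A = rng.standard_normal((m, n))

# best rank-k approximation A_k from the SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k]

# a random rank-k competitor B
B = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))

err_B  = np.linalg.norm(A - B, 2)    # spectral norm = largest singular value
err_Ak = np.linalg.norm(A - A_k, 2)

print(bool(err_Ak <= err_B))              # True: ||A - A_k||_2 <= ||A - B||_2
print(bool(np.isclose(err_Ak, s[k])))     # True: ||A - A_k||_2 = sigma_{k+1}
```

Any other rank-$k$ choice of $B$ should give the same outcome, per the theorem.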
EDIT An alternative proof, which works for both the spectral and Frobenius norms, is based on Weyl's theorem for eigenvalues (or, more precisely, its analogue for singular values): if $X$ and $Y$ are $m\times n$ ($m\geq n$) and (as above) the singular values are ordered in decreasing order, we have $$\tag{1} \sigma_{i+j-1}(X+Y)\leq\sigma_i(X)+\sigma_j(Y) \quad\text{for $1\leq i,j\leq n, \; i+j-1\leq n$} $$ (this follows from the variational characterization of eigen/singular values; see, e.g., Theorem 3.3.16 here). If $B$ has rank $k$, then $\sigma_{k+1}(B)=0$. Setting $j=k+1$, $Y:=B$, and $X:=A-B$ in (1) gives $$ \sigma_{i+k}(A)\leq\sigma_i(A-B) \quad \text{for $1\leq i\leq n-k$}. $$ For the spectral norm, it is sufficient to take $i=1$. For the Frobenius norm, this gives $$ \|A-B\|_F^2\geq\sum_{i=1}^{n-k}\sigma_i^2(A-B)\geq\sum_{i=k+1}^n\sigma_i^2(A) $$ with the equality attained, again, by $B=A_k$.
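The shifted singular-value inequality and its Frobenius-norm consequence can likewise be spot-checked. The snippet below (NumPy; the sizes and the random rank-$k$ matrix $B$ are again arbitrary illustration choices, and a tiny tolerance absorbs floating-point error):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 9, 7, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank k

sA  = np.linalg.svd(A, compute_uv=False)       # sigma_1(A) >= ... >= sigma_n(A)
sAB = np.linalg.svd(A - B, compute_uv=False)   # singular values of A - B

# sigma_{i+k}(A) <= sigma_i(A - B) for 1 <= i <= n-k  (1-based; 0-based below)
print(all(sA[i + k] <= sAB[i] + 1e-9 for i in range(n - k)))   # True

# Frobenius consequence: ||A - B||_F^2 >= sum_{i=k+1}^n sigma_i(A)^2
print(bool(np.sum(sAB**2) + 1e-9 >= np.sum(sA[k:]**2)))        # True
```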