On minimum distance from one row vector to the linear span of the others

contest-math, inequality, linear-algebra, linear-transformations

I am working through the exam of the OBM (Brazilian Math Olympiad), University level, 2017, phase 2.

As I've said in other topics (questions 1, 2, 3, 4 and 6, the last still open), I hope someone can help me discuss this exam.

Question 5 says:

Let $d\leq n$ be two positive integers and let $A$ be a $d\times n$ real matrix, inducing a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^d$, $v\mapsto Av$. Let $\sigma(A)$ be the supremum of $\inf _{v\in W,\|v\|=1}\|Av\|$ over all $d$-dimensional subspaces $W$ of $\mathbb{R}^n$.

For each $j\leq d$, let $r(j)\in\mathbb{R}^n$ be the $j$-th row vector of $A$, meaning that $r(j)=A^t e_j$, where $e_j$ is the $j$-th vector of the canonical basis of $\mathbb{R}^d$. Prove that
$$
\sigma(A)\leq \min_{i\leq d} \operatorname{dist}\left(r(i),\operatorname{span}\{r(j):\ j\neq i\}\right)\leq \sqrt{n}\cdot \sigma(A).
$$
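For concreteness, here is a quick numerical sanity check of the claimed inequality on a random matrix (a sketch assuming numpy; it uses the standard Courant–Fischer max–min characterization of singular values, by which $\sigma(A)$ equals the smallest singular value of a full-rank $A$, a fact not given in the statement):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 7
A = rng.standard_normal((d, n))       # full row rank with probability 1

# sigma(A): by the Courant-Fischer max-min characterization, the supremum
# over d-dimensional subspaces is attained and equals the smallest singular value
sigma = np.linalg.svd(A, compute_uv=False)[-1]

# dist(r(i), span{r(j): j != i}) for each row r(i)
dists = []
for i in range(d):
    others = np.delete(A, i, axis=0)                      # the other rows
    coef, *_ = np.linalg.lstsq(others.T, A[i], rcond=None)
    dists.append(np.linalg.norm(A[i] - others.T @ coef))  # residual = distance
m = min(dists)

print(sigma <= m + 1e-12, m <= np.sqrt(n) * sigma + 1e-12)   # both should print True
```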

I know that the distance between a vector $r(i)$ and that subspace is at most $\|r(i)\|$, and I tried some computations, but nothing substantial came out of it.

Thanks very much.

Edited – October 11

Following the comment by @user10354138, I think that:

Let $k:=\operatorname{rank}(A)<d$. By the rank–nullity theorem, applied to the restriction $A|_W:W\subset \mathbb{R}^n\rightarrow \mathbb{R}^d$, I have $\operatorname{rank}(A|_W)+\operatorname{nullity}(A|_W)=d$, so $\operatorname{nullity}(A|_W)\geq 1$ and there is a vector $v\in W\setminus\{0\}$ such that $Av=0$. So $\sigma(A)=0$. Moreover, some row of $A$ is linearly dependent on the other rows, since $A$ does not have full rank. Then the middle term of the inequality is $0$ as well, and the statement reduces to $0\leq 0\leq 0$, which is trivial.
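A tiny numerical illustration of this degenerate case (a sketch assuming numpy; the third row is the sum of the first two, so $\operatorname{rank}(A)=2<d=3$ and both sides of the inequality vanish):

```python
import numpy as np

A = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.],
              [1., 1., 0., 0.]])   # third row = first + second, so rank(A) = 2 < d = 3

# every 3-dimensional subspace of R^4 meets the 2-dimensional kernel,
# so sigma(A) = 0; numerically, the smallest singular value is 0
print(np.linalg.svd(A, compute_uv=False)[-1])             # 0.0

# the dependent row has distance 0 to the span of the other two rows
others = A[:2]
coef, *_ = np.linalg.lstsq(others.T, A[2], rcond=None)
print(np.linalg.norm(A[2] - others.T @ coef))             # ~0.0
```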

About the second part, I am not sure I will get it. I know that in this case the rows $r(i)$ form a basis of a subspace of dimension $d$. Do you mean taking $\mathbb{R}^n=W\oplus U$, with $\operatorname{dim}(U)=n-d$? Then I can write $Av=A(w+u)=Aw+Au$, and I would like to conclude that $\|Av\|\geq \|Aw\|$, but I am not sure this holds in general.

Best Answer

We will prove $$ \sigma(A)\le \min_{i\le d}\operatorname{dist}\left(r(i),\operatorname{span}\{r(j):\ j\neq i\}\right) \le \sqrt d\ \sigma(A). \tag{1} $$ Since $d\le n$, this is stronger than the required inequality.

We only treat the case $\operatorname{rank}(A)=d$. In that case we set $W_0=\langle r(i)\rangle_{i=1,\dots ,d}$. Moreover $K=\operatorname{Ker}(A)$ satisfies $K\perp W_0$ and $K\oplus W_0=\Bbb R^n$. So $\inf _{v\in W_0,\|v\|=1}\|Av\|>0$, and we first prove that $$ \sigma(A)= \inf _{v\in W_0,\|v\|=1}\|Av\|.\tag{2} $$

For this let $W\subset\Bbb R^n$ be a subspace of dimension $d$. If $K\cap W\ne \{0\}$, then $\inf _{v\in W,\|v\|=1}\|Av\|=0$. Otherwise $K\oplus W=\Bbb R^n$ and we get an isomorphism of vector spaces $\varphi:W\to W_0$ given by $\varphi(v)= v_0$, where $v=v_k+v_0$ with $v_k\in K$, $v_0\in W_0$. This induces a bijection $\widetilde \varphi:\partial B(0,1)\cap W\to \partial B(0,1)\cap W_0$ given by $\widetilde\varphi (v)=\frac{\varphi(v)}{\|\varphi(v)\|}$. Moreover $\|\varphi(v)\|\le \|v\|$ (because $K\perp W_0$, so $\|v\|^2=\|v_k\|^2+\|v_0\|^2$), and so for any vector $v$ in $\partial B(0,1)\cap W$ we have $$ \|Av\|=\|Av_0\|=\|A(\varphi(v))\|\le\frac{\|A(\varphi(v))\|}{\|\varphi(v)\|}=\|A(\widetilde\varphi(v))\|. $$ It follows that $$ \inf _{v\in W_0,\|v\|=1}\|Av\|\ge \inf _{v\in W,\|v\|=1}\|Av\| $$ for every subspace $W\subset \Bbb{R}^n$ of dimension $d$, and since $W_0$ itself is such a subspace, $$ \sigma(A)= \inf _{v\in W_0,\|v\|=1}\|Av\|. $$
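A quick numerical illustration of $(2)$ (a sketch assuming numpy): the infimum over the row space $W_0$, computed by restricting $A$ to an orthonormal basis of $W_0$, is never beaten by a randomly chosen $d$-dimensional subspace $W$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 7
A = rng.standard_normal((d, n))

# inf over the row space W_0: restrict A to an orthonormal basis of W_0
Q0, _ = np.linalg.qr(A.T)                  # columns: orthonormal basis of W_0
inf_W0 = np.linalg.svd(A @ Q0, compute_uv=False)[-1]

# inf over many random d-dimensional subspaces W
worst = 0.0
for _ in range(200):
    Q, _ = np.linalg.qr(rng.standard_normal((n, d)))   # orthonormal basis of a random W
    worst = max(worst, np.linalg.svd(A @ Q, compute_uv=False)[-1])

print(inf_W0, worst, worst <= inf_W0 + 1e-12)   # the row space is optimal among the samples
```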

For each $j=1,\dots, d$ there exists a unique vector $s_j\in \Bbb R^n$ such that

  • $s_j\bot r(i)$ for $i\ne j$,

  • $s_j\in W_0$, or, equivalently, $s_j\bot K$,

  • $\langle s_j,r(j)\rangle=\|s_j\|^2$.

In fact, take any basis $k_1,\dots,k_{n-d}$ of $K$, and take the generalized cross product $t_j=k_1\times k_2\times \dots \times k_{n-d}\times r(1)\times\dots \times \widehat{r(j)} \times \dots \times r(d)$,

(where $\widehat{r(j)}$ means as usual that we delete $r(j)$), and then take $$ s_j=\text{proj}_{t_j}r(j)=\frac{\langle r(j),t_j\rangle}{\langle t_j,t_j\rangle} t_j. $$

Since $A(v)=\sum_{j=1}^d \langle r(j),v\rangle e_j$, where $\{e_j\}$ is the canonical basis of $\Bbb R^d$, we have $A(s_j)=\|s_j\|^2 e_j$. But then we obtain a basis $\{u_j\}_{j=1,\dots,d}$ of $W_0$ with $u_j=\frac{s_j}{\|s_j\|}$ and $A(u_j)=\|s_j\| e_j$.
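These identities are easy to check numerically. Here is a small sketch (assuming numpy) which computes $s_j$ as $r(j)$ minus its orthogonal projection onto $\operatorname{span}\{r(i):\ i\ne j\}$ (this gives the same vector as the cross-product recipe, since both satisfy the three defining properties and $s_j$ is unique) and verifies the listed properties.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 7
A = rng.standard_normal((d, n))   # full row rank with probability 1

def s(j):
    # s_j = r(j) minus its orthogonal projection onto span{r(i): i != j}
    others = np.delete(A, j, axis=0)
    coef, *_ = np.linalg.lstsq(others.T, A[j], rcond=None)
    return A[j] - others.T @ coef

for j in range(d):
    sj = s(j)
    uj = sj / np.linalg.norm(sj)
    print(np.allclose(np.delete(A, j, axis=0) @ sj, 0),             # s_j orthogonal to the other rows
          np.isclose(A[j] @ sj, sj @ sj),                           # <s_j, r(j)> = ||s_j||^2
          np.allclose(A @ uj, np.linalg.norm(sj) * np.eye(d)[j]))   # A(u_j) = ||s_j|| e_j
```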

On one hand we have $\operatorname{dist}\left(r(i),\operatorname{span}\{r(j):\ j\neq i\}\right)=\|s_i\|$, and so $$ \min_{i\le d}\operatorname{dist}\left(r(i),\operatorname{span}\{r(j):\ j\neq i\}\right)= \min_{i\le d}\|s_i\|.\tag{3} $$

On the other hand we can define $A^{-1}:\Bbb R^d \to W_0$ by $A^{-1}(e_j)=\frac{1}{\|s_j\|}u_j$. Then we have $$ \sigma(A)= \inf _{v\in W_0,\|v\|=1}\|Av\|=\min _{v\in W_0,\|v\|=1}\|Av\|=\frac{1}{\displaystyle\max_{v\in \Bbb R^d,\|v\|=1}\|A^{-1}(v)\|}. \tag{4} $$ Clearly $$ \max_{v\in \Bbb R^d,\|v\|=1}\|A^{-1}(v)\|\ge \max_{j\le d}\|A^{-1}(e_j)\|= \max_{j\le d}\frac{1}{\|s_j\|}, $$ and so $$ \sigma(A)\le \min_{j\le d}\|s_j\|. \tag{5} $$ Finally consider the matrix $B$ which implements $A^{-1}$, that is, the $n\times d$ matrix whose $j$th column is the vector $\frac{1}{\|s_j\|}u_j\in \Bbb{R}^n$. Then $\max_{v\in \Bbb R^d,\|v\|=1}\|A^{-1}(v)\|=\|B\|_2$ is just the $2$-operator norm of $B$, given by $\sqrt{\lambda_1}$, where $\lambda_1\ge \lambda_2\ge \dots \ge \lambda_d\ge 0$ are the eigenvalues of $B^*B$ (the squares of the singular values of $B$). In particular $$ \lambda_1+\dots+\lambda_d=\operatorname{Tr}(B^*B)=\sum_{j=1}^d (B^*B)_{jj}. $$ But $$ (B^*B)_{jj}=\sum_{k=1}^n(B^*)_{jk}B_{kj}=\sum_{k=1}^n \overline{B}_{kj}B_{kj}=\sum_{k=1}^n|B_{kj}|^2=\|B(e_j)\|^2=\frac{1}{\|s_j\|^2}, $$ and so $$ \lambda_1\le \operatorname{Tr}(B^*B) \le d\max_{j\le d}(B^*B)_{jj}=d\max_{j\le d}\frac{1}{\|s_j\|^2}, $$ hence $$ \max_{v\in \Bbb R^d,\|v\|=1}\|A^{-1}(v)\|=\sqrt{\lambda_1}\le \sqrt{d}\max_{j\le d}\frac{1}{\|s_j\|}. $$ It follows that $$ \min_{j\le d}\|s_j\|=\frac{1}{\displaystyle\max_{j\le d}\frac{1}{\|s_j\|}}\le \frac{\sqrt{d}}{\displaystyle\max_{v\in \Bbb R^d,\|v\|=1}\|A^{-1}(v)\|}=\sqrt{d}\ \sigma(A), $$

which together with $(3)$ and $(5)$ proves $(1)$.
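Finally, the whole chain $(1)$ can be checked numerically on a random example (a sketch assuming numpy): $\sigma(A)$ as the smallest singular value, the matrix $B$ implementing $A^{-1}$ (here realized by the Moore–Penrose pseudoinverse, which agrees with $A^{-1}$ since both are right inverses of $A$ with image in $W_0$), and the two-sided bound with $\sqrt d$.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 4, 9
A = rng.standard_normal((d, n))   # full row rank with probability 1

# sigma(A): smallest singular value of the full-rank d x n matrix A
sigma = np.linalg.svd(A, compute_uv=False)[-1]

# ||s_j|| = dist(r(j), span of the other rows)
norms = []
for j in range(d):
    others = np.delete(A, j, axis=0)
    coef, *_ = np.linalg.lstsq(others.T, A[j], rcond=None)
    norms.append(np.linalg.norm(A[j] - others.T @ coef))
m = min(norms)

# B implements A^{-1}: R^d -> W_0 (j-th column s_j / ||s_j||^2);
# the pseudoinverse gives the same map, and ||B||_2 = 1 / sigma(A)
B = np.linalg.pinv(A)
print(np.isclose(np.linalg.norm(B, 2), 1 / sigma))

# the inequality (1): sigma(A) <= min_j ||s_j|| <= sqrt(d) * sigma(A)
print(sigma <= m + 1e-12, m <= np.sqrt(d) * sigma + 1e-12)
```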