Sampling from equal-norm tight frame (Vershynin exercise 5.6.6)

concentration-of-measure, frame-theory, matrices, matrix-norms, probability-theory

I am struggling with Exercise 5.6.6 from Vershynin's "High-Dimensional Probability":

Consider an equal-norm tight frame $(u_i)_{i=1}^{N}$ in $\mathbb{R}^n$. State and prove a result that shows that a random sample of $m \gtrsim n \log n$ elements of $(u_i)_i$ forms a frame with good frame bounds (i.e., as close to tight as one wants). The quality of the result should not depend on the frame size $N$.

As $(u_i)_i$ is a tight frame, we can assume by appropriate scaling that
$$\sum_{i=1}^{N} u_i u_i^{\intercal} = I_n.$$
Moreover, as the frame is equal-norm we have
$$n=\operatorname{trace}(I_n) = \sum_{i=1}^{N} \operatorname{trace}(u_i u_i^{\intercal}) = \sum_{i=1}^{N} \operatorname{trace}(u_i^{\intercal}u_i )=N\lVert u_i\rVert_2^2$$ and thus
$$\lVert u_i u_i^{\intercal} \rVert_{op} = \lVert u_i\rVert_2^2=\frac{n}{N}.$$
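
For concreteness, here is a quick numerical check of these two identities (a sketch in NumPy; the Sylvester–Hadamard construction with $n=8$, $N=64$ is just one convenient example of an equal-norm tight frame, not something given in the exercise):

```python
import numpy as np

def hadamard_frame(n, N):
    """Columns of n rows of a normalized Sylvester-Hadamard matrix:
    N equal-norm vectors forming a tight frame in R^n (N a power of 2, N >= n)."""
    H = np.array([[1.0]])
    while H.shape[0] < N:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    U = H[:n, :] / np.sqrt(N)            # n x N matrix with orthonormal rows
    return [U[:, i] for i in range(N)]   # frame vectors u_1, ..., u_N

n, N = 8, 64
frame = hadamard_frame(n, N)

# tight-frame identity: sum_i u_i u_i^T = I_n
S = sum(np.outer(u, u) for u in frame)
assert np.allclose(S, np.eye(n))

# equal-norm property: ||u_i||_2^2 = n / N for every i
norms_sq = np.array([u @ u for u in frame])
assert np.allclose(norms_sq, n / N)
print("sum u_i u_i^T = I_n and ||u_i||^2 = n/N =", n / N)
```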
I understand the exercise as follows: sample $Y_k\sim \operatorname{Unif}\{u_i u_i^{\intercal} : 1\le i \le N\}$ independently for $k=1,\dots,m$. Then for any $\varepsilon>0$ there is some $m\gtrsim n \log n$ such that

$$\left\lVert \frac{N}{m}\sum_{k=1}^{m} Y_k - I_n \right\rVert_{op} \leq \varepsilon$$
with high enough probability. My idea was to apply a version of the matrix Bernstein inequality (Theorem 5.4.1 in Vershynin), defining $X_k = Y_k - \mathbb{E}[Y_k] = Y_k - \frac{1}{N}I_n$ and then using

$$\mathbb{P}\left[ \lVert \sum_{j=1}^{m}X_j\rVert_{op} \geq t \right] \leq 2n \exp\left( - \frac{t^2/2}{\sigma^2 + Kt/3} \right),$$
where $\sigma^2 = \lVert \sum_{j=1}^{m} \mathbb{E}[X_j^2] \rVert_{op}$ and $K$ is an almost sure bound on $\lVert X_j\rVert_{op}$. Yet I get stuck; in particular, I cannot avoid a dependence on $N$.
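
To get a feel for the claimed statement, here is a small Monte-Carlo sanity check (a sketch in NumPy; the Sylvester–Hadamard frame, the constant in $m \approx 10\, n\log n$, and the random seed are arbitrary choices for illustration, not part of the exercise). It samples $m$ frame elements uniformly with replacement and reports the extreme eigenvalues of $\frac{N}{m}\sum_{k} Y_k$, which should concentrate near $1$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 8, 64

# build an equal-norm tight frame (Sylvester-Hadamard columns, as in the check above)
H = np.array([[1.0]])
while H.shape[0] < N:
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
U = H[:n, :] / np.sqrt(N)                # n x N, columns are the frame vectors

m = int(10 * n * np.log(n))              # sample size of order n log n

# sample m frame elements uniformly with replacement and form (N/m) * sum_k Y_k
idx = rng.integers(0, N, size=m)
S = (N / m) * (U[:, idx] @ U[:, idx].T)

eigvals = np.linalg.eigvalsh(S)
print("extreme eigenvalues (should be close to 1):", eigvals.min(), eigvals.max())
print("deviation ||(N/m) sum Y_k - I||_op =", np.abs(eigvals - 1).max())
```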

Best Answer

Hint

$\newcommand\ang[1]{\left\langle #1 \right \rangle} \newcommand\R{\mathbb R} \newcommand\e{\varepsilon} \newcommand\E{\mathbb E} \newcommand{\tr}{\operatorname{tr}}$ Let $X$ be a frame vector selected uniformly at random. Since $\E[(\sqrt{N}X)(\sqrt{N}X)^T]=\sum_{i=1}^N u_iu_i^T=I_n$, the vector $\sqrt{N}\cdot X$ is isotropic, so letting $\Sigma:=\E XX^T$ be the covariance matrix of $X$, we have $\Sigma=\frac{1}N I_n$.

Let $X_i$ be i.i.d. copies of $X$ for $i=1,\dots,m$, and let $\Sigma_m = \frac1m\sum_{i=1}^m X_i X_i^T$ be the sample covariance matrix. Then for any $x\in \R^n$, \begin{align*} \sum_{i=1}^m \ang{X_i,x}^2 = \sum_{i=1}^m \ang{X_iX_i^Tx,x} =m\cdot\ang{\Sigma_mx,x} &=m\ang{(\Sigma_m-\Sigma)x,x}+m\ang{\Sigma x,x} \\&=\frac{m}N\ang{\frac{\Sigma_m-\Sigma}{\|\Sigma\|}x,x}+\frac{m}N\|x\|^2. \end{align*} Letting $$\e=\|\Sigma_m-\Sigma\|/\|\Sigma\|,$$ we have $$-\e\|x\|^2\le\ang{\frac{\Sigma_m-\Sigma}{\|\Sigma\|}x,x} \le \e\|x\|^2,$$ leading to the bounds $$ \frac{m}{N}(1-\e)\|x\|^2\le \sum_{i=1}^m \ang{X_i,x}^2 \le \frac{m}{N}(1+\e)\|x\|^2. $$

Therefore, if we can show that $\e$ is small with high probability, then the sampled vectors form a frame with bounds as close to tight as we want. Since $\e$ is the relative deviation, in operator norm, of the sample covariance matrix from the true covariance matrix, Theorem 5.6.1 is very helpful; in fact, the high-probability version of that theorem stated in Exercise 5.6.4 is directly applicable. It says that if $K$ is such that $\|X\|_2^2\le K^2\E\|X\|_2^2$ almost surely, then with probability at least $1-2e^{-u}$ we have $$ \e=\|\Sigma_m-\Sigma\|/\|\Sigma\|\le C\left(\sqrt{\frac{K^2 n (\log n + u)}{m}}+\frac{K^2 n (\log n + u)}{m}\right). $$ Since the frame is equal-norm, $\|X\|_2^2=\frac nN=\E\|X\|_2^2$ almost surely, so one can take $K=1$; the bound then only requires $m\gtrsim n(\log n+u)$ and does not depend on $N$.
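
As a numerical companion to this hint (again a sketch under arbitrary choices of frame, sample size, and seed that are not part of the exercise), the following computes $\e=\|\Sigma_m-\Sigma\|/\|\Sigma\|$ for one sample and the resulting frame bounds $\frac mN(1\pm\e)$; the final comment records why the equal-norm property allows $K=1$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 8, 64

# build the same equal-norm tight frame as before (Sylvester-Hadamard columns)
H = np.array([[1.0]])
while H.shape[0] < N:
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
U = H[:n, :] / np.sqrt(N)                # columns are the frame vectors

Sigma = (U @ U.T) / N                    # E[X X^T] = (1/N) I_n
m = int(10 * n * np.log(n))

idx = rng.integers(0, N, size=m)
Sigma_m = (U[:, idx] @ U[:, idx].T) / m  # sample covariance matrix

eps = np.linalg.norm(Sigma_m - Sigma, 2) / np.linalg.norm(Sigma, 2)
print("relative error eps =", eps)
print("frame bounds of the sample: [%.3f, %.3f] * (m/N)" % (1 - eps, 1 + eps))

# Equal-norm frame => ||X||_2^2 = n/N = E||X||_2^2 almost surely, so K = 1 and the
# bound C * (sqrt(n(log n + u)/m) + n(log n + u)/m) involves no N at all.
```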
