[Math] Concentration of measure bounds for multivariate Gaussian distributions (fixed)

concentration-of-measure, measure-theory, normal distribution, probability, probability distributions

Let $\gamma_n$ denote the standard Gaussian measure on $\mathbb{R}^n$. It is known (see for example Cor 2.3 here: http://www.math.lsa.umich.edu/~barvinok/total710.pdf) that
$$\gamma_n\{x\in\mathbb{R}^n: \|x\|^2 \ge \frac{n}{1-\epsilon}\}\ge e^{-\epsilon n/4}$$and
$$\gamma_n\{x\in\mathbb{R}^n: \|x\|^2 \le (1-\epsilon)n\}\le e^{-\epsilon n/4}.$$

Given a Gaussian distribution with covariance $\Sigma$, what can we say about the distribution of the norm? That is, what happens if rather than the standard Gaussian we consider the measure induced by $N(\vec{0},\Sigma)$, for some positive semidefinite $\Sigma$?

I think we should get similar bounds along the lines of
$$\gamma_n\{x\in\mathbb{R}^n: \|x\|^2 \ge \frac{\operatorname{Trace}(\Sigma)}{1-\epsilon}\}\ge e^{-\epsilon n/4}$$and
$$\gamma_n\{x\in\mathbb{R}^n: \|x\|^2 \le (1-\epsilon)\cdot\operatorname{Trace}(\Sigma)\}\le e^{-\epsilon n/4}.$$

I think it should be the trace, since for i.i.d. samples $x_i\sim N(\vec0,\Sigma)$ we have $\operatorname{Tr}(\frac1n\sum x_i x_i^T) = \frac1n\sum \|x_i\|^2\to\mathbb{E} \|x\|^2$. Is this true? If so, is the derivation straightforward given the standard normal tail bound? It doesn't seem trivial, though often these things are simple modifications of the $N(0,I)$ case.
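Numerically this heuristic checks out; here is a quick Monte Carlo sanity check (a minimal sketch assuming `numpy`; the dimension, covariance and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5                                         # dimension (arbitrary)
A = rng.standard_normal((n, n))
Sigma = A @ A.T                               # an arbitrary positive semidefinite covariance

m = 200_000                                   # number of samples
X = rng.multivariate_normal(np.zeros(n), Sigma, size=m)

print("empirical mean of ||x_i||^2:", np.mean(np.sum(X**2, axis=1)))
print("trace(Sigma)               :", np.trace(Sigma))
```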

I believe one can reduce the problem to proving
$$\gamma_n\{x\in\mathbb{R}^n: x^t \Sigma^{-1} x \ge \frac{\operatorname{Trace}(\Sigma)}{1-\epsilon}\}\ge e^{-\epsilon n/4}$$and
$$\gamma_n\{x\in\mathbb{R}^n: x^t \Sigma^{-1} x \le (1-\epsilon)\cdot\operatorname{Trace}(\Sigma)\}\le e^{-\epsilon n/4},$$
but as the transformed variable leads to a factor of $|\det\Sigma|$, it seems like I'm on the wrong track.

Best Answer

Note: The following is not an answer, but merely some thoughts which might or might not be helpful to you.

First note that you confused(?) your inequality signs. I think you want $$ \gamma_{n}\left(\left\{ x\in\mathbb{R}^{n}\,\mid\,\left\Vert x\right\Vert ^{2}\geq\frac{n}{1-\varepsilon}\right\} \right){\color{red}\leq}e^{-\varepsilon n/4} $$ and $$ \gamma_{n}\left(\left\{ x\in\mathbb{R}^{n}\,\mid\,\left\Vert x\right\Vert ^{2}\geq\frac{{\rm Trace}\left(\Sigma\right)}{1-\varepsilon}\right\} \right){\color{red}\leq}e^{-\varepsilon n/4}. $$ Also note that this inequality would only get better with larger values of $n$; but in general, such a bound cannot hold. To see this, take e.g. $$ \Sigma=\left(\begin{matrix}1\\ & 0\\ & & \ddots\\ & & & 0 \end{matrix}\right), $$ or, if you want your $\Sigma$ to be positive definite, use $\frac{1}{L\left(n-1\right)}$ instead of the zeros on the diagonal, where $L$ is large; in either case ${\rm trace}\left(\Sigma\right)\leq1+\frac{1}{L}$. Since $\left\Vert x\right\Vert ^{2}\geq\left|x_{1}\right|^{2}$, your estimate would then imply $$ \mathbb{P}\left(\left|x_{1}\right|^{2}\geq\frac{1+\frac{1}{L}}{1-\varepsilon}\right)\leq\mathbb{P}\left(\left\Vert x\right\Vert ^{2}\geq\frac{1+\frac{1}{L}}{1-\varepsilon}\right)\leq e^{-\varepsilon n/4}\xrightarrow[n\to\infty]{}0, $$ which is absurd: $x_{1}\sim N\left(0,1\right)$ for every $n$, so the left-hand side is a fixed positive constant.
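A small numerical sketch of this (standard library only; the values of $L$ and $\varepsilon$ are arbitrary choices):

```python
import math

L, eps = 10.0, 0.1                    # arbitrary choices
c = (1 + 1 / L) / (1 - eps)           # the threshold (1 + 1/L)/(1 - eps)

# x_1 ~ N(0, 1), so P(|x_1|^2 >= c) = P(|x_1| >= sqrt(c)) = erfc(sqrt(c / 2))
print("P(|x_1|^2 >= c) =", math.erfc(math.sqrt(c / 2)))   # roughly 0.27, independent of n

# ... whereas the claimed bound goes to zero as n grows
for n in (10, 100, 1000):
    print(f"n = {n:4d}:  e^(-eps*n/4) = {math.exp(-eps * n / 4):.3e}")
```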

Hence, the (exponent of the) right hand side of your estimate somehow needs to involve ${\rm trace}\left(\Sigma\right)$ instead of $n$ (I think).



What follows is an adaptation of the argument you linked, but I eventually get stuck when trying to optimize over $\lambda$ (i.e. to find a good value of $\lambda$).

First, since $\Sigma$ is symmetric positive semidefinite, there is an orthogonal matrix $O\in\mathbb{R}^{n\times n}$ with $\Sigma=O \cdot {\rm diag}\left(\lambda_{1},\dots,\lambda_{n}\right)\cdot O^{T}$, where $\lambda_{1},\dots,\lambda_{n}\geq0$ are the eigenvalues of $\Sigma$. We can now define the square root $\sqrt{\Sigma}:=O\cdot {\rm diag}\left(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}}\right) \cdot O^T\in\mathbb{R}^{n\times n}$, which satisfies $\sqrt{\Sigma}^{T}=\sqrt{\Sigma}$ and $\sqrt{\Sigma}\sqrt{\Sigma}=\Sigma$. Now, by well-known properties of the normal distribution, we conclude that $X:=\sqrt{\Sigma}g\sim N\left(0,\Sigma\right)$, where $g\sim N\left(0,{\rm id}\right)$ is a standard normally distributed random vector.

We also know that the standard normal distribution is invariant under orthogonal transformations, i.e. $h:=O^{T}g\sim N\left(0,{\rm id}\right)$. Finally, using $\left\Vert Ov\right\Vert =\left\Vert v\right\Vert $ for orthogonal $O$, $$ \left\Vert X\right\Vert ^{2}=\left\Vert O{\rm diag}\left(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}}\right)O^{T}g\right\Vert ^{2}=\left\Vert {\rm diag}\left(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}}\right)h\right\Vert ^{2}=\sum_{i=1}^{n}\lambda_{i}h_{i}^{2}, $$ so that $\left\Vert X\right\Vert ^{2}$ has (as you noted yourself) expectation $$ \mathbb{E}\left\Vert X\right\Vert ^{2}=\sum_{i=1}^{n}\lambda_{i}\mathbb{E}h_{i}^{2}=\sum_{i=1}^{n}\lambda_{i}={\rm trace}\left(\Sigma\right), $$ since $\mathbb{E}h_{i}^{2}={\rm Var}\left(h_{i}\right)=1$ because $h\sim N\left(0,{\rm id}\right)$.
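Here is a quick numerical sanity check of this construction (a minimal sketch assuming `numpy`; the example $\Sigma$ is arbitrary). The identity $\left\Vert X\right\Vert ^{2}=\sum_{i}\lambda_{i}h_{i}^{2}$ holds exactly for every realization of $g$, not just in expectation:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 4
A = rng.standard_normal((n, n))
Sigma = A @ A.T                                  # an arbitrary symmetric PSD covariance

lam, O = np.linalg.eigh(Sigma)                   # Sigma = O @ diag(lam) @ O.T
lam = np.clip(lam, 0.0, None)                    # clip tiny negative round-off
sqrt_Sigma = O @ np.diag(np.sqrt(lam)) @ O.T     # the symmetric square root of Sigma

g = rng.standard_normal(n)                       # g ~ N(0, id)
X = sqrt_Sigma @ g                               # X ~ N(0, Sigma)
h = O.T @ g                                      # h ~ N(0, id) by rotation invariance

# ||X||^2 equals sum_i lam_i * h_i^2 (up to floating point error)
print(np.sum(X**2), np.sum(lam * h**2))
```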

By reordering, we can assume $\lambda_{1}\geq\dots\geq\lambda_{j}>0=\lambda_{j+1}=\dots=\lambda_{n}$, where $j\in\left\{ 0,\dots,n\right\} $.

Now observe that the Markov/Chebyshev inequality, applied to $e^{\lambda\left\Vert X\right\Vert ^{2}}$, yields, for arbitrary $\lambda>0$, \begin{eqnarray*} \mathbb{P}\left(\left\Vert X\right\Vert ^{2}\geq{\rm trace}\left(\Sigma\right)+\delta\right) & = & \mathbb{P}\left(e^{\lambda\left\Vert X\right\Vert ^{2}}\geq e^{\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\right)\\ & \leq & e^{-\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\cdot\mathbb{E}\left(e^{\lambda\left\Vert X\right\Vert ^{2}}\right), \end{eqnarray*} where \begin{eqnarray*} \mathbb{E}\left(e^{\lambda\left\Vert X\right\Vert ^{2}}\right) & = & \mathbb{E}\left(e^{\sum_{i=1}^{n}\lambda\lambda_{i}h_{i}^{2}}\right)\\ & = & \prod_{i=1}^{j}\mathbb{E}\left(e^{\lambda\lambda_{i}h_{i}^{2}}\right), \end{eqnarray*} by stochastic independence of $\left(h_{1},\dots,h_{n}\right)$; the factors with $i>j$ equal $1$, since $\lambda_{i}=0$ there. The main point of introducing the $e^{\dots}$ term is this final identity, where independence lets us pull the product out of the expectation.

Finally, \begin{eqnarray*} \mathbb{E}\left(e^{\gamma h_{i}^{2}}\right) & = & \frac{1}{\sqrt{2\pi}}\cdot\int_{\mathbb{R}}e^{\gamma x^{2}}\cdot e^{-x^{2}/2}\,{\rm d}x\\ & = & \frac{1}{\sqrt{2\pi}}\cdot\int_{\mathbb{R}}e^{-\left(\sqrt{\frac{1}{2}-\gamma}x\right)^{2}}\,{\rm d}x\\ & \overset{\omega=\sqrt{\frac{1}{2}-\gamma}x}{=} & \frac{1}{\sqrt{2\pi}\cdot\sqrt{\frac{1}{2}-\gamma}}\cdot\int_{\mathbb{R}}e^{-\omega^{2}}\,{\rm d}\omega\\ & = & \frac{1}{\sqrt{1-2\gamma}} \end{eqnarray*} for $\gamma<\frac{1}{2}$.
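As a quick numerical check of this moment generating function (a minimal sketch assuming `numpy`; $\gamma=0.2$ is an arbitrary value below $\frac{1}{2}$):

```python
import numpy as np

gamma = 0.2                                   # any gamma < 1/2
x = np.linspace(-12.0, 12.0, 200_001)
dx = x[1] - x[0]

# E[e^{gamma * h^2}] for h ~ N(0, 1), via a plain Riemann sum of the integrand
integrand = np.exp(gamma * x**2 - x**2 / 2) / np.sqrt(2 * np.pi)
print("numerical  :", np.sum(integrand) * dx)
print("closed form:", 1 / np.sqrt(1 - 2 * gamma))
```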

All in all, we arrive at $$ \mathbb{P}\left(\left\Vert X\right\Vert ^{2}\geq{\rm trace}\left(\Sigma\right)+\delta\right)\leq e^{-\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\cdot\prod_{i=1}^{j}\frac{1}{\sqrt{1-2\lambda\lambda_{i}}}. $$ The problem is now to optimize this w.r.t. $0<\lambda<\frac{1}{2\lambda_{1}}$. One way to simplify(?) this is to write $$ e^{-\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\cdot\prod_{i=1}^{j}\frac{1}{\sqrt{1-2\lambda\lambda_{i}}}=e^{-\left[\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)+\frac{1}{2}\sum_{i=1}^{j}\ln\left(1-2\lambda\lambda_{i}\right)\right]}, $$ so that one only has to maximize the exponent. Still, I see neither an easy way to determine the optimal value of $\lambda$ nor a really convenient choice of $\lambda$.
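Lacking a closed form for the maximizer, one can at least optimize the exponent numerically. The following is a minimal sketch (assuming `numpy`; the example $\Sigma$ and the value of $\delta$ are arbitrary choices) that brute-forces the exponent over a grid in $\left(0,\frac{1}{2\lambda_{1}}\right)$:

```python
import numpy as np

rng = np.random.default_rng(2)

n = 10
A = rng.standard_normal((n, n))
lam = np.linalg.eigvalsh(A @ A.T)      # eigenvalues of an (arbitrary) example Sigma
tr = lam.sum()                         # trace(Sigma)
delta = 0.5 * tr                       # an arbitrary deviation level

# Exponent f(t) = t*(tr + delta) + (1/2) * sum_i log(1 - 2*t*lam_i);
# the bound above reads  P(||X||^2 >= tr + delta) <= exp(-f(t))  for 0 < t < 1/(2*max(lam)).
t = np.linspace(0.0, 1.0 / (2.0 * lam.max()), 100_001)[1:-1]   # stay inside the open interval
f = t * (tr + delta) + 0.5 * np.log(1.0 - 2.0 * np.outer(t, lam)).sum(axis=1)

i = f.argmax()
print("best t    :", t[i])
print("best bound:", np.exp(-f[i]))
```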

One choice inspired by your linked lecture notes is $\lambda=\frac{\delta/2}{{\rm trace}\left(\Sigma\right)+\delta}$ (in the standard Gaussian case we have $n={\rm trace}\left(\Sigma\right)$, so this is exactly the choice used in the lecture notes). This would yield \begin{eqnarray*} \mathbb{P}\left(\left\Vert X\right\Vert ^{2}\geq{\rm trace}\left(\Sigma\right)+\delta\right) & \leq & e^{-\delta/2}\cdot\prod_{i=1}^{j}\sqrt{\frac{{\rm trace}\left(\Sigma\right)+\delta}{{\rm trace}\left(\Sigma\right)+\delta-\delta\lambda_{i}}}, \end{eqnarray*} which still does not really seem that great.
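For comparison, here is a small sketch (again assuming `numpy`, with an arbitrary example spectrum) that evaluates this convenient choice against a brute-force grid optimum. Note that this choice is only usable when it stays below $\frac{1}{2\lambda_{1}}$, which is the case in this example:

```python
import numpy as np

n = 10
lam = np.linspace(0.5, 2.0, n)          # an arbitrary example spectrum of Sigma
tr = lam.sum()                          # trace(Sigma)
delta = 0.5 * tr                        # an arbitrary deviation level

def bound(t):
    # exp(-[t*(tr + delta) + (1/2) * sum_i log(1 - 2*t*lam_i)])
    return np.exp(-(t * (tr + delta) + 0.5 * np.sum(np.log(1.0 - 2.0 * t * lam))))

t_notes = (delta / 2.0) / (tr + delta)  # the choice inspired by the lecture notes
assert 2.0 * t_notes * lam.max() < 1.0  # must stay below 1/(2*lam_max) to be usable

t_grid = np.linspace(0.0, 1.0 / (2.0 * lam.max()), 10_001)[1:-1]
t_best = min(t_grid, key=bound)         # brute-force minimizer of the bound

print("lecture-notes choice:", bound(t_notes))
print("grid-search optimum :", bound(t_best))
```

This at least makes it easy to see, for a given $\Sigma$ and $\delta$, how much the convenient choice loses compared to the numerically optimal $\lambda$.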

I will try to find a good choice of $\lambda$ here. If I come up with something, I will edit the post.