Note: The following is not an answer, but merely some thoughts which
might or might not be helpful to you.
First note that you confused(?) your inequality signs. I think you
want
$$
\gamma_{n}\left(\left\{ x\in\mathbb{R}^{n}\,\mid\,\left\Vert x\right\Vert ^{2}\geq\frac{n}{1-\varepsilon}\right\} \right){\color{red}\leq}e^{-\varepsilon n/4}
$$
and
$$
\gamma_{n}\left(\left\{ x\in\mathbb{R}^{n}\,\mid\,\left\Vert x\right\Vert ^{2}\geq\frac{{\rm Trace}\left(\Sigma\right)}{1-\varepsilon}\right\} \right){\color{red}\leq}e^{-\varepsilon n/4}.
$$
Also note that this inequality would get better with larger values
of $n$; but in general, no such estimate can hold. To see this, use e.g.
$$
\Sigma=\left(\begin{matrix}1\\
& 0\\
& & \ddots\\
& & & 0
\end{matrix}\right),
$$
or, if you want your $\Sigma$ to be positive definite, use $\frac{1}{L\left(n-1\right)}$
instead of the zeros on the diagonal, where $L$ is large. Your estimate
would then imply (since $\left\Vert x\right\Vert ^{2}\geq\left|x_{1}\right|^{2}$)
that
$$
\mathbb{P}\left(\left|x_{1}\right|^{2}\geq\frac{1+\frac{1}{L}}{1-\varepsilon}\right)\leq\mathbb{P}\left(\left\Vert x\right\Vert ^{2}\geq\frac{1+\frac{1}{L}}{1-\varepsilon}\right)\leq e^{-\varepsilon n/4}\xrightarrow[n\to\infty]{}0,
$$
which is absurd: here $x_{1}\sim N\left(0,1\right)$, so the left-hand side is a fixed positive constant independent of $n$.
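To see this failure numerically, here is a small Monte Carlo sketch (in Python, assuming `numpy`; the values of `L`, `eps` and the sample sizes are arbitrary choices of mine). The estimated probability stays roughly constant in $n$, while the claimed bound $e^{-\varepsilon n/4}$ collapses:

```python
import numpy as np

rng = np.random.default_rng(0)
L, eps, trials = 10.0, 0.1, 100_000
threshold = (1 + 1 / L) / (1 - eps)

for n in (10, 100, 1000):
    # Sigma = diag(1, 1/(L(n-1)), ..., 1/(L(n-1))), so trace(Sigma) = 1 + 1/L.
    diag = np.full(n, 1 / (L * (n - 1)))
    diag[0] = 1.0
    # X = sqrt(Sigma) g with g ~ N(0, id); sqrt(Sigma) is diagonal here.
    X = rng.standard_normal((trials, n)) * np.sqrt(diag)
    prob = np.mean(np.sum(X ** 2, axis=1) >= threshold)
    print(f"n = {n:4d}: P(||X||^2 >= thr) ~ {prob:.3f}, "
          f"e^(-eps n / 4) = {np.exp(-eps * n / 4):.2e}")
```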
Hence, the (exponent of the) right hand side of your estimate somehow
needs to involve ${\rm trace}\left(\Sigma\right)$ instead of $n$
(I think).
What follows is an adaptation of the argument you linked, but I eventually
get stuck when I try to optimize over, or at least find a good value of, $\lambda$.
First, since $\Sigma$ is symmetric positive semidefinite, there is
an orthogonal matrix $O\in\mathbb{R}^{n\times n}$ with $\Sigma=O \cdot {\rm diag}\left(\lambda_{1},\dots,\lambda_{n}\right)\cdot O^{T}$,
where $\lambda_{1},\dots,\lambda_{n}\geq0$ are the eigenvalues of
$\Sigma$. We can now define the square root $\sqrt{\Sigma}:=O\cdot {\rm diag}\left(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}}\right) \cdot O^T\in\mathbb{R}^{n\times n}$
which satisfies $\sqrt{\Sigma}^{T}=\sqrt{\Sigma}$ and $\sqrt{\Sigma}\sqrt{\Sigma}=\Sigma$.
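As a sanity check of this construction, a minimal sketch (assuming `numpy`; the test matrix is an arbitrary example of mine):

```python
import numpy as np

def sqrt_psd(Sigma):
    """Square root of a symmetric PSD matrix: Sigma = O diag(lam) O^T
    gives sqrt(Sigma) = O diag(sqrt(lam)) O^T."""
    lam, O = np.linalg.eigh(Sigma)      # eigenvalues (ascending), orthogonal O
    lam = np.clip(lam, 0.0, None)       # guard against tiny negative round-off
    return (O * np.sqrt(lam)) @ O.T     # scales the columns of O by sqrt(lam)

Sigma = np.array([[2.0, 1.0], [1.0, 2.0]])
R = sqrt_psd(Sigma)
assert np.allclose(R, R.T)              # sqrt(Sigma)^T = sqrt(Sigma)
assert np.allclose(R @ R, Sigma)        # sqrt(Sigma) sqrt(Sigma) = Sigma
```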
Now, by well-known properties of the normal distribution, we conclude
that $X:=\sqrt{\Sigma}g\sim N\left(0,\Sigma\right)$, where $g\sim N\left(0,{\rm id}\right)$
is a standard normally distributed random vector.
We also know that the standard normal distribution is invariant under
orthogonal transformations, i.e. $h:=O^{T}g\sim N\left(0,{\rm id}\right)$.
Finally,
$$
\left\Vert X\right\Vert ^{2}=\left\Vert O{\rm diag}\left(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}}\right)O^{T}g\right\Vert ^{2}=\left\Vert {\rm diag}\left(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}}\right)h\right\Vert ^{2}=\sum_{i=1}^{n}\lambda_{i}h_{i}^{2},
$$
so that $\left\Vert X\right\Vert ^{2}$ has (as you noted yourself) expectation
$$
\mathbb{E}\left\Vert X\right\Vert ^{2}=\sum_{i=1}^{n}\lambda_{i}\mathbb{E}h_{i}^{2}=\sum_{i=1}^{n}\lambda_{i}={\rm trace}\left(\Sigma\right),
$$
since $\mathbb{E}h_{i}^{2}={\rm Var}\left(h_{i}\right)=1$, as
$h\sim N\left(0,{\rm id}\right)$.
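A quick Monte Carlo check of both the identity $\left\Vert X\right\Vert ^{2}=\sum_{i}\lambda_{i}h_{i}^{2}$ and this expectation (again a sketch assuming `numpy`; `Sigma` is an arbitrary example of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 1.0], [1.0, 2.0]])   # trace(Sigma) = 4
lam, O = np.linalg.eigh(Sigma)
sqrt_Sigma = (O * np.sqrt(lam)) @ O.T

g = rng.standard_normal((200_000, 2))
X = g @ sqrt_Sigma      # rows ~ N(0, Sigma), since sqrt_Sigma is symmetric
print(np.mean(np.sum(X ** 2, axis=1)))      # ~ 4 = trace(Sigma)
```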
By reordering, we can assume $\lambda_{1}\geq\dots\geq\lambda_{j}>0=\lambda_{j+1}=\dots=\lambda_{n}$,
where $j\in\left\{ 0,\dots,n\right\} $.
Now observe that the Markov/Chebyshev inequality yields, for arbitrary
$\lambda>0$,
\begin{eqnarray*}
\mathbb{P}\left(\left\Vert X\right\Vert ^{2}\geq{\rm trace}\left(\Sigma\right)+\delta\right) & = & \mathbb{P}\left(e^{\lambda\left\Vert X\right\Vert ^{2}}\geq e^{\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\right)\\
& \leq & e^{-\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\cdot\mathbb{E}\left(e^{\lambda\left\Vert X\right\Vert ^{2}}\right),
\end{eqnarray*}
where
\begin{eqnarray*}
\mathbb{E}\left(e^{\lambda\left\Vert X\right\Vert ^{2}}\right) & = & \mathbb{E}\left(e^{\sum_{i=1}^{n}\lambda\lambda_{i}h_{i}^{2}}\right)\\
& = & \prod_{i=1}^{j}\mathbb{E}\left(e^{\lambda\lambda_{i}h_{i}^{2}}\right),
\end{eqnarray*}
by stochastic independence of $\left(h_{1},\dots,h_{n}\right)$; the
factors with $i>j$ equal $\mathbb{E}\left(e^{0}\right)=1$ and have
been dropped. The main point of introducing the $e^{\dots}$ term is
this final identity, which lets us pull the product out of the
expectation by independence.
Finally,
\begin{eqnarray*}
\mathbb{E}\left(e^{\gamma h_{i}^{2}}\right) & = & \frac{1}{\sqrt{2\pi}}\cdot\int_{\mathbb{R}}e^{\gamma x^{2}}\cdot e^{-x^{2}/2}\,{\rm d}x\\
& = & \frac{1}{\sqrt{2\pi}}\cdot\int_{\mathbb{R}}e^{-\left(\sqrt{\frac{1}{2}-\gamma}x\right)^{2}}\,{\rm d}x\\
& \overset{\omega=\sqrt{\frac{1}{2}-\gamma}x}{=} & \frac{1}{\sqrt{2\pi}\cdot\sqrt{\frac{1}{2}-\gamma}}\cdot\int_{\mathbb{R}}e^{-\omega^{2}}\,{\rm d}\omega\\
& = & \frac{1}{\sqrt{1-2\gamma}}
\end{eqnarray*}
for $\gamma<\frac{1}{2}$.
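This closed form for the moment generating function of $h_{i}^{2}$ is easy to cross-check numerically (a sketch, assuming `numpy`; I keep $\gamma$ well below $\frac{1}{2}$ so the Monte Carlo average has finite variance):

```python
import numpy as np

rng = np.random.default_rng(2)
h = rng.standard_normal(1_000_000)
for gamma in (0.05, 0.1, 0.2):
    mc = np.mean(np.exp(gamma * h ** 2))    # Monte Carlo estimate of E e^{gamma h^2}
    exact = 1 / np.sqrt(1 - 2 * gamma)
    print(f"gamma = {gamma}: MC ~ {mc:.4f}, 1/sqrt(1 - 2 gamma) = {exact:.4f}")
```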
All in all, we arrive at
$$
\mathbb{P}\left(\left\Vert X\right\Vert ^{2}\geq{\rm trace}\left(\Sigma\right)+\delta\right)\leq e^{-\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\cdot\prod_{i=1}^{j}\frac{1}{\sqrt{1-2\lambda\lambda_{i}}}.
$$
The problem is now to optimize this w.r.t. $0<\lambda<\frac{1}{2\lambda_{1}}$ (this constraint ensures $\lambda\lambda_{i}<\frac{1}{2}$ for every $i$).
One way to simplify(?) this is to use
$$
e^{-\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\cdot\prod_{i=1}^{j}\frac{1}{\sqrt{1-2\lambda\lambda_{i}}}=e^{-\left[\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)+\frac{1}{2}\sum_{i=1}^{j}\ln\left(1-2\lambda\lambda_{i}\right)\right]},
$$
where one only has to optimize the exponent. Still, I neither see
an easy way to determine the optimal value of $\lambda$, nor a really
convenient choice of $\lambda$.
One choice inspired by your linked lecture notes is to use $\lambda=\frac{\delta/2}{{\rm trace}\left(\Sigma\right)+\delta}$
(because in the standard Gaussian case, we have $n={\rm trace}\left(\Sigma\right)$,
which is exactly the choice used in the lecture notes). This would
yield
\begin{eqnarray*}
\mathbb{P}\left(\left\Vert X\right\Vert ^{2}\geq{\rm trace}\left(\Sigma\right)+\delta\right) & \leq & e^{-\delta/2}\cdot\prod_{i=1}^{j}\sqrt{\frac{{\rm trace}\left(\Sigma\right)+\delta}{{\rm trace}\left(\Sigma\right)+\delta-\delta\lambda_{i}}},
\end{eqnarray*}
which still does not really seem that great.
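Even without a closed form, the exponent can at least be minimized numerically; here is a sketch (assuming `numpy`, with made-up eigenvalues `lam` and a made-up `delta`) comparing a grid search over $0<\lambda<\frac{1}{2\lambda_{1}}$ with the lecture-notes-inspired choice:

```python
import numpy as np

lam = np.array([1.5, 1.0, 0.5, 0.5])   # hypothetical positive eigenvalues of Sigma
delta = 2.0
trace = lam.sum()

def log_bound(t):
    # log of  e^{-t (trace + delta)} * prod_i (1 - 2 t lam_i)^{-1/2}
    return -t * (trace + delta) - 0.5 * np.sum(np.log(1 - 2 * t * lam))

ts = np.linspace(1e-6, 1 / (2 * lam.max()) - 1e-6, 10_000)
vals = np.array([log_bound(t) for t in ts])
t_grid = ts[vals.argmin()]
t_notes = (delta / 2) / (trace + delta)  # admissible here: 2 * t_notes * lam.max() < 1
print(f"grid-optimal lambda ~ {t_grid:.4f}, bound ~ {np.exp(vals.min()):.4e}")
print(f"notes choice lambda = {t_notes:.4f}, bound ~ {np.exp(log_bound(t_notes)):.4e}")
```

For these made-up numbers the two choices happen to give nearly the same bound, but I see no reason this should hold in general.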
I will try to find a good choice of $\lambda$ here. If I come up with something, I will edit the post.
Best Answer
If $\mathbf{m}=0$ and $\mathbf{C}$ is the identity matrix, then $Y$ is (by definition) distributed according to a chi-squared distribution.
We can relax the assumption that $\mathbf{m}=0$ and obtain the non-central chi-squared distribution.
On the other hand, if we maintain the assumption that $\mathbf{m}=0$ but allow for general $\mathbf{C}$, then $Y$ is a weighted sum of independent chi-squared variables (equivalently, the trace of a Wishart-distributed matrix), which is again a special case of the generalised chi-squared distribution.
Finally, for general $(\mathbf{m},\mathbf{C})$, $Y$ has a generalised chi-squared distribution.
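For the standard case $\mathbf{m}=0$, $\mathbf{C}={\rm id}$, the chi-squared claim is easy to confirm numerically (a sketch, assuming `numpy` and `scipy`; `n` and the sample size are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, trials = 5, 50_000
Y = np.sum(rng.standard_normal((trials, n)) ** 2, axis=1)
# Kolmogorov-Smirnov test against the chi-squared law with n degrees of freedom;
# a non-small p-value is consistent with Y ~ chi2(n).
print(stats.kstest(Y, "chi2", args=(n,)))
```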