The probability for inner product for two unit sphere uniformly distributed random vectors $|u\cdot v|<\alpha$

probabilitystatistics

Following this question Why do we have that $u\cdot v$ converges weakly to a standard Gaussian random variables as $n\to \infty$?.

Let $u$ and $v$ be two random vectors on $R^n$ that are independent and uniformly distributed on the unit sphere. That means we can represent it as Gaussian random vectors $g\sim N(0, I_n)$, $$u=\frac{g}{\|g\|_2}.
$$

Here we know that $u\cdot v$ is a Gaussian with mean 0 and width $n^{-1/2}$.

Let $X=u\cdot v$ which is a Gaussian with mean $0$ and width $n^{-1/2}$.

Why do we have $$P(|X|\le \alpha)=\alpha \sqrt{n}$$ for $0<\alpha<n^{-1/2}$? Does this result coincide with the concentration inequality of the Gaussian $X$?


I can just get the asymptotic bound that means $u\cdot v=O_p(n^{-1/2})$. For non-asymptotic upper bound, we have this answer: Can we get the concentration inequality of the inner product of two unit vectors distributed on the sphere?
$$
P(|u\cdot v-E[u\cdot v]|\ge t)\le e^{-t^2/2}
$$

Since we have $E[u]=-E[u]$, then $E[u\cdot v]=E[(-u)\cdot v]$. Hence, $E[u\cdot v]=0$.

So $P(|u\cdot v|\ge t)\le e^{-t^2/2}$.

But how to show that $P(|u\cdot v|\le t)=t\sqrt{n}$?

Best Answer

As discussed, the question here relates to this question. Fix $n\in\mathbb Z_{>0}$. Let $X_1,X_2\sim\mathcal N(0,I_n)$ be independent standard normal, let $Z_i=\|X_i\|_2$ and $X^\circ_i=\frac{1}{Z_i}X_i$ for $i\in\{1,2\}$. We know the following distributions.

  1. The $X_{1,1},\dots,X_{1,n},X_{2,1},\dots,X_{2,n}$ are i.i.d. standard normal.
  2. This suggests that $Z_i^2=\sum_{j=1}^nX_{i,j}^2$ is Chi-squared with $n$ degrees of freedom.
  3. The inner product $X_1\cdot X_2=\sum_{j=1}^nX_{1,j}X_{2,j}=X^{\mathrm{T}}MX$ follows a generalized Chi-squared distribution, where $X=(X_{1,1},\dots,X_{1,n},X_{2,1},\dots,X_{2,n})\sim\mathcal N(0,I_{2n})$ (due to independence) and $M=\begin{pmatrix}0 & I_n\\ 0 & 0\end{pmatrix}$.
  4. The normalized vectors $X^\circ_1,X^\circ_2$ are i.i.d. uniform on the unit sphere.

What we do not know, is the distribution of $X^\circ_1\cdot X^\circ_2$, in particular it is not Gaussian (remember that $|X^\circ_1\cdot X^\circ_2|\le 1$ almost surely, this is certainly not normally distributed). So, what do we know about $X^\circ_1\cdot X^\circ_2$?

  1. This post suggests that $\mathbb P(|X^\circ_1\cdot X^\circ_2|\ge\sqrt{\ln(n)/n})\le a\exp(-b\ln(n))=an^{-b}=o(1)$, meaning that $|X^\circ_1\cdot X^\circ_2|$ converges to $0$ in probability. Let $V=\sqrt{n}X^\circ_1\cdot X^\circ_2\in[-\sqrt{n},\sqrt{n}]$, and $\delta(n)=o(1)$. Then this result also suggests that $\lim_{n\rightarrow 0}\mathbb P(\delta(n)V\ge\varepsilon)=0$ for all $\varepsilon$, so $\delta(n)V$ converges to $0$ in probability. One would say that $c(n)=\sqrt{n}$ is the appropriate scaling for $X^\circ_1\cdot X^\circ_2$. If you choose a scaling constant $c(n)=o(\sqrt{n})$, then the limiting distribution is trivial, meaning a one-point mass on $0$. If you choose $c(n)=\omega(\sqrt{n})$, then the limiting distribution is not defined (one could show that the limit of any bounded set would have probability $0$ in the limit).
  2. This post suggests that $V$ converges to $W\sim\mathcal N(0,1)$ in distribution. So, what we do not know is the distribution of $V$. What we do know, is that for every measurable $\mathcal E\subseteq\mathbb R$ we have $\lim_{n\rightarrow\infty}\mathbb P(V\in\mathcal E)=\mathbb P(W\in\mathcal E)$.

Since we do not know the distribution of $V$ and $X^\circ_1\cdot X^\circ_2$, there is little hope to obtain probabilities of events, say $\mathbb P(|X^\circ_1\cdot X^\circ_2|\le\alpha)$. What we have understood, however, are the asymptotics. I mean, we have just said that we know that $\lim_{n\rightarrow\infty}\mathbb P(|V|\le\alpha)=\mathbb P(|W|\le\alpha)$, for all $\alpha\in\mathbb R_{\ge 0}$. At this point, we really have to pay attention.

  1. Taking limits: We know that if we fix $\alpha$ and then take the limit, we have $\lim_{n\rightarrow\infty}\mathbb P(|V|\le\alpha)=\mathbb P(|W|\le\alpha)$. This means that $\alpha$ cannot depend on $n$.
  2. We have to remember the scaling. This means we have $\lim_{n\rightarrow\infty}\mathbb P(|X^\circ_1\cdot X^\circ_2|\le\frac{\alpha}{\sqrt{n}})=\mathbb P(|W|\le\alpha)$.

The answer in this post addresses a random variable $Y\sim\mathcal N(0,1/n)$. How does this fit here?

  1. We have $\sqrt{n}Y\sim\mathcal N(0,1)$ and hence $\sqrt{n}Y$ and $W$ have the same law.
  2. One might be tempted to compare $X^\circ_1\cdot X^\circ_2$ and $Y$, since $V=\sqrt{n}X^\circ_1\cdot X^\circ_2$ converges in distribution to $\sqrt{n}Y$. However, convergence in distribution is very weak (hence the name weak convergence), and we need to achieve significantly more to justify a comparison of $X^\circ_1\cdot X^\circ_2$ and $Y$.

This explains that we cannot (yet reliably) say that $X=u\cdot v$ has the law $\mathcal N(0,1/n)$, not even in an approximate sense. If you want to take a closer look at the finite size behavior of $X^\circ_1\cdot X^\circ_2$, I'd be happy to, but maybe in a new question. In the remainder of this answer I will focus on $Y\sim\mathcal N(0,1/n)$ and $\mathbb P(|Y|\le\alpha)$, since this was the actual question.

So, let $\alpha\in\mathbb R$. Recall from above that we have $\sqrt{n}Y\sim\mathcal N(0,1)$, so $\sqrt{n}Y$ and $W$ have the same law and we can write $\mathbb P(|Y|\le\alpha)=\mathbb P(|W|\le\sqrt{n}\alpha)$. The question is why we have $\mathbb P(|W|\le\sqrt{n}\alpha)=\sqrt{n}\alpha$, the answer is: we don't. We have $$\mathbb P(|W|\le\sqrt{n}\alpha)=\int_{-\sqrt{n}\alpha}^{\sqrt{n}\alpha}\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}\mathrm{d}x.$$ Some people would write $\mathbb P(|W|\le\sqrt{n}\alpha)=\Phi(\sqrt{n}\alpha)-\Phi(-\sqrt{n}\alpha)=1-2\Phi(-\sqrt{n}\alpha)$, where $\Phi$ denotes the cumulative distribution function of $\mathcal N(0,1)$. There's no better way to write this.

Now, let's turn to the claim. In fact, we can get somewhat decent approximations. Let's use the shorthand $\phi(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}$. As explained above, we want to know how $\mathbb P(|W|\le\sqrt{n}\alpha)$ and $\sqrt{n}\alpha$ are related for $0\le\alpha\le 1/\sqrt{n}$. So, let's do ourselves a favor first and just consider $\mathbb P(|W|\le a)$ and $a$ for $0\le a\le 1$, which is equivalent (to give some background, we look at the probability to be within an interval of at most one standard deviation from the mean, that's how it was intended in the answer in the first place). Using that $\phi$ has its unique maximum at $0$, we obtain the bounds $e^{-1/2}\sqrt{\frac{2}{\pi}}a=\frac{2e^{-1/2}a}{\sqrt{2\pi}}\le 2a\phi(1)\le 2a\phi(a)\le\mathbb P(|W|\le a)\le 2a\phi(0)=\frac{2a}{\sqrt{2\pi}}=\sqrt{\frac{2}{\pi}}a$. What does this tell us? It tells us that, strictly speaking, we have $\mathbb P(|W|\le a)\le \sqrt{\frac{2}{\pi}}a<a$, so the assumption $\mathbb P(|W|\le a)=a$ is wrong. But, not so strictly speaking, we have $\mathbb P(|W|\le a)=c(a)a$, where $e^{-1/2}\sqrt{\frac{2}{\pi}}\le c(a)\le \sqrt{\frac{2}{\pi}}$ is a coefficient depending on $a$, somewhere between $0.4839$ and $0.7979$, so the error is not catastrophic using this approximation. Since the argument in the answer was not formalized (and doesn't depend on constants, only on orders), this approximation is reasonable.

As I mentioned before, if you want to understand the error made by approximating $X^\circ_1\cdot X^\circ_2$ with $Y$ (respectively $\frac{1}{\sqrt{n}}W$), then please post a new question, I'd be happy to look into it.