Probability – Concentration Inequality of Inner Product of Unit Vectors

analysis · probability · statistics

Let $u$ and $v$ be two independent random vectors in $\mathbb R^n$, each uniformly distributed on the unit sphere. This means we can represent each of them via a Gaussian random vector $g\sim N(0, I_n)$ as $$u=\frac{g}{\|g\|_2}.$$

But do we have a concentration inequality for $u\cdot v$, something like $$\mathbb P(u\cdot v\le x)\ge 1-Ce^{-cx}?$$

Best Answer

For given $n\in\mathbb Z_{>0}$ let $X_1,X_2\sim\mathcal N(0,I_n)$ be two $n$-variate independent standard normal random vectors. Further, let $X^\circ_i=\frac{1}{\|X_i\|_2}X_i$ for $i=1,2$ (which is well-defined since $\|X_i\|_2>0$ almost surely). Notice that $X^\circ_1$ and $X^\circ_2$ are independent uniformly distributed points on the unit sphere. Let $x\cdot y=\sum_{i=1}^nx_iy_i$ denote the inner product for $x,y\in\mathbb R^n$.
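To make the setup concrete, here is a minimal Monte Carlo sketch (Python with NumPy; the dimension and sample size are arbitrary choices) that samples $X^\circ_1,X^\circ_2$ by normalizing Gaussian vectors and estimates the tail of $|X^\circ_1\cdot X^\circ_2|$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 100_000  # arbitrary dimension and sample size

# Sample X_1, X_2 ~ N(0, I_n) and normalize to get uniform points on the sphere.
X1 = rng.standard_normal((trials, n))
X2 = rng.standard_normal((trials, n))
U1 = X1 / np.linalg.norm(X1, axis=1, keepdims=True)
U2 = X2 / np.linalg.norm(X2, axis=1, keepdims=True)

dots = np.einsum("ij,ij->i", U1, U2)  # the inner products X_1° · X_2°
for t in (0.1, 0.2, 0.3):
    print(t, np.mean(np.abs(dots) >= t))  # empirical P(|X_1° · X_2°| ≥ t)
```

The empirical tail decays rapidly in both $t$ and $n$, consistent with the claim below.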

Claim: There exist $a,b\in\mathbb R_{>0}$ such that for all $n\in\mathbb Z_{>0}$ and all $t\in\mathbb R_{\ge 0}$ we have $$\mathbb P(|X^\circ_1\cdot X^\circ_2|\ge t)\le ae^{-bt^2n}.$$

Remark: This is sometimes referred to as sub-gaussian concentration.

Proof: First, recall the Cauchy-Schwarz inequality and notice that $|X^\circ_1\cdot X^\circ_2|\le 1$ almost surely. Next, for $i\in\{1,2\}$ and writing $Z_i=\|X_i\|_2$, we show that for all $n$, $t$ we have $\mathbb P(Z_i^2\le(1-t)n)\le\exp(-\frac{1}{4}t^2n)$. Let $X_i=(X_{i,j})_{j=1,\dots,n}$ and recall that the random variables (rvs) $X_{i,1},\dots,X_{i,n}$ are mutually independent (the off-diagonal entries of the covariance matrix $I_n$ are $0$) standard normal rvs. This shows that $Z_i^2=\sum_{j=1}^nX_{i,j}^2$ follows a chi-squared distribution with $n$ degrees of freedom. Hence, the Laurent-Massart bound yields $\mathbb P(n-Z_i^2\ge 2\sqrt{nx})\le e^{-x}$ for all $x\ge 0$; choosing $x=\frac{1}{4}t^2n$, so that $2\sqrt{nx}=tn$, gives $\mathbb P(Z_i^2\le(1-t)n)\le\exp(-\frac{1}{4}t^2n)$.
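As a quick sanity check of this lower-tail bound, one can compare it against the exact chi-squared distribution function; a sketch in Python, assuming SciPy is available:

```python
import numpy as np
from scipy import stats

# Compare P(Z_i² ≤ (1-t)n) for Z_i² ~ chi-squared with n degrees of freedom
# against the bound exp(-t²n/4) obtained from Laurent-Massart.
for n in (10, 100, 1000):
    for t in (0.1, 0.3, 0.5):
        exact = stats.chi2.cdf((1 - t) * n, df=n)
        bound = np.exp(-t ** 2 * n / 4)
        print(n, t, exact <= bound)  # should always print True
```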

Now, we show that for all $n$, $t$ we have $P(n,t)=\mathbb P(|X_1\cdot X_2|\ge tn)\le 2\exp(-f(t)t^2n)$ with $$f(t)=f(t,\lambda(t))=\frac{\ln(1-\lambda(t)^2)}{2t^2}+\frac{\lambda(t)}{t}$$ for $t>0$ and $f(0)=f(0,\lambda(0))=\frac{1}{2}$, where $\lambda(t)=\frac{1}{2t}(\sqrt{1+4t^2}-1)$ and $\lambda(0)=0$. Notice that this holds for $t=0$ because $P(n,0)\le 1<2=2e^{-f(0)0^2n}$, so let $t>0$. Recall the Chernoff bound: for $\lambda\in(0,1)$ we have $$P_+(n,t)=\mathbb P(X_1\cdot X_2\ge tn) =\mathbb P\left(e^{\lambda X_1\cdot X_2}\ge e^{\lambda tn}\right) \le\mathbb E[e^{\lambda X_1\cdot X_2}]e^{-\lambda tn}.$$

Next, notice that $X_1\cdot X_2=\sum_{j=1}^nX_{1,j}X_{2,j}$ is a sum of $n$ i.i.d. rvs, since $X_1$, $X_2$ are independent and hence $X_{1,1},\dots,X_{1,n},X_{2,1},\dots,X_{2,n}$ are mutually independent standard normal rvs by the above. Thus, we have $\mathbb E[e^{\lambda X_1\cdot X_2}]=\mathbb E[\prod_{j=1}^ne^{\lambda X_{1,j}X_{2,j}}]=\prod_{j=1}^n\mathbb E[e^{\lambda X_{1,j}X_{2,j}}]=\mathbb E[e^{\lambda X_{1,1}X_{2,1}}]^n$ and further $$\mathbb E[e^{\lambda X_{1,1}X_{2,1}}] =\int\int\frac{1}{2\pi}e^{-\frac{1}{2}x^2-\frac{1}{2}y^2}e^{\lambda xy}\,\mathrm{d}y\,\mathrm{d}x =\sigma\int\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2}\int\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(y-\lambda x)^2}\,\mathrm{d}y\,\mathrm{d}x,$$ where $\sigma=\sqrt{\frac{1}{1-\lambda^2}}\in\mathbb R_{\ge 1}$. The inner integral is $1$ since the normal density is normalized, and then so is the outer, thus $\mathbb E[e^{\lambda X_{1,1}X_{2,1}}]=\sigma=\sqrt{\frac{1}{1-\lambda^2}}$ and thereby $$P_+(n,t)\le\sqrt{\frac{1}{1-\lambda^2}}^{\,n}e^{-\lambda tn}=e^{-f(t,\lambda)t^2n},$$ with $f(t,\lambda)$ from above (recall that $t>0$).

Since this holds for any choice of $\lambda\in(0,1)$, we identify the optimal $\lambda$ by computing $$\frac{\partial f}{\partial\lambda}(t,\lambda)=\frac{-\lambda}{t^2(1-\lambda^2)}+\frac{1}{t},$$ which is positive as $\lambda\rightarrow 0$ (since $t>0$) and tends to $-\infty$ as $\lambda\rightarrow 1$, so the only root, given by $\frac{\lambda}{1-\lambda^2}=t$, or equivalently by $\lambda=\lambda(t)\in\mathbb R_{>0}$ (using $\lambda>0$), is a maximizer of $f$. This gives $P_+(n,t)\le e^{-f(t)t^2n}$ and further $P(n,t)\le 2e^{-f(t)t^2n}$, using that $X_1\cdot X_2$ and $-X_1\cdot X_2$ have the same law (since $X_1$ and $-X_1$ have the same law and $X_1$, $X_2$ are independent). Aside, notice that we did not have to show that $\lambda(t)$ maximizes $f$ (any $\lambda\in(0,1)$ yields a valid bound), but of course we want to clarify why we made this choice.
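Both key computations here, the moment generating function $\mathbb E[e^{\lambda X_{1,1}X_{2,1}}]=\frac{1}{\sqrt{1-\lambda^2}}$ and the stationarity condition $\frac{\lambda(t)}{1-\lambda(t)^2}=t$, are easy to verify numerically; here is a sketch (NumPy, with arbitrary choices of $\lambda$ and $t$):

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of E[exp(λXY)] = 1/sqrt(1-λ²) for independent standard
# normals X, Y. Keep λ well below 1 so the estimator has finite variance.
lam = 0.4
x = rng.standard_normal(1_000_000)
y = rng.standard_normal(1_000_000)
print(np.exp(lam * x * y).mean(), 1 / np.sqrt(1 - lam ** 2))  # both ≈ 1.0911

# Check that λ(t) = (sqrt(1+4t²)-1)/(2t) solves λ/(1-λ²) = t.
t = 0.7
lam_t = (np.sqrt(1 + 4 * t ** 2) - 1) / (2 * t)
print(lam_t / (1 - lam_t ** 2), t)  # both 0.7
```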

Before we turn back to $|X^\circ_1\cdot X^\circ_2|$, let us take another brief look at $f(t)$. Using l'Hôpital's rule, notice that $\lim_{t\rightarrow 0}\lambda(t)=0=\lambda(0)$, so $\lambda$ is continuous at $0$. Further, notice that $\lambda'(t)=\frac{\sqrt{1+4t^2}-1}{2t^2\sqrt{1+4t^2}}$ and hence $\lambda'(0)=1$ by another application of l'Hôpital's rule. Applying l'Hôpital's rule to $g(t)=\lambda(t)/t$ at $t=0$ hence yields the continuous extension $g(0)=1$. Combining this with l'Hôpital's rule for $f$ gives $$\lim_{t\rightarrow 0}f(t)=\lim_{t\rightarrow 0}\left(\frac{-2\lambda(t)\lambda'(t)}{4t(1-\lambda(t)^2)}\right)+1 =-\frac{1}{2}+1=f(0),$$ so $f$ is continuous. So, let $\varepsilon\in(0,1)$ be sufficiently small such that $f(t)(1-\varepsilon)^2\ge 1/4$ for all $t\in[0,\varepsilon]$; such an $\varepsilon$ exists since $f$ is continuous with $f(0)=1/2>1/4$.
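Since the proof only needs the existence of such an $\varepsilon$, any concrete admissible value will do; a small numerical sketch (NumPy, with the grid resolution an arbitrary choice) locates the largest one:

```python
import numpy as np

def lam(t):
    return (np.sqrt(1 + 4 * t ** 2) - 1) / (2 * t)

def f(t):
    return np.log(1 - lam(t) ** 2) / (2 * t ** 2) + lam(t) / t

# Find the largest ε on a grid with min_{s ≤ ε} f(s) · (1-ε)² ≥ 1/4; the grid
# starts slightly above 0, where f is defined by continuous extension.
s = np.linspace(1e-3, 1, 100_000)
ok = np.minimum.accumulate(f(s)) * (1 - s) ** 2 >= 0.25
print(s[ok].max())  # ≈ 0.28, so e.g. ε = 1/4 is admissible
```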

Now, consider the event $\mathcal E(n,t)=\{|X^\circ_1\cdot X^\circ_2|\ge t\}$ and let $P^\circ(n,t)=\mathbb P(\mathcal E(n,t))$. For $t\in[0,1)$ we can write this event as $$\mathcal E(n,t)=\left\{\frac{|X_1\cdot X_2|}{Z_1Z_2}\ge\frac{(1-t)tn}{\sqrt{(1-t)n}\sqrt{(1-t)n}}\right\},$$ which implies that on $\mathcal E(n,t)$ we have $Z_1\le\sqrt{(1-t)n}$ or $Z_2\le\sqrt{(1-t)n}$ or $|X_1\cdot X_2|\ge(1-t)tn$, since otherwise $\mathcal E(n,t)$ does not hold. Hence, the union bound and the above yield $$\begin{aligned} P^\circ(n,t)&\le\mathbb P(Z_1^2\le(1-t)n)+\mathbb P(Z_2^2\le(1-t)n)+\mathbb P(|X_1\cdot X_2|\ge(1-t)tn)\\ &\le 2e^{-\frac{1}{4}t^2n}+2e^{-f((1-t)t)((1-t)t)^2n}. \end{aligned}$$

Now, for $t\in[\varepsilon,1]$ we use the bound $$\begin{aligned} P^\circ(n,t) &\le P^\circ(n,\varepsilon) \le 2e^{-\frac{1}{4}\varepsilon^2n}+2e^{-f((1-\varepsilon)\varepsilon)((1-\varepsilon)\varepsilon)^2n} \le 4e^{-\frac{1}{4}\varepsilon^2 n}\\ &\le 4e^{-\frac{1}{4}\varepsilon^2 t^2n}=ae^{-bt^2n},\quad a=4,\ b=\varepsilon^2/4, \end{aligned}$$ where the third inequality uses the choice of $\varepsilon$ (namely $f((1-\varepsilon)\varepsilon)((1-\varepsilon)\varepsilon)^2\ge\frac{1}{4}\varepsilon^2$) and the fourth uses $t\le 1$. For $t>1$ we have $P^\circ(n,t)=0\le ae^{-bt^2n}$ by the Cauchy-Schwarz inequality. For $t\in[0,\varepsilon)$ we proceed analogously to the above (again using the choice of $\varepsilon$, since $(1-t)t\in[0,\varepsilon]$ and $(1-t)^2\ge(1-\varepsilon)^2$) to obtain $$P^\circ(n,t)\le 2e^{-\frac{1}{4}t^2n}+2e^{-f((1-t)t)((1-t)t)^2n} \le ae^{-b't^2n},\quad b'=1/4\ge b.$$ This shows that in any case we have $P^\circ(n,t)\le ae^{-bt^2n}$.

Aftermath: Clearly, we can significantly improve the constant $b$ by analyzing $f$ more carefully. A plot reveals that $f([0,1])=[f(1),0.5]$ with $f(1)\approx 0.377428$. Using the bound $f\ge f(1)$ and then optimizing the coefficient already gives $b=0.118$. Working with $f$ directly allows one to optimize the coefficient further, if required.
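For instance, a short numerical sketch (NumPy; note that $\lambda(1)=\frac{\sqrt 5-1}{2}$ is the inverse golden ratio) confirms the values quoted above:

```python
import numpy as np

def lam(t):
    return (np.sqrt(1 + 4 * t ** 2) - 1) / (2 * t)

def f(t):
    return np.log(1 - lam(t) ** 2) / (2 * t ** 2) + lam(t) / t

t = np.linspace(1e-3, 1, 100_000)
print(lam(1.0))                        # (√5-1)/2 ≈ 0.618034
print(f(1.0))                          # ≈ 0.377428
print(f(t).min(), t[np.argmin(f(t))])  # minimum of f on (0,1] is at t = 1
```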