$\Pr[\sum_i X_i^2 Y_i^2\ge t]$, Chernoff bound for sum of pairs of squared Normal random variables

Tags: distribution-tails, normal-distribution, probability, probability-distributions

I'm interested in finding tail bound for $\sum_{i=1}^k X_i^2 Y_i^2$, where $X_i$ and $Y_i$ are independent standard normal random variables.
It should be roughly as tight as the standard Chernoff bound; something like $e^{-\Omega(k\sqrt t)}$ would be nice.

My first instinct was to look at the mgf, $E[\exp(t X^2 Y^2)]$, but naturally it doesn't exist. I looked at the characteristic function $E[\exp(i t X^2 Y^2)] = e^{i/(8t)} K_0(i/(8t))/\sqrt{\pi i t}$, but I don't know how to get a tail bound using the characteristic function. I also considered moment bounds. We have $E[(X^2 Y^2)^k]=2^{2k}\Gamma(k+1/2)^2/\pi\le2(2k/e)^{2k}$, but to get a tail bound I need $E[(\sum_i X_i^2 Y_i^2)^k]$, which is of course a lot harder to estimate. I also considered the Cauchy–Schwarz bound $E[(\sum_i X_i^2 Y_i^2)^k]\le E[(\sum_i X_i^4)^{k/2}(\sum_i Y_i^4)^{k/2}]\le E[(\sum_i X_i^4)^{k}]$, but even those moments seem pretty involved.
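For reference, the per-term moment above comes from the standard Gaussian moment formula $E[X^{2k}]=(2k-1)!!=2^k\Gamma(k+1/2)/\sqrt\pi$, so that $$E[(X^2Y^2)^k]=E[X^{2k}]\,E[Y^{2k}]=\frac{2^{2k}\,\Gamma(k+1/2)^2}{\pi}.$$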

We have that $\Pr[\sum_{i=1}^kX_iY_i\ge tk]\le \exp\left(\frac{-t^2k}{2+t}\right)$ by Chernoff bounds, which is close to Gaussian, at least for small $t$. Perhaps we might also expect $\sum_i X^2_iY^2_i$ to be close to chi-squared for small $t$?

Does anyone know if there is a standard bound for this sum? Or if one of my approaches might be workable?

Best Answer

I found that a good way to handle this sum is using Bernstein's inequality:

Let $X_1, \ldots, X_n$ be independent zero-mean random variables. Suppose that $|X_i|\leq M$ almost surely, for all $i$. Then, for all positive $t$,

$$\Pr \left (\sum_{i=1}^n X_i \ge t \right ) \leq \exp \left( -\frac{t^2/2}{\sum_i \mathbb{E}[X_i^2]+ Mt/3} \right).$$

We have $$ \Pr[X^2Y^2\ge t] \le E[(XY)^{2p}]\,t^{-p} \le 2(2p/e)^{2p}t^{-p} \le 2 e^{-\sqrt t}, $$ taking $p=\sqrt{t}/2$. Hence we can take a union bound over all the $X_iY_i$ and apply Bernstein's inequality to the terms restricted to the event $X_i^2Y_i^2\le M$ (bounding each variance by $E[(X^2Y^2)^2]=9$), to get $$ \Pr\left( \sum_{i=1}^k X_i^2Y_i^2\ge (1+\epsilon)k \right) \le \exp \left( -\frac{k\epsilon^2/2}{9+ M\epsilon/3} \right) +2k \exp(-\sqrt M)\le2\delta, $$ when we take $M=\log^2(2 k/\delta)$ and $k\ge t$, where $$t = \frac{18 \log \left(\frac{1}{\delta}\right)}{\epsilon^2}+\frac{2 \log \left(\frac{1}{\delta}\right) \log^2 \left(\frac{2 k}{\delta}\right)}{3 \epsilon} =O\!\left(\epsilon^{-2}\log(1/\delta)+\epsilon^{-1}\log^3(2k/\delta)\right).$$
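To verify the two substitutions: with $p=\sqrt t/2$ we get $(2p/e)^{2p}\,t^{-p} = (\sqrt t/e)^{\sqrt t}\, t^{-\sqrt t/2} = e^{-\sqrt t}$, and with $M=\log^2(2k/\delta)$ the union-bound term is $2k\exp(-\sqrt M)=2k\cdot\frac{\delta}{2k}=\delta$.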

The first term matches exactly what we would expect from the central limit theorem, and the second term nearly matches the bound we would get from a "single large term".

The only remaining question is whether we can get rid of the third power, so that the second term becomes just $\epsilon^{-1}\log^2(k/\delta)$. I don't know how to do that, though.

Update

The previous approach lost a factor of $\log(1/\delta)$ in $t$. This can be avoided by the following neat trick: instead of using Bernstein's inequality, just use $(XY)^2\le M^{1/2}|XY|\le \sqrt M\,(X^2+Y^2)/2$, which holds whenever $X^2Y^2\le M$.
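To spell the trick out: on the event $X^2Y^2\le M$ we have $|XY|\le\sqrt M$, and by AM–GM $|XY|\le(X^2+Y^2)/2$, so $$(XY)^2 = |XY|\cdot|XY| \le \sqrt M\,|XY| \le \sqrt M\,(X^2+Y^2)/2.$$ The point is that the right hand side, unlike $(XY)^2$ itself, has a moment generating function.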

That gives us $$\begin{align} \Pr\Big[\sum_iX_i^2Y_i^2 \ge (1+\epsilon)n \,\Big|\, \max_i X_i^2Y_i^2\le M\Big] &\le E\Big[\exp\Big(\lambda \sum_i X_i^2Y_i^2\Big)\Big]\Big/\exp\big(\lambda(1+\epsilon)n\big) \\&\le E\Big[\exp\Big(\lambda \sqrt M \sum_i(X_i^2+Y_i^2)/2\Big)\Big]\Big/\exp\big(\lambda(1+\epsilon)n\big) \\&= \frac{1}{1-\lambda \sqrt M}\Big/\exp\big(\lambda(1+\epsilon)n\big) \\&= \frac{(1+\epsilon)n}{\sqrt M}\exp\Big(1 - \frac{(1+\epsilon)n}{\sqrt M}\Big). \end{align}$$

Here we take $\lambda=\big((1+\epsilon)n-\sqrt M\big)/\big(\sqrt M\,(1+\epsilon)n\big)$ and $M=(1+\epsilon)n$, so the bound becomes $\sqrt{(1+\epsilon)n}\,\exp\big(1-\sqrt{(1+\epsilon)n}\big)$. The union bound over the individual terms then contributes an additional $2n\exp(-\sqrt{(1+\epsilon)n})$.
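For a single pair, the moment generating function being used is the chi-squared one: for standard normal $X$ and $s<1/2$ we have $E[e^{sX^2}]=(1-2s)^{-1/2}$, so $E\big[\exp\big(\lambda\sqrt M\,(X^2+Y^2)/2\big)\big]=1/(1-\lambda\sqrt M)$ whenever $\lambda\sqrt M<1$.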

We see that taking $n\approx\epsilon^{-2}\log1/\delta + \epsilon^{-1}(\log1/\delta)^2$ now suffices.

The method is nearly as versatile as the Bernstein approach, and it gives the optimal values.

Update 2

Perhaps a more general approach is to use moments instead of exponential moments. Define $\|X\|_p = E[|X|^p]^{1/p}$ and note the Khintchine-type inequality $\|X\|_p \le C \sqrt{p}$ for standard normal random variables, where $C$ is a universal constant, say $C=3$. Define $x\lesssim y$ to mean $x\le y$ up to a universal constant.

We now have $\|XY\|_p \le \|X\|_{2p}\|Y\|_{2p}$ by Cauchy–Schwarz, and $\|X^2\|_p = E[X^{2p}]^{1/p}=\|X\|_{2p}^2$. From the triangle inequality we get $\|X-E[X]\|_p\le \|X\|_p + |E[X]|$. Hence $\|X^2Y^2-1\|_p\lesssim p^2+1\lesssim p^2$ (assuming $p\ge 1$).
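To spell out the last step: $$\|X^2Y^2\|_p = \|XY\|_{2p}^2 \le \big(\|X\|_{4p}\|Y\|_{4p}\big)^2 \le \big(C\sqrt{4p}\big)^4 = 16C^4p^2,$$ so $\|X^2Y^2-1\|_p\le 16C^4p^2+1\lesssim p^2$.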

The final step is to use Latała's inequality for sums of random variables. In particular the version in Corollary 38 of our eventual paper:

Let $X_1, X_2, \dots$ be i.i.d. with mean $0$, and assume further that $\|X_i\|_p \lesssim p^\alpha$ for all $p\ge 2$, where $\alpha\ge 1$. Then $\|\sum_{i=1}^n X_i\|_p \lesssim \max\{\sqrt{pn},\, (n/p)^{1/p}p^\alpha\}$.

In this simplified version the constant hidden in $\lesssim$ may depend on $\alpha$, but that doesn't matter for our purposes.

The final piece is Markov's inequality with powers, and we get $$\begin{align} \Pr[\sum_{i=1}^n X_i^2Y_i^2 \ge (1+\epsilon)n] &\le E[(\sum_{i=1}^n (X_i^2Y_i^2-1))^p]/(\epsilon n)^p \\&\lesssim \left(\frac{\max\{\sqrt{np} , (n/p)^{1/p} p^2\}}{\epsilon n}\right)^p \\&\le \exp\left(-\frac{\epsilon^2n}{2e}\right) +\sqrt{n/\epsilon} \exp\left(1-\frac{2\sqrt{\epsilon n}}{e}\right), \end{align}$$ where the last inequality comes from taking either $p=\epsilon^2 n/e$ or $p=\sqrt{\epsilon n}/e$.
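To verify the choice of $p$ in the last step (ignoring the universal constant hidden in $\lesssim$): taking $p=\epsilon^2 n/e$ gives $\frac{\sqrt{np}}{\epsilon n}=e^{-1/2}$, so the first branch of the maximum contributes $e^{-p/2}=\exp\!\big(-\frac{\epsilon^2 n}{2e}\big)$; taking $p=\sqrt{\epsilon n}/e$ gives $\frac{p^2}{\epsilon n}=e^{-2}$, so the second branch contributes $(n/p)\,e^{-2p}=\sqrt{n/\epsilon}\,\exp\!\big(1-\frac{2\sqrt{\epsilon n}}{e}\big)$.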

We see that this bound actually improves on the previous method by a factor $\sqrt{n/\epsilon}$, and we can again take $n\approx\epsilon^{-2}\log1/\delta + \epsilon^{-1}(\log1/\delta)^2$.

A more general result is in the paper we eventually wrote. The method using moments is very general, and the availability of Latała's inequality makes it perhaps more useful than the exponential method.
