Solved – Sum of Products of Rademacher random variables

bernoulli-distributionprobabilityrandom variable

Let $x_1 \ldots x_a,y_1 \ldots y_b$ be independent random variables taking values $+1$ or $-1$ with probability 0.5 each. Consider the sum $S = \sum_{i,j} x_i\times y_j$. I wish to upper bound the probability $P(|S| > t)$. The best bound I have right now is $2e^{-\frac{ct}{\max(a,b)}}$ where $c$ is a universal constant. This is achieved by lower bounding the probability $Pr(|x_1 + \dots + x_n|<\sqrt{t})$ and $Pr(|y_1 + \dots + y_n|<\sqrt{t})$ by application of simple Chernoff bounds. Can I hope to get something that is significantly better than this bound? For starters can I at least get $e^{-c\frac{t}{\sqrt{ab}}}$. If I can get sub-gaussian tails that would probably be the best but can we expect that (I don't think so but can't think of an argument)?

Best Answer

The algebraic relation

$$S = \sum_{i,j} x_i y_j = \sum_i x_i \sum_j y_j$$

exhibits $S$ as the product of two independent sums. Because $(x_i+1)/2$ and $(y_j+1)/2$ are independent Bernoulli$(1/2)$ variates, $X=\sum_{i=1}^a x_i$ is a Binomial$(a, 1/2)$ variable which has been doubled and shifted. Therefore its mean is $0$ and its variance is $a$. Similarly $Y=\sum_{j=1}^b y_j$ has a mean of $0$ and variance of $b$. Let's standardize them right now by defining

$$X_a = \frac{1}{\sqrt a} \sum_{i=1}^a x_i,$$

whence

$$S = \sqrt{ab} X_a X_b = \sqrt{ab}Z_{ab}.$$

To a high (and quantifiable) degree of accuracy, as $a$ grows large $X_a$ approaches the standard Normal distribution. Let us therefore approximate $S$ as $\sqrt{ab}$ times the product of two standard normals.

The next step is to notice that

$$Z_{ab} = X_aX_b = \frac{1}{2}\left(\left(\frac{X_a+X_b}{\sqrt 2}\right)^2 - \left(\frac{X_a-X_b}{\sqrt 2}\right)^2 \right) = \frac{1}{2}\left(U^2 - V^2\right).$$

is a multiple of the difference of the squares of independent standard Normal variables $U$ and $V$. The distribution of $Z_{ab}$ can be computed analytically (by inverting the characteristic function): its pdf is proportional to the Bessel function of order zero, $K_0(|z|)/\pi$. Because this function has exponential tails, we conclude immediately that for large $a$ and $b$ and fixed $t$, there is no better approximation to ${\Pr}_{a,b}(S \gt t)$ than given in the question.

There remains some room for improvement when one (at least) of $a$ and $b$ is not large or at points in the tail of $S$ close to $\pm a b$. Direct calculations of the distribution of $S$ show a curved tapering off of the tail probabilities at points much larger than $\sqrt{ab}$, roughly beyond $\sqrt{ab\max(a,b)}$. These log-linear plots of the CDF of $S$ for various values of $a$ (given in the titles) and $b$ (ranging roughly over the same values as $a$, distinguished by color in each plot) show what's going on. For reference, the graph of the limiting $K_0$ distribution is shown in black. (Because $S$ is symmetric around $0$, $\Pr(S \gt t) = \Pr(-S \lt -t)$, so it suffices to look at the negative tail.)

Figures

As $b$ grows larger, the CDF grows closer to the reference line.

Characterizing and quantifying this curvature would require a finer analysis of the Normal approximation to Binomial variates.

The quality of the Bessel function approximation becomes clearer in these magnified portions (of the upper right corner of each plot). We're already pretty far out into the tails. Although the logarithmic vertical scale can hide substantial differences, clearly by the time $a$ has reached $500$ the approximation is good for $|S| \lt a\sqrt{b}$.

Insets


R Code to Calculate the Distribution of $S$

The following will take a few seconds to execute. (It computes several million probabilities for 36 combinations of $a$ and $b$.) On slower machines, omit the larger one or two values of a and b and increase the lower plotting limit from $10^{-300}$ to around $10^{-160}$.

s <- function(a, b) {
  # Returns the distribution of S as a vector indexed by its support.
  products <- factor(as.vector(outer(seq(-a, a, by=2), seq(-b, b, by=2))))
  probs <- as.vector(outer(dbinom(0:a, a, 1/2), dbinom(0:b, b, 1/2)))
  tapply(probs, products, sum)
}

par(mfrow=c(2,3))
b.vec <- c(51, 101, 149, 201, 299, 501)
cols <- terrain.colors(length(b.vec)+1)
for (a in c(50, 100, 150, 200, 300, 500)) {
  plot(c(-sqrt(a*max(b.vec)),0), c(10^(-300), 1), type="n", log="y", 
       xlab="S/sqrt(ab)", ylab="CDF", main=paste(a))
  curve(besselK(abs(x), 0)/pi, lwd=2, add=TRUE)
  for (j in 1:length(b.vec)) {
    b <- b.vec[j]
    x <- s(a,b)
    n <- as.numeric(names(x))
    k <- n <= 0
    y <- cumsum(x[k])
    lines(n[k]/sqrt(a*b), y, col=cols[j], lwd=2)
  }
}
Related Question