Derive a Chernoff bound for the sum of integers independently chosen uniformly at random from $\{0, 1, 2\}$

Tags: distribution-tails, probability, upper-lower-bounds

The following problem is exercise 4.12 from the first edition of Probability and Computing by Mitzenmacher and Upfal.

Consider a collection $X_1, \ldots, X_n$ of $n$ independent integers chosen
uniformly from the set $\{0, 1, 2\}$. Let $X = \sum_{i=1}^n X_i$ and
$0 < \delta < 1$. Derive a Chernoff bound for $\Pr(X \geq (1+\delta)n)$ and
$\Pr(X \leq (1 - \delta)n)$.

My current solution to this problem involves defining a variable $Y = \sum_{i=1}^n Y_i$, where each $Y_i$ is an integer chosen independently and uniformly at random from $\{-1, 0, 1\}$. Then we have

$$X = \sum_{i=1}^n X_i = \sum_{i=1}^n (Y_i + 1) = \sum_{i=1}^n Y_i + n = Y + n$$

Now if we can produce a bound on $\Pr(X \geq n + a) = \Pr(Y \geq a)$ for $a > 0$, then by setting $a = n\delta$ we have a bound on $\Pr(X \geq n + n\delta) = \Pr(X \geq (1+\delta)n)$.

I was able to derive the bound $\Pr(Y \geq a) \leq e^{-a^2/2n}$ by essentially imitating the proof I knew for a Chernoff bound $\Pr(Z \geq a) \leq e^{-a^2/2n}$ on a slightly different random variable $Z = \sum_{i = 1}^n Z_i$, where each $Z_i$ is chosen independently and uniformly at random from $\{-1, 1\}$.

Substituting this bound for $a = n\delta$ gives:

$$\Pr(X \geq (1+\delta)n) \leq e^{-\delta^2n/2}$$
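This bound is easy to sanity-check numerically. The sketch below is a hypothetical Monte Carlo check (the helper name `tail_estimate`, the parameters $n = 100$, $\delta = 0.2$, and the trial count are arbitrary illustrative choices): it estimates $\Pr(Y \ge \delta n)$ by simulation and compares it with $e^{-\delta^2 n/2}$.

```python
import math
import random

def tail_estimate(n, delta, trials=20_000, seed=0):
    """Monte Carlo estimate of Pr(Y >= delta*n), where Y is a sum of n
    independent draws chosen uniformly from {-1, 0, 1}."""
    rng = random.Random(seed)
    threshold = delta * n
    hits = 0
    for _ in range(trials):
        y = sum(rng.choice((-1, 0, 1)) for _ in range(n))
        if y >= threshold:
            hits += 1
    return hits / trials

n, delta = 100, 0.2
estimate = tail_estimate(n, delta)
bound = math.exp(-delta**2 * n / 2)  # the bound derived above
print(f"estimate={estimate:.5f}, bound={bound:.5f}")
```

With these parameters the empirical tail probability should come out well below the bound, which is expected: the bound borrowed from $Z$ is loose for $Y$.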

My Question: is it possible to derive a tighter Chernoff bound? If so, how?

Intuitively it makes sense that $\Pr(Y \geq a) \leq \Pr(Z \geq a)$: $Z$ doesn't have all those possible zeroes in its sum pulling its variance down. This makes me wonder: is there a tighter Chernoff bound on $Y$ that doesn't go through this looser bound on $Z$, and if so, how do you derive it? Or is it possible to derive a tighter Chernoff bound directly on $X$, without introducing $Y$ at all?

Best Answer

Let's imitate the proof of the Chernoff bound more closely. We have $$ \mathbb E[e^{tY}] = \prod_{i=1}^n \mathbb E[e^{t Y_i}] = \left(\frac{e^{-t} + 1 + e^t}{3}\right)^n. $$ Further, we have $\Pr[Y \ge \delta n] = \Pr[e^{tY} \ge e^{t \delta n}] \le \left(\frac{e^{-t} + 1 + e^t}{3}\right)^n e^{-t \delta n}.$
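As a sanity check on this Markov-type step, one can compute the distribution of $Y$ exactly by convolution for small $n$ and confirm the bound for several $t$. In the sketch below, the helper `exact_tail` and the values $n = 20$, $\delta = 0.3$, and the choices of $t$ are all illustrative assumptions.

```python
import math

def exact_tail(n, a):
    """Exact Pr(Y >= a) for Y a sum of n uniform {-1, 0, 1} variables,
    computed by repeated convolution of the step distribution."""
    probs = {0: 1.0}
    for _ in range(n):
        new = {}
        for v, p in probs.items():
            for step in (-1, 0, 1):
                new[v + step] = new.get(v + step, 0.0) + p / 3
        probs = new
    return sum(p for v, p in probs.items() if v >= a)

n, delta = 20, 0.3
tail = exact_tail(n, delta * n)
# the Markov-type bound ((e^{-t} + 1 + e^t)/3)^n * e^{-t delta n}
bounds = [((math.exp(-t) + 1 + math.exp(t)) / 3) ** n * math.exp(-t * delta * n)
          for t in (0.1, 0.45, 1.0)]
print(tail, min(bounds))
```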

The Taylor series expansion of $\ln\left(\frac{e^{-t} + 1 + e^t}{3}\right)$ begins $\frac13 t^2 - \frac1{36}t^4 + \frac{13}{3240} t^6 \pm \cdots$. This leads us to hope that we have $\ln\left(\frac{e^{-t} + 1 + e^t}{3}\right) \le \frac{t^2}{3}$ for all $t$, and we do.

To prove this: write $\ln\left(\frac{e^{-t} + 1 + e^t}{3}\right)$ as $f(t^2)$ where $f(u) = \ln\left(\frac{1 + 2\cosh \sqrt u}{3}\right)$. Then $f'(0) = \frac13$ (as we expected from the earlier Taylor series) and, using $\frac{\sinh \sqrt u}{\sqrt u} \ge 1$ (that is, $\sinh x \ge x$ for $x \ge 0$), \begin{align} f''(u) &= \frac{\left(\cosh \sqrt{u}+2\right)-\frac{\sinh \sqrt{u}}{\sqrt u} \left(2 \cosh \sqrt{u}+1\right)}{2 u \left(2 \cosh \sqrt{u}+1\right)^2} \\ &\le \frac{\left(\cosh \sqrt{u}+2\right)-1 \cdot \left(2 \cosh \sqrt{u}+1\right)}{2 u \left(2 \cosh \sqrt{u}+1\right)^2}\\ &= \frac{1 - \cosh \sqrt u}{2 u \left(2 \cosh \sqrt{u}+1\right)^2} \le 0 \end{align} for all $u > 0$, since $\cosh$ is always at least $1$ and the denominator is positive. Therefore $f(u)$ is concave for nonnegative $u$, so the tangent line $f(0) + f'(0) u$ at $u = 0$ lies above $f(u)$ when $u \ge 0$. Substituting $u = t^2$, we get: $$ f(u) \le f(0) + f'(0) u \implies f(u) \le \frac13 u \implies \ln\left(\frac{e^{-t} + 1 + e^t}{3}\right) \le \frac13 t^2. $$ Due to the Taylor series expansion, this is the best bound of its type: as $t \to 0$, we have $\ln\left(\frac{e^{-t} + 1 + e^t}{3}\right) = \frac13 t^2 + O(t^4)$, so for every $\epsilon>0$ and all sufficiently small $t$ we get $\ln\left(\frac{e^{-t} + 1 + e^t}{3}\right) > (\frac13 - \epsilon)t^2$. So we cannot improve on the inequality used above, at least not if we want to end with a bound of the form $e^{-C\delta^2n}$.
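The key inequality can also be checked numerically over a grid of $t$ values (the range $[-20, 20]$, the step size, and the floating-point slack below are arbitrary choices):

```python
import math

# Check ln((e^{-t} + 1 + e^t)/3) <= t^2/3 on a grid of t values.
violations = []
for i in range(-400, 401):
    t = i / 20.0  # t ranges over [-20, 20] in steps of 0.05
    lhs = math.log((math.exp(-t) + 1 + math.exp(t)) / 3)
    if lhs > t * t / 3 + 1e-12:  # tiny slack for floating-point error
        violations.append(t)
print(len(violations))
```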

Replacing $\frac{e^{-t} + 1 + e^t}{3}$ by its upper bound $e^{t^2/3}$, we get $$ \Pr[Y \ge \delta n] \le e^{\frac13t^2n - t\delta n} $$ and now the best choice of $t$ is $t = \frac32 \delta$, which gives us $$ \Pr[Y \ge \delta n] \le e^{-\frac34 \delta^2 n}. $$ This is the best bound of this type obtainable with this method: by the Taylor series argument, we have captured the behavior of the Markov-type upper bound as $\delta \to 0$. (We can't rule out, though, that there's a better bound provable without going through the same application of Markov's inequality.) For $\delta$ further from $0$, we might also get better behavior by not ignoring the higher-order terms in $t$; this happens for the ordinary Chernoff bound as well.
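To confirm the choice of $t$, the sketch below (with arbitrary illustrative values of $n$ and $\delta$) minimizes the exponent $\frac13 t^2 n - t\delta n$ over a grid, compares the result with the calculus-optimal $t = \frac32\delta$, and compares the resulting bound $e^{-\frac34\delta^2 n}$ with the looser $e^{-\frac12\delta^2 n}$ from the question:

```python
import math

n, delta = 100, 0.2

def exponent(t):
    # exponent in the bound exp(t^2 n / 3 - t delta n)
    return t * t * n / 3 - t * delta * n

t_star = 1.5 * delta                    # the calculus-optimal choice t = 3*delta/2
grid_best = min(exponent(i / 1000.0) for i in range(1, 2001))
tight = math.exp(-0.75 * delta**2 * n)  # e^{-3 delta^2 n / 4}
loose = math.exp(-0.5 * delta**2 * n)   # e^{-delta^2 n / 2}
print(exponent(t_star), grid_best, tight, loose)
```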
