Tail Lower Bounds using Moment Generating Functions

Tags: characteristic-functions, distribution-tails, fourier-analysis, moment-generating-functions, probability

Given a random variable $X>0$ with Moment Generating Function $m(s)=E[e^{sX}]$ I'm interested in finding a lower bound
$$\Pr[X \ge t] \ge 1-\varepsilon(t),$$
where $t>E[X]$.

A classic technique for finding upper bounds for $\Pr[X \ge t]$ is using Markov's inequality with the Moment Generating Function:
$$\Pr[X \ge t] \le E[e^{sX}]e^{-st},$$
for $s\ge 0$.
Since $E[e^{s X}]\ge e^{s E[X]}$ (Jensen's inequality), this bound is useful whenever $t>E[X]$. (I'm assuming $X$ is a positive random variable, so $E[X]>0$.)
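For concreteness, here is a minimal numerical sketch of optimizing this bound over $s$ (Python, assuming scipy; I take $X$ standard normal for illustration, ignoring the positivity assumption, so $E[e^{sX}]=e^{s^2/2}$; the helper name `chernoff_upper_bound` is just illustrative):

```python
# Sketch: optimize the Markov/MGF upper bound inf_{s>=0} E[e^{sX}] e^{-st}
# for a standard normal X (log-MGF kappa(s) = s^2/2) and compare to the true tail.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def chernoff_upper_bound(log_mgf, t, s_max=50.0):
    # Minimize the log of the bound, kappa(s) - s*t, over s in [0, s_max].
    res = minimize_scalar(lambda s: log_mgf(s) - s * t, bounds=(0.0, s_max), method="bounded")
    return np.exp(res.fun)

for t in [1.0, 2.0, 3.0]:
    ub = chernoff_upper_bound(lambda s: s**2 / 2, t)
    print(f"t={t}: Chernoff upper bound {ub:.3e}, true tail {norm.sf(t):.3e}")
```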

Standard techniques for lower bounds include the second-moment method, such as Cantelli's inequality and Paley-Zygmund. However, they only cover the case $t\in[0,E[X]]$ and say nothing about the case $t>E[X]$.

My best bet so far has been Lévy's and Gil-Pelaez's inversion formulas for the Characteristic Function $m(i s)$:

$$\begin{align}\Pr[X \ge t] &= 1-\frac{1}{2\pi} \lim_{S \to \infty} \int_{-S}^{S} \frac{1 - e^{-ist}}{is}\, m(is)\, ds,
\\\text{and}\quad
\Pr[X \ge t] &= \frac{1}{2} + \frac{1}{\pi}\int_0^\infty \frac{\operatorname{Im}[e^{-ist}m(i s)]}{s}\,ds.\end{align}$$

However, I don't have any good ideas for bounding those in a useful way. It certainly seems a lot harder than the Markov method for the upper bound.
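For reference, the Gil-Pelaez integral is at least straightforward to evaluate numerically; here is a minimal sketch for the standard normal, where $m(is)=e^{-s^2/2}$ (assuming scipy for the quadrature):

```python
# Numerically evaluate the Gil-Pelaez formula
#   Pr[X >= t] = 1/2 + (1/pi) * int_0^inf Im[e^{-ist} m(is)] / s ds
# for a standard normal X, whose characteristic function is m(is) = exp(-s^2/2).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def gil_pelaez_tail(t):
    integrand = lambda s: np.imag(np.exp(-1j * s * t) * np.exp(-s**2 / 2)) / s
    val, _ = quad(integrand, 0, np.inf)
    return 0.5 + val / np.pi

for t in [0.5, 1.0, 2.0]:
    print(f"t={t}: Gil-Pelaez {gil_pelaez_tail(t):.6f}, exact {norm.sf(t):.6f}")
```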

Am I missing a useful approach here?

Best Answer

Actually this is not so hard to do with Paley-Zygmund; I'm not sure why I didn't see it before. We just apply the inequality to $e^{s X}$ in place of $X$, as in Chernoff's bound:

$$ \Pr[e^{sX} > \theta E[e^{sX}]] = \Pr[X > \log(\theta E[e^{sX}])/s] \ge (1-\theta)^2\frac{E[e^{s X}]^2}{E[e^{2 s X}]}. $$

For an example with the standard normal distribution, where $E[e^{s X}]=e^{s^2/2}$, we get $$ \Pr[X > \log(\theta)/s + s/2] \ge (1-\theta)^2 e^{-s^2}, $$ so taking $\log(\theta)/s+s/2=t$ and $\theta=1/2$ we get $$ \Pr[X > t] \ge \frac{1}{4} e^{-\left(\sqrt{t^2+\log (4)}+t\right)^2} \sim e^{-4t^2}. $$ Indeed, the value of $\theta$ matters very little. We can compare this result to the true asymptotics for the normal distribution, $ \Pr[X > t] \sim e^{-t^2/2} $, to see that we lose roughly a factor 8 in the exponent. Alternatively, for $t\to0$ we get $\Pr[X>0]\ge1/16$.
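A quick numerical sanity check of this closed-form bound against the exact normal tail (a small Python sketch, assuming scipy; `pz_lower_bound` is just an illustrative name):

```python
# Check the Paley-Zygmund lower bound Pr[X > t] >= 1/4 * exp(-(sqrt(t^2 + log 4) + t)^2)
# for a standard normal X against the exact tail probability.
import numpy as np
from scipy.stats import norm

def pz_lower_bound(t):
    s = t + np.sqrt(t**2 + np.log(4))  # s solves log(theta)/s + s/2 = t with theta = 1/2
    return 0.25 * np.exp(-s**2)

for t in [0.0, 1.0, 2.0, 3.0]:
    print(f"t={t}: PZ lower bound {pz_lower_bound(t):.3e}, true tail {norm.sf(t):.3e}")
```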

We can improve the lower bound a bit to $\sim t^{O(1)} e^{-2t^2}$ using the $L^p$ version of Paley-Zygmund, but there's still a gap. This differs from the situation of the upper bound (Markov/Chernoff) which is tight in the exponent. If there's a way to get a tight exponent using moment generating functions, I'm still very interested.

Edit: Just to clarify what I mean by $L^p$ PZ, it is the following inequality: $$ \operatorname{P}( Z > \theta \operatorname{E}[Z \mid Z > 0] ) \ge \left(\frac{(1-\theta)^{p} \, \operatorname{E}[Z]^{p}}{\operatorname{E}[Z^p]}\right)^{1/(p-1)}. $$ Using the same substitution as before, and writing $m(s)=E[e^{s X}]$, we get: $$ \Pr[X > \log(\theta m(s))/s] \ge \left((1-\theta)^{p}\frac{m(s)^p}{m(s p)}\right)^{1/(p-1)}. $$ In the limit $p\to 1$ we can Taylor expand as $$ \left(\frac{m(s)^p}{m(s p)}\right)^{1/(p-1)} = m(s)\,\exp\left(- \frac{s m'(s)}{m(s)} + O(p-1)\right), $$ which is nice, as $m(s)\exp(- \frac{s m'(s)}{m(s)}) = e^{-s^2/2}$ for the normal distribution. However, this lower bound only applies at the threshold $\log(\theta)/s+s/2$, which is roughly $s/2$, not at $s$. So even ignoring the $(1-\theta)^{p/(p-1)}$ factor, taking $s\approx 2t$ only gives the lower bound $\Pr[X\ge t] \ge e^{-2t^2}$ (roughly). So we still don't get the constant in the exponent right.
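Here is a quick numerical check of that $p\to1$ limit for the normal MGF (a sketch; the exact limit is $e^{-s^2/2}$):

```python
# Check numerically that (m(s)^p / m(s*p))^(1/(p-1)) -> m(s) * exp(-s m'(s)/m(s))
# as p -> 1, for the standard normal MGF m(s) = exp(s^2/2).
import numpy as np

def lhs(s, p):
    m = lambda u: np.exp(u**2 / 2)
    return (m(s)**p / m(s * p)) ** (1.0 / (p - 1.0))

s = 2.0
target = np.exp(s**2 / 2) * np.exp(-s**2)  # m(s) * exp(-s*m'(s)/m(s)) = exp(-s^2/2)
for p in [1.5, 1.1, 1.01, 1.001]:
    print(f"p={p}: {lhs(s, p):.6f}  (limit {target:.6f})")
```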

More thoughts: It's interesting that Chernoff gives you (define $\kappa(s)=\log Ee^{s X}$) $$ \Pr[X \ge t] \le \exp(\kappa(s) - s t), $$ while PZ gives you (modulo some constants related to $\theta$), $$ \Pr[X \ge \kappa(s)] \ge \exp(\kappa(s) - s \kappa'(s)), $$ by the arguments above. For Chernoff, the optimal choice of $s$ is such that $\kappa'(s)=t$. For PZ we need $\kappa(s)=t$. So they meet for distributions where $\kappa'(s)=\kappa(s)$, meaning $Ee^{sX}=e^{e^s}$. The Poisson distribution is roughly an example of this.
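A tiny numerical illustration of the Poisson remark (for $X\sim\text{Poisson}(1)$ we have $\kappa(s)=e^s-1$ and $\kappa'(s)=e^s$, which differ only by an additive constant):

```python
# For X ~ Poisson(1): kappa(s) = e^s - 1 and kappa'(s) = e^s, so kappa'(s) - kappa(s) = 1,
# i.e. kappa' ~ kappa for large s, which is the sense in which it is "roughly an example".
import numpy as np

for s in [1.0, 2.0, 4.0]:
    kappa, dkappa = np.exp(s) - 1, np.exp(s)
    print(f"s={s}: kappa={kappa:.2f}, kappa'={dkappa:.2f}, ratio={dkappa / kappa:.4f}")
```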

Update: Using Cramér's method

Usually, Chernoff is proven sharp using Cramér's theorem. Cramér considers a sum of i.i.d. random variables, but I wanted to see what happens in the general case.

Define $m(t)=E[e^{t X}]$, $\kappa(t)=\log m(t)$ as before, and the tilted density $q_t(x) = e^{x t} m(t)^{-1} p(x)$, where $p$ is the density of $X$. Then \begin{align*} \Pr[X > s] &= \int_s^\infty p(x)\,dx \\&= m(t) \int_s^\infty e^{-t x} q_t(x)\,dx \\&\ge m(t) \int_s^{s+\varepsilon} e^{-t x} q_t(x)\,dx \\&\ge m(t) e^{-t (s+\varepsilon)} \int_s^{s+\varepsilon} q_t(x)\,dx. \end{align*} Let $Q$ be a random variable with density $q_t$, so that $E[Q]=\kappa'(t)$ and $\operatorname{Var}[Q]=\kappa''(t)$. We set $t\ge 0$ such that $\mu:=E[Q]=\kappa'(t)=s+\varepsilon/2$. Then by Chebyshev's inequality, \begin{align*} \int_s^{s+\varepsilon} q_t(x)\,dx &= \Pr[|Q-\mu| \le \varepsilon/2] \\&\ge 1-\kappa''(t)/(\varepsilon/2)^2. \end{align*} We set $\varepsilon=2\sqrt{2\kappa''(t)}$, which makes the right-hand side equal to $1/2$. Putting it together, we have proven the lower bound \begin{align*} \Pr[X > s] &\ge \tfrac12 \exp\left(\kappa(t) - (s+\varepsilon)t\right), \end{align*} where $t=\kappa'^{-1}(s+\varepsilon/2)$.
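Here is a small numerical sketch of this bound (not part of the derivation; it assumes scipy for the root-finding and is illustrated on the standard normal, where the equation $\kappa'(t)=s+\varepsilon/2$ is easy to solve):

```python
# Sketch of the Cramér-style lower bound
#   Pr[X > s] >= 1/2 * exp(kappa(t) - (s + eps) * t),
# with eps = 2*sqrt(2*kappa''(t)) and t chosen so that kappa'(t) = s + eps/2.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def cramer_lower_bound(kappa, dkappa, d2kappa, s, t_max=50.0):
    # Solve kappa'(t) = s + eps(t)/2, where eps(t) = 2*sqrt(2*kappa''(t)).
    f = lambda t: dkappa(t) - s - np.sqrt(2.0 * d2kappa(t))
    t = brentq(f, 1e-9, t_max)
    eps = 2.0 * np.sqrt(2.0 * d2kappa(t))
    return 0.5 * np.exp(kappa(t) - (s + eps) * t)

# Standard normal: kappa(t) = t^2/2, kappa'(t) = t, kappa''(t) = 1.
for s in [1.0, 2.0, 3.0]:
    lb = cramer_lower_bound(lambda t: t**2 / 2, lambda t: t, lambda t: 1.0, s)
    print(f"s={s}: lower bound {lb:.3e}, true tail {norm.sf(s):.3e}")
```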

Example: To understand the bound we are aiming for, let's do a Chernoff bound, \begin{align*} \Pr[X > s] =\Pr[e^{tX} > e^{ts}] \le e^{\kappa(t)-ts}, \end{align*} which we minimize by setting $\kappa'(t)=s$.

For $X$ standard normal, we have $\kappa(t)=t^2/2$, $\kappa'(t)=t$, and $\kappa''(t)=1$. So the upper bound is $$ \Pr[X > s] \le \exp(-s^2/2). $$

The lower bound from before gives us \begin{align*} \Pr[X > s] &\ge \tfrac12 \exp\left(t^2/2 - (s+\varepsilon)t\right) \\&= \tfrac12 \exp\left(-s^2/2 - \varepsilon s - 3\varepsilon^2/8\right), \end{align*} where $t=s+\varepsilon/2$ and $\varepsilon=2\sqrt2$.

It would be nice to replace $\exp(-\varepsilon s)$ with something polynomial, like $1/s^2$, but at least we get the leading term $-s^2/2$ in the exponent right.

Improving Cramér: The main loss comes from the factor $e^{-t(s+\varepsilon)}$ rather than $e^{-ts}$. Let's see if we can avoid it. \begin{align*} \Pr[X > s] &= \int_s^\infty p(x)\,dx \\&= e^{\kappa(t)-ts} \int_s^\infty e^{-(t-s) x} q_t(x)\,dx \\&\ge e^{\kappa(t)-ts} \frac{(\mu-s)^2}{\sigma^2 + (\mu-s)^2} e^{\frac{(\sigma^2+\mu(\mu-s))(s-t)}{\mu-s}} \\&= e^{\kappa(t)-ts} \frac{(\kappa'(t)-s)^2}{\kappa''(t) + (\kappa'(t)-s)^2} e^{\frac{(\kappa''(t)+\kappa'(t)(\kappa'(t)-s))(s-t)}{\kappa'(t)-s}}, \end{align*} where $\mu=\kappa'(t)$ and $\sigma^2=\kappa''(t)$ are the mean and variance of $Q$. The inequality comes from placing a quadratic $P(x)$ under, and tangential to, $\exp(-(t-s)x)$, with $P(s)=0$.

In the case of the normal distribution, this lower bound is \begin{align*} \exp(-t^2/2) \frac{e^{-1}(s-t)^2}{1+(s-t)^2}. \end{align*} If we let $t=s+1/s$, we get \begin{align*} e^{-s^2/2} \frac{e^{-2-1/(2s^2)}}{1+s^2} = e^{-s^2/2} \, O(1/s^2). \end{align*} So we managed to get rid of the $e^{-\varepsilon s}$ factor! We know the true asymptotics are $\Pr[X\ge s] \sim \exp(-s^2/2)\frac{s/\sqrt{2\pi}}{1+s^2}$, so this is very close.
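And a quick numerical check of this normal-distribution bound with $t=s+1/s$ against the exact tail (a sketch, assuming scipy):

```python
# Check the improved Cramér-style lower bound for the standard normal:
#   Pr[X > s] >= exp(-t^2/2) * e^{-1} * (s - t)^2 / (1 + (s - t)^2),  with t = s + 1/s.
import numpy as np
from scipy.stats import norm

def improved_lower_bound(s):
    t = s + 1.0 / s
    return np.exp(-t**2 / 2) * np.exp(-1) * (s - t) ** 2 / (1 + (s - t) ** 2)

for s in [1.0, 2.0, 4.0, 8.0]:
    print(f"s={s}: lower bound {improved_lower_bound(s):.3e}, true tail {norm.sf(s):.3e}")
```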


Here is a figure of what I mean by placing a quadratic under the exponential: [figure: a quadratic $P(x)$ with $P(s)=0$, lying below and tangent to the exponential]
