Actually, this is not so hard to do with Paley-Zygmund. Not sure why I didn't see this before. We just substitute $e^{s X}$ for $X$, as in Chernoff's bound:
$$
\Pr[e^{sX} > \theta E[e^{sX}]]
= \Pr[X > \log(\theta E[e^{sX}])/s]
\ge (1-\theta)^2\frac{E[e^{s X}]^2}{E[e^{2 s X}]}.
$$
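A quick sanity check with an exponential variable, where everything is in closed form (plain Python; note $m(2s)$ has to exist, hence $s<1/2$ here):

```python
# Check Pr[X > log(theta * m(s)) / s] >= (1-theta)^2 * m(s)^2 / m(2s)
# for X ~ Exp(1), where m(s) = 1/(1-s) for s < 1 and Pr[X > t] = e^{-t}.
import math

s = 0.4                                    # need 2s < 1 so that m(2s) exists
for theta in [0.5, 0.7, 0.9]:
    m1, m2 = 1 / (1 - s), 1 / (1 - 2 * s)
    threshold = math.log(theta * m1) / s
    tail = math.exp(-max(threshold, 0.0))  # exact tail of Exp(1)
    bound = (1 - theta)**2 * m1**2 / m2
    assert tail >= bound
    print(f"theta={theta:3.1f}  tail={tail:.4f}  PZ bound={bound:.4f}")
```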
For an example with the normal distribution, where $E[e^{s X}]=e^{s^2/2}$, we get
$$
\Pr[X > \log(\theta)/s + s/2]
\ge (1-\theta)^2 e^{-s^2},
$$
so taking $\log(\theta)/s+s/2=t$ and $\theta=1/2$ we get
$$
\Pr[X > t]
\ge \frac{1}{4} e^{-\left(\sqrt{t^2+\log (4)}+t\right)^2} \sim e^{-4t^2}.
$$
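For the normal example, the bound indeed sits below the exact tail, just with the wrong rate (Python; I'm assuming scipy is available, so norm.sf gives the exact tail):

```python
# Compare the explicit lower bound 1/4 * exp(-(sqrt(t^2 + log 4) + t)^2)
# with the exact standard normal tail.
import math
from scipy.stats import norm

for t in [0.5, 1.0, 2.0, 3.0]:
    s = t + math.sqrt(t**2 + math.log(4))  # solves log(1/2)/s + s/2 = t
    bound = 0.25 * math.exp(-s**2)
    print(f"t={t:3.1f}  true tail={norm.sf(t):.3e}  lower bound={bound:.3e}")
```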
Indeed, the value of $\theta$ matters very little.
We can compare this result to the true asymptotics for the normal distribution
$
\Pr[X > t] \sim e^{-t^2/2}
$ to see that we lose roughly a factor 8 in the exponent.
Alternatively for $t\to0$ we get $\Pr[X>0]\ge1/16$.
We can improve the lower bound a bit to $\sim t^{O(1)} e^{-2t^2}$ using the $L^p$ version of Paley-Zygmund, but there's still a gap.
This differs from the situation of the upper bound (Markov/Chernoff) which is tight in the exponent.
If there's a way to get a tight exponent using moment generating functions, I'm still very interested.
Edit: Just to clarify what I mean by $L^p$ PZ, it is the following inequality:
$$
\operatorname{P}( Z > \theta \operatorname{E}[Z \mid Z > 0] )
\ge \left(\frac{(1-\theta)^{p} \, \operatorname{E}[Z]^{p}}{\operatorname{E}[Z^p]}\right)^{1/(p-1)}.
$$
Using the same substitution as before, and defining $m(s)=E[e^{s X}]$, we get:
$$
\Pr[X > \log(\theta m(s))/s]
\ge \left((1-\theta)^{p}\frac{m(s)^p}{m(s p)}\right)^{1/(p-1)}.
$$
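This is again easy to check numerically for the standard normal, where $m(s)^p/m(sp)=e^{(p-p^2)s^2/2}$ in closed form (Python/scipy as before):

```python
# Check the L^p version after the substitution Z = e^{sX}, X standard normal:
# m(s)^p / m(sp) = exp((p - p^2) * s^2 / 2) exactly.
import math
from scipy.stats import norm

s, theta = 2.0, 0.5
tail = norm.sf((math.log(theta) + s**2 / 2) / s)   # exact Pr[X > log(theta m(s))/s]
for p in [1.1, 1.5, 2.0, 3.0]:
    bound = ((1 - theta)**p * math.exp((p - p**2) * s**2 / 2))**(1 / (p - 1))
    assert tail >= bound
    print(f"p={p:3.1f}  tail={tail:.3e}  bound={bound:.3e}")
```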
In the limit $p\to 1$ we can Taylor expand as
$$
\left(\frac{m(s)^p}{m(s p)}\right)^{1/(p-1)}
= m(s)\,\exp\left(- \frac{s m'(s)}{m(s)} + O(p-1)\right),
$$
which is nice, as $m(s)\exp(- \frac{s m'(s)}{m(s)}) = e^{-s^2/2}$ for the normal distribution.
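A quick check that the limit behaves as claimed for the normal (plain Python; for the normal the expression simplifies to exactly $e^{-p s^2/2}$):

```python
# As p -> 1, (m(s)^p / m(sp))^{1/(p-1)} should approach exp(-s^2/2)
# for the standard normal; here it equals exp(-p * s^2 / 2).
import math

s = 1.5
for p in [2.0, 1.5, 1.1, 1.01, 1.001]:
    val = math.exp(p * s**2 / 2 - (s * p)**2 / 2)**(1 / (p - 1))
    print(f"p={p:6.3f}  value={val:.6f}")
print(f"target exp(-s^2/2) = {math.exp(-s**2 / 2):.6f}")
```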
However, this lower bound applies at the threshold $\log(\theta)/s+s/2$, not at $s$ itself.
So even if we let $s\to\infty$ and ignore the $(1-\theta)^{p/(p-1)}$ factor, rewriting the bound in terms of the threshold $t\approx s/2$ only gives $\Pr[X\ge t] \ge e^{-2t^2}$ or so.
So we still don't get the constant in the exponent right.
More thoughts:
It's interesting that Chernoff gives you (define $\kappa(s)=\log Ee^{s X}$)
$$
\Pr[X \ge t] \le \exp(\kappa(s) - s t),
$$
while PZ gives you (modulo some constants related to $\theta$),
$$
\Pr[X \ge \kappa(s)/s] \ge \exp(\kappa(s) - s \kappa'(s)),
$$
by the arguments above.
For Chernoff, the optimal choice of $s$ is such that $\kappa'(s)=t$.
For PZ we need $\kappa(s)/s=t$.
Since $\kappa$ is convex with $\kappa(0)=0$, we always have $\kappa(s)/s\le\kappa'(s)$, with equality only for linear $\kappa$, so the two bounds are never evaluated at quite the same threshold.
For fast-growing $\kappa$, say $Ee^{sX}=e^{e^s}$ (roughly the Poisson distribution), the two thresholds at least agree up to a factor $s$.
Update: Using Cramér's method
Usually, Chernoff is proven sharp using Cramér's theorem. Cramér considers a sum of i.i.d. random variables, but I wanted to see what happens in the general case.
Define $m(t)=E[e^{t X}]$ and the tilted density $q_t(x) = e^{x t} m(t)^{-1} p(x)$, where $p$ is the density of $X$. Then
\begin{align*}
\Pr[X > s]
&= \int_s^\infty p(x)\,dx
\\&= m(t) \int_s^\infty e^{-t x} q_t(x)\,dx
\\&\ge m(t) \int_s^{s+\varepsilon} e^{-t x} q_t(x)\,dx
\\&\ge m(t) e^{-t (s+\varepsilon)} \int_s^{s+\varepsilon} q_t(x)\,dx.
\end{align*}
We set $t\ge 0$ such that $E[Q]=\kappa'(t)=s+\varepsilon/2$, where $Q$ is a random variable with density $q_t$.
Then by Chebyshev's inequality, writing $\mu=E[Q]$ and using $\operatorname{Var}(Q)=\kappa''(t)$,
\begin{align*}
\int_s^{s+\varepsilon} q_t(x)\,dx
&= \Pr[|Q-\mu| \le \varepsilon/2]
\\&\ge 1-\kappa''(t)/(\varepsilon/2)^2.
\end{align*}
We set $\varepsilon=2\sqrt{2\kappa''(t)}$, which makes this probability at least $\tfrac12$.
Putting it together, we have proven the lower bound
\begin{align*}
\Pr[X > s]
&\ge \tfrac12 \exp\left(\kappa(t) - (s+\varepsilon)t\right),
\end{align*}
where $t=\kappa'^{-1}(s+\varepsilon/2)$.
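Here is a minimal sketch of this bound for the standard normal, where $\kappa'$ is trivial to invert (Python with scipy; for a general $\kappa$ one would solve $\kappa'(t)=s+\varepsilon/2$ numerically instead):

```python
# Lower bound Pr[X > s] >= 1/2 * exp(kappa(t) - (s + eps) * t) with
# eps = 2 * sqrt(2 * kappa''(t)) and kappa'(t) = s + eps/2.  For the
# standard normal kappa(t) = t^2/2, kappa'(t) = t, kappa''(t) = 1.
import math
from scipy.stats import norm

eps = 2 * math.sqrt(2)            # 2 sqrt(2 kappa'') with kappa'' = 1
for s in [1.0, 2.0, 4.0]:
    t = s + eps / 2               # kappa'^{-1}(s + eps/2)
    lower = 0.5 * math.exp(t**2 / 2 - (s + eps) * t)
    print(f"s={s}  exact tail={norm.sf(s):.3e}  lower={lower:.3e}")
```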
\subsection{Example}
To understand the bound we are aiming for, let's do a Chernoff bound,
\begin{align*}
\Pr[X > s]
=\Pr[e^{tX} > e^{ts}]
\le e^{\kappa(t)-ts},
\end{align*}
which we minimize by setting $\kappa'(t)=s$.
For $X$ normally distributed, we have
$\kappa(t)=t^2/2$,
$\kappa'(t)=t$, and
$\kappa''(t)=1$.
So the upper bound is
$$
\Pr[X > s] \le \exp(-s^2/2).
$$
The lower bound from before gives us
\begin{align*}
\Pr[X > s]
&\ge \tfrac12 \exp\left(t^2/2 - (s+\varepsilon)t\right)
\\&= \tfrac12 \exp\left(-s^2/2 - \varepsilon s - 3\varepsilon^2/8\right),
\end{align*}
with $t=s+\varepsilon/2$ and $\varepsilon=2\sqrt{2}$.
It would be nice to replace $\exp(-\varepsilon s)$ with something
polynomial, like $1/s^2$, but at least we get the leading coefficient right.
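To quantify the $\exp(-\varepsilon s)$ loss (Python/scipy, same conventions as the snippet above; the ratio grows at the same exponential rate as $e^{\varepsilon s}$, up to polynomial factors):

```python
# Gap between the exact tail and the lower bound, with eps = 2*sqrt(2).
import math
from scipy.stats import norm

eps = 2 * math.sqrt(2)
for s in [2.0, 4.0, 6.0]:
    lower = 0.5 * math.exp(-s**2 / 2 - eps * s - 3 * eps**2 / 8)
    print(f"s={s}  exact/lower = {norm.sf(s) / lower:.2e}  "
          f"exp(eps*s) = {math.exp(eps * s):.2e}")
```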
Improving Cramér:
The main loss comes from the factor $e^{-t(s+\varepsilon)}$ in place of $e^{-ts}$.
Let's see if we can avoid it.
\begin{align*}
\Pr[X > s]
&= \int_s^\infty p(x)\,dx
\\&= e^{\kappa(t)-ts} \int_s^\infty e^{-(t-s) x} q_t(x)\,dx
\\&\ge e^{\kappa(t)-ts}
\frac{(\mu-s)^2}{\sigma^2 + (\mu-s)^2}
e^{\frac{(\sigma^2+\mu(\mu-s))(s-t)}{\mu-s}}
\\&= e^{\kappa(t)-ts}
\frac{(\kappa'(t)-s)^2}{\kappa''(t) + (\kappa'(t)-s)^2}
e^{\frac{(\kappa''(t)+\kappa'(t)(\kappa'(t)-s))(s-t)}{\kappa'(t)-s}},
\end{align*}
where $\mu=\kappa'(t)$ and $\sigma^2=\kappa''(t)$ are the mean and variance of $Q$, and the inequality comes from placing a quadratic $P(x)$ under and tangent to $\exp(-(t-s)x)$, with $P(s)=0$.
In the case of the normal distribution, this lower bound is
\begin{align*}
\exp(-t^2/2)
\frac{e^{-1}(s-t)^2}{1+(s-t)^2}.
\end{align*}
If we let $t=s+1/s$, we get
\begin{align*}
e^{-s^2/2}
\frac{e^{-2-1/(2s^2)}}{1+s^2}
=
e^{-s^2/2}
\,
O(1/s^2).
\end{align*}
So we managed to get rid of the $e^{-c s}$ factor!
We know the true bound is $\Pr[X\ge s] \sim \exp(-s^2/2)\frac{s/\sqrt{2\pi}}{1+s^2}$, so this is very close.
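Plugging numbers in (Python/scipy again; "improved" is the bound above with $t=s+1/s$, "classical" the $\frac{s/\sqrt{2\pi}}{1+s^2}e^{-s^2/2}$ bound):

```python
# Evaluate the improved bound for the standard normal with t = s + 1/s
# against the exact tail and the classical s/(1+s^2) * phi(s) lower bound.
import math
from scipy.stats import norm

for s in [2.0, 4.0, 8.0]:
    t = s + 1 / s
    improved = math.exp(-t**2 / 2) * math.exp(-1) * (s - t)**2 / (1 + (s - t)**2)
    classical = math.exp(-s**2 / 2) * (s / math.sqrt(2 * math.pi)) / (1 + s**2)
    print(f"s={s}  exact={norm.sf(s):.3e}  improved={improved:.3e}  "
          f"classical={classical:.3e}")
```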
Here is a figure of what I mean by placing a quadratic under the exponential: [figure omitted; it shows a quadratic $P(x)$ with $P(s)=0$ lying under and tangent to $\exp(-(t-s)x)$].
You can get a bound from Markov's inequality, but that's about it. In particular, there is no hope of getting such a strong bound (which is called sub-Gaussian concentration) without much stronger assumptions than just two moments. To see why, suppose that $X$ is a random variable for which $$\mathbb{P}(|X| > t) \leq e^{-ct^2}$$ holds for all $t>0$. Then, using the usual trick for computing moment estimates from tail estimates:
\begin{align*}
\mathbb{E}[e^{\frac{c}{2}X^2}] &= 1 + \int_0^\infty c t e^{\frac{c}{2}t^2}\mathbb{P}(|X|>t)dt \\
&\leq 1+ \int_0^\infty c t e^{\frac{c}{2}t^2}e^{-ct^2} dt = 1+ \int_0^\infty c te^{-\frac{c}{2}t^2} dt <\infty.
\end{align*}
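The last integral is exactly $1$ for any $c>0$ (substitute $u=ct^2/2$), so the right-hand side is $2$; a quick numerical check (Python with scipy; $c=0.7$ is an arbitrary choice):

```python
# The remaining integral equals 1 for any c > 0, giving E[exp(c X^2 / 2)] <= 2.
import math
from scipy.integrate import quad

c = 0.7
val, _ = quad(lambda t: c * t * math.exp(-c * t**2 / 2), 0, math.inf)
print(1 + val)  # ~ 2.0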
Now, let $c>0$ be any number and suppose that the $X_i$ are i.i.d. Exponential(1) random variables. Note that these random variables are much nicer than typical random variables with 2 moments -- they have finite exponential moments for powers less than 1. Even with these nice random variables, this fails already when $n=2$:
\begin{align*}
\mathbb{E}[e^{\frac{c}{2} (S_2 - 1)^2}] &= \int_{0}^\infty \int_0^\infty e^{\frac{c}{2}\left[\frac{1}{2}(x^2+y^2) - (\frac{1}{2}(x+y))^2 -1 \right]^2} e^{-(x+y)}dxdy = \infty.
\end{align*}
The same computation works for all $n \geq 2$.
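To make the divergence concrete: along the $x$-axis the logarithm of the integrand in the $n=2$ computation is $\tfrac{c}{2}(\tfrac{x^2}{4}-1)^2 - x$, which tends to infinity, so the integrand is unbounded and the integral cannot be finite for any $c>0$ (a quick check in plain Python; $c=0.01$ is an arbitrary small choice):

```python
# Log of the integrand along y = 0: c/2 * (x^2/4 - 1)^2 - x -> infinity.
c = 0.01
for x in [5, 10, 20, 40]:
    print(f"x={x:3d}  log-integrand = {c / 2 * (x**2 / 4 - 1)**2 - x:9.1f}")
```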
In general, concentration bounds are basically the same thing as moment bounds. You can't hope for much better concentration than you put in with your hypotheses. So what does the hypothesis of having two finite moments buy you in this case? By Markov's inequality, there is a bound of the form
$$
\mathbb{P}(|S_n - Var(X_1)|>t) \leq \frac{\mathbb{E}[|S_n - Var(X_1)|]}{t}.
$$
I'll sketch how to show that you cannot significantly improve on this. The same argument as above shows that if there exist constants $c,C,\alpha>0$ so that
$$
\mathbb{P}(|X|>t) \leq C t^{-\alpha} \quad \text{for all } t>c,
$$
then $\mathbb{E}[|X|^\beta] <\infty$ whenever $\beta < \alpha$. So now let's consider, for $\epsilon>0$, i.i.d. random variables with density
\begin{align*}
f_{X_i}(t) = \begin{cases}
(2+\epsilon)\, t^{-(3+\epsilon)} & t>1 \\
0 & t \leq 1.
\end{cases}
\end{align*}
Call $\sigma^2 = Var(X_1)$. We can compute that
\begin{align*}
\mathbb{E}[|S_2-\sigma^2|^{1+\epsilon}] &= \int_1^\infty \int_1^\infty \left|\frac{1}{2}(x^2+y^2) - \left(\frac{x}{2}+\frac{y}{2}\right)^2-\sigma^2\right|^{1+\epsilon}\frac{(2+\epsilon)^2}{x^{3+\epsilon}y^{3+\epsilon}} \,dx\,dy \\
&= \int_1^\infty \int_1^\infty \left|\frac{(x-y)^2}{4} -\sigma^2\right|^{1+\epsilon}\frac{(2+\epsilon)^2}{x^{3+\epsilon}y^{3+\epsilon}} \,dx\,dy =\infty.
\end{align*}
To see that this is infinite, note that the inner integral can be seen to be infinite by limit comparison with
$$
\frac{x^{2(1+\epsilon)}}{x^{3+\epsilon}} = x^{-(1-\epsilon)}.
$$
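One can also watch this happen numerically: truncating the inner integral at $R$ (fixed $y$, constant prefactor dropped; $y=2$ and $\epsilon=1/2$ are arbitrary choices) gives values growing like $R^{\epsilon}$ rather than converging (Python with scipy):

```python
# Inner integral over x at fixed y, truncated at R; it grows like R^eps.
from scipy.integrate import quad

eps, y = 0.5, 2.0
sigma2 = (2 + eps) / eps - ((2 + eps) / (1 + eps))**2  # Var(X_1): E X^2 - (E X)^2
f = lambda x: abs((x - y)**2 / 4 - sigma2)**(1 + eps) * x**(-(3 + eps))
for R in [1e2, 1e3, 1e4]:
    val, _ = quad(f, 1, R, limit=200)
    print(f"R={R:.0e}  truncated integral = {val:.2f}")
```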
For $\alpha>1$, this rules out any bound of the form
\begin{align*}
\mathbb{P}(|S_n-Var(X_1)| > t) \leq \frac{C}{t^{\alpha}}
\end{align*}
holding for all sufficiently large $t$ under these hypotheses. More complicated examples can be constructed to rule out weaker bounds as well.
Let me close by commenting on a common point of confusion. Convergence in distribution tells you essentially nothing about the regularity of the pre-limit objects. For example, let $X$ and $Y$ be independent $N(0,1)$ and standard Cauchy random variables respectively, and set $Z_n = (1-e^{-n})X + e^{-n} Y$. Then $Z_n$ converges exponentially fast to a $N(0,1)$ random variable surely (never mind in distribution), but no $Z_n$ has a finite mean.
Best Answer
On $\mathbb{R}$ (and Polish spaces in general) all probability measures are tight, so some kind of concentration inequality always exists and can be derived directly from tightness. (See https://en.wikipedia.org/wiki/Tightness_of_measures)
I would also add that for any bounded, nonnegative function $f:\mathbb{R}\rightarrow \mathbb{R}$ and any random variable $X$ on $\mathbb{R}$, it always holds that $\mathbb{E}f(X) < \infty$. If $f$ is also strictly increasing, then we can always use Markov's inequality to derive the concentration bound $$ P(X>f^{-1}(t))\leq \frac{\mathbb{E} f(X)}{t}.$$
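A concrete instance (a sketch in Python/scipy; I take $X$ standard Cauchy, which has no mean at all, and $f(x)=\pi/2+\arctan(x)$, which is bounded, nonnegative and strictly increasing with $\mathbb{E}f(X)=\pi/2$ by symmetry):

```python
# Markov for f(X) with f(x) = pi/2 + arctan(x) and X standard Cauchy:
# Pr(X > f^{-1}(u)) = Pr(f(X) > u) <= (pi/2) / u for 0 < u < pi.
import math
from scipy.stats import cauchy

Ef = math.pi / 2
for u in [2.0, 2.5, 3.0]:
    x = math.tan(u - math.pi / 2)  # f^{-1}(u)
    print(f"u={u}  Pr(X > {x:6.2f}) = {cauchy.sf(x):.4f}  bound = {Ef / u:.4f}")
```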