[Math] Intuition behind Chebyshev’s inequality

Tags: inequality, intuition, probability theory

Is there any intuition behind Chebyshev's inequality, or is it just pure mathematics? What strikes me is that it applies to any random variable, whatever distribution it has.

$$
\Pr(|X-\mu|\geq k\sigma) \leq \frac{1}{k^2}.
$$

Best Answer

The intuition is that if $g(x) \geq h(x) ~\forall x \in \mathbb R$, then $E[g(X)] \geq E[h(X)]$ for any random variable $X$ (for which these expectations exist). This is what one would intuitively expect: since $g(X)$ is always at least as large as $h(X)$, the average value of $g(X)$ must be at least as large as the average value of $h(X)$.
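As a quick sanity check on this monotonicity property, here is a minimal Python sketch (assuming NumPy is available; the normal distribution and the dominating pair $g(x) = e^x \geq 1 + x = h(x)$ are just illustrative choices, not anything from the answer itself):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.0, size=1_000_000)  # any distribution would do

# g(x) = e^x >= h(x) = 1 + x for all real x, so the sample means
# should come out in the same order.
print(np.exp(x).mean())  # estimate of E[g(X)], about e ~ 2.718 here
print((1 + x).mean())    # estimate of E[h(X)], about 1.5 here
```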

Now apply this intuition to the functions $$g(x) = (x-\mu)^2 ~ \text{and}~ h(x)= \begin{cases}a^2,& |x - \mu| \geq a,\\0, & |x-\mu|< a,\end{cases}$$ where $a > 0$ and where $X$ is a random variable with finite mean $\mu$ and finite variance $\sigma^2$. This gives $$E[(X-\mu)^2] = \sigma^2 \geq E[h(X)] = a^2P\{|X-\mu|\geq a\}.$$ Finally, set $a = k\sigma$ to get the Chebyshev inequality.
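To see the resulting bound numerically, here is a small Monte Carlo sketch (again assuming NumPy; the exponential distribution is an arbitrary choice, since any distribution with finite variance works):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # mean 1, standard deviation 1
mu, sigma = x.mean(), x.std()

for k in (1.5, 2.0, 3.0, 4.0):
    tail = np.mean(np.abs(x - mu) >= k * sigma)  # P(|X - mu| >= k*sigma)
    print(f"k={k}: tail prob = {tail:.4f} <= 1/k^2 = {1 / k**2:.4f}")
```

The empirical tail probabilities come out far below $1/k^2$, which is typical: the Chebyshev bound must cover every distribution with finite variance, so it is loose for most familiar ones.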


Alternatively, consider the variance $\sigma^2$ as representing the moment of inertia of the probability mass about the center of mass (a.k.a. the mean $\mu$). The total probability mass $M$ in the region $(-\infty, \mu-k\sigma] \cup [\mu+k\sigma, \infty)$, far away from the mean $\mu$, contributes at least $M\cdot (k\sigma)^2$ to the sum or integral for $\sigma^2 = E[(X-\mu)^2]$, and so, since everything else in that sum or integral is nonnegative, it must be that $$\sigma^2 \geq M\cdot (k\sigma)^2 \implies M = P\{|X-\mu| \geq k\sigma\} \leq \frac{1}{k^2}.$$

Note that for a given value of $k$, equality holds in the Chebyshev inequality when there are equal point masses of $\frac{1}{2k^2}$ at $\mu \pm k\sigma$ and a point mass of $1 - \frac{1}{k^2}$ at $\mu$. The central mass contributes nothing to the variance/moment-of-inertia-about-center-of-mass calculation, while the far-away masses each contribute $\left(\frac{1}{2k^2}\right)(k\sigma)^2 = \frac{\sigma^2}{2}$, adding up to the variance $\sigma^2$.
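Because this extremal distribution consists of just three point masses, the equality claim can be verified with exact arithmetic; the following sketch (plain Python, with $k$, $\mu$, $\sigma$ set to arbitrary illustrative values) recomputes the mean, the variance, and the tail mass:

```python
k, mu, sigma = 2.0, 0.0, 1.0  # illustrative values; any k >= 1 works

# Three-point distribution that achieves equality in Chebyshev's inequality:
points = [mu - k * sigma, mu, mu + k * sigma]
masses = [1 / (2 * k**2), 1 - 1 / k**2, 1 / (2 * k**2)]

mean = sum(m * p for m, p in zip(masses, points))
var = sum(m * (p - mean) ** 2 for m, p in zip(masses, points))
tail = sum(m for m, p in zip(masses, points) if abs(p - mean) >= k * sigma)

print(mean, var)       # mu and sigma^2, as claimed
print(tail, 1 / k**2)  # tail mass equals the 1/k^2 bound exactly
```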