[Math] Chebyshev’s Inequality: given probability, find $k$

probability statistics

Edit with context: My book says the percentage of data captured within $k$ standard deviations is $1 - \dfrac{1}{k^2}$. I dug a bit deeper and found that this comes from Chebyshev's inequality, but I could not find a direct derivation.

How many multiples of the standard deviation are needed to capture at least $\boldsymbol{75}$% of the data in a distribution with mean $\boldsymbol\mu$?

I derived the formula below and found that $k$ must be less than or equal to $2$. This doesn't make sense to me: a larger $k$ would capture more and more data, so it should be the other way around.

Work is shown below.

The inequality derivation:

Let $v = |X-\mu|$.
Let $y = k\sigma$. Then
$$P(v \geq y) = 1 - P(v < y) \leq \frac{1}{k^2}.$$

So
$$k \leq \sqrt\frac{1}{1-P(v<y)}.$$

Best Answer

Your intuition about what Chebyshev's inequality says is correct. There is just a minor confusion in the algebra.

On the right-hand side of the inequality in your last line, $y = k\sigma$ itself contains $k$, so the inequality is not actually solved for $k$ and cannot be interpreted that way (see the edit below for more details).

If you take the probability $P(v<y)$ as given, that is $P(v < k \sigma) = p$ with everything (e.g. the density and $\sigma$) known, then in principle one can directly solve for the value of $k$.

For example, if $X$ is normally distributed and $P(v < k \sigma) = p = 0.800$ is given, the equation is equivalent to $\Phi(k) = \frac{1+p}{2} = 0.900$, whose solution is $k \approx 1.28155$.
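The normal example can be checked numerically. The following is a minimal sketch using only Python's standard library; the function name `normal_k` is made up for illustration and is not part of the original answer.

```python
from statistics import NormalDist

def normal_k(p: float) -> float:
    """Return k such that P(|X - mu| < k*sigma) = p for a normal X."""
    # By symmetry, P(|Z| < k) = 2*Phi(k) - 1 = p, so Phi(k) = (1 + p) / 2.
    return NormalDist().inv_cdf((1 + p) / 2)

k = normal_k(0.80)
print(round(k, 5))  # about 1.28155, matching the value quoted above
```

This exact route works only because the distribution is fully known; it is the step that Chebyshev's inequality replaces with a bound when it is not.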

However, this is often not easy, so instead we bound the probability of interest $P(v < y)$ (which is an increasing function of $k$) from below by the $1 - \frac{1}{k^2}$ on the RHS (which is also an increasing function of $k$).

Thus, the inequality to solve as per the question statement to capture at least 75% of the data becomes

$$ 1 - \frac1{k^2} \geq \frac{3}4 \qquad \textbf{so as to guarantee} \qquad Pr\bigg\{~ |X- \mu| < k\, \sigma ~\bigg\} \geq 1 - \frac1{k^2} \geq \frac{3}4 $$

Solving the left-hand inequality gives $\frac{1}{k^2} \leq \frac{1}{4}$, that is $k \geq 2$, which is the desired correct direction of the inequality for $k$.
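As a rough illustration of this step, the sketch below solves $1 - \frac{1}{k^2} \geq p$ for the smallest admissible $k$ and then checks the guarantee on simulated data. The helper names `chebyshev_k` and `fraction_within`, and the choice of an exponential sample, are assumptions made for the example only.

```python
import math
import random
from statistics import fmean, pstdev

def chebyshev_k(p: float) -> float:
    """Smallest k with 1 - 1/k**2 >= p, i.e. k = 1 / sqrt(1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return 1 / math.sqrt(1 - p)

def fraction_within(data, k):
    """Fraction of data strictly within k population standard deviations of the mean."""
    mu = fmean(data)
    sigma = pstdev(data)
    return sum(abs(x - mu) < k * sigma for x in data) / len(data)

k = chebyshev_k(0.75)  # k == 2.0

# A skewed (exponential) sample, chosen only as an example of a non-normal case.
random.seed(0)
sample = [random.expovariate(1.0) for _ in range(10_000)]
frac = fraction_within(sample, k)
print(k, frac >= 0.75)
```

Any $k \geq 2$ keeps the guarantee; the observed fraction is typically well above 75% because Chebyshev's bound is a worst case over all distributions.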

----------- The following is in response to the comment -----------

If Chebyshev's inequality is written as $$ Pr\bigg\{~ |X- \mu| < k\, \sigma ~\bigg\} \geq 1 -\frac{1}{k^2} \tag*{Eq.(1)}$$

then, to capture at least 75% of the data as the original question asks, the correct inequality to solve is

$$ Pr\bigg\{~ |X- \mu| < k\, \sigma ~\bigg\} \geq \frac{3}4 \qquad \textbf{but NOT} \qquad \frac{3}4 \geq 1 -\frac{1}{k^2} \quad \text{(which gives $k \leq 2$)}$$

The same goes for the complementary statement, leaving out at most 25% of the data, applied directly to $P(v \geq y) \leq 1/k^2$.
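A small numerical sketch may make the direction clearer: tabulating the Chebyshev lower bound for a few values of $k$ shows that only $k \geq 2$ pushes the guaranteed fraction up to 75%. The function name `chebyshev_lower_bound` and the list of test values are illustrative choices, not part of the original answer.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Guaranteed lower bound on P(|X - mu| < k*sigma) from Chebyshev's inequality."""
    # For k <= 1 the raw expression is <= 0, so the bound says nothing; clamp at 0.
    return max(0.0, 1 - 1 / k**2)

ks = [1, 1.5, 2, 3]
for k in ks:
    print(k, round(chebyshev_lower_bound(k), 4))

# Values of k for which the bound alone guarantees at least 75% coverage.
guaranteed = [k for k in ks if chebyshev_lower_bound(k) >= 0.75]
print(guaranteed)
```

For $k < 2$ the bound falls below $\frac{3}{4}$, so it promises nothing about reaching 75%, which is why $k \leq 2$ cannot be the answer.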

In conclusion, the counter-intuitive $k \leq \sqrt{1/(1-p)}$ stems from a deceptively inviting direct application of the inequality. I hope this answers your question of "how this happened".