Westfall says, “the proportion of the kurtosis that is determined by the central $\mu\pm\sigma$ range is usually quite small” but is the reverse true

kurtosismathematical-statisticsmomentsstandard deviation

In his article that debunks the notion of kurtosis as measuring distribution peakedness, Peter Westfall writes,

[T]he proportion of the kurtosis that is determined by the central $\mu\pm\sigma$ range is usually quite small.

I read this to mean that we learn little about kurtosis by knowing how much of the density is within a standard deviation of the mean.

Is the reverse true? Knowing the kurtosis, can we say anything (beyond the Chebyshev inequality) about how much density is contained within one standard deviation of the mean?

(I think I am happy to work with empirical distributions, in order to avoid issues with undefined means and infinite variance.)

Reference

Westfall, Peter H. "Kurtosis as peakedness, 1905–2014. RIP." The American Statistician 68.3 (2014): 191-195.

Best Answer

Answer edited 9/15/2021:

In his answer to the OP, @whuber claims as follows:

For a distribution with kurtosis $\kappa$, the total density within one SD of the mean lies between $1−1/\kappa$ and $1$, where $\kappa$ is the (non-excess) kurtosis of the distribution.

THIS CLAIM IS FALSE.

The following example shows clearly that @whuber's result is false.

Consider my "Counterexample #1" from here, with $\theta = .001.$ In that counterexample, the kurtosis is $25.5,$ the range $1-1/\kappa$ to $1.0$ is from $0.96$ to $1.0,$ yet the probability within a standard deviation of the mean is $0.5$. These statements are verified by the R code:

th = .001
Z = c(-sqrt(.155/th +1.44), -1.2, -.5, +.5, +1.2, +sqrt(.155/th +1.44))
p = c(th/2, (.5-th)/2, .25, .25, (.5-th)/2, th/2)

sum(p)       # The probabilities sum to one so it is a valid pmf
sum(Z*p)     # The mean is zero
sum(Z^2*p)   # The variance is one

plot(Z, p, type="h", lwd = 4, cex.lab=1.5, cex.axis=1.5, 
        ylab="Probability")
abline(v=c(-1,1), lty=2, lwd=2)  # Shows values within +- 1 sd

k = sum(Z^4*p)
k       # Kurtosis is 25.5

range = c(1 - 1/k,1)
range     # (.96, 1.0) is the range suggested by @whuber's false theorem 
          # about probability within a sd of mu

sum(p[abs(Z)<1])  # 0.5 is the actual probability within +- 1sd

Here is a graph of the counterexample distribution. The dashed vertical lines mark the $\mu \pm \sigma$ limits, within which it is clearly visible that there is only $0.50$ probability.

You can also illustrate the counterexample using a reproducible data set and summary statistics. The following R code generates $1000000$ samples from the counterexample distribution, a large enough sample size so that the "bias corrections" are negligible. The estimated kurtosis is $26.02$, the range $(1 - 1/26.02, 1)$, within which the central probability is supposed to lie, is $(.96,1)$, yet the estimated central probability is $0.4999$.

set.seed(12345)
N = 1000000
Data = sample(Z, N, p, replace = TRUE)
xbar = mean(Data)
s = sd(Data)

library(moments)
ku = kurtosis(Data)
ku
c(1-1/ku, 1)  # @whuber's false claim of central probability range

sum( Data >= xbar -s & Data <= xbar +s )/N  # Actual central probability

It is amusing to see just how spectacularly @whuber's result does fail. In my counterexample #1 family of distributions, the kurtosis can tend to infinity, implying, according to @whuber's "result," that the central probability approaches $1.0$. But instead, the central probability stays constant at $0.5$!

One does not need to construct fancy counterexamples to illustrate such spectacular failure of @whuber's claim. Consider the common $T_\nu$ distribution, the Student T distribution with degrees of freedom parameter $\nu$. For $\nu > 4$, its mean is zero, its variance is $\sigma^2 = \nu/(\nu -2)$, and its (non-excess) kurtosis is $\kappa = 6/(\nu-4) +3$. In the range $4 < \nu \le 5$, the kurtosis ranges from $9$ to $\infty$, while the probability within $\pm \sigma$ can be calculated numerically, in R notation, as

pt(sigma, nu) - pt(-sigma,nu)

The following R code and resulting graph shows the range claimed by @whuber (dashed black lines), along with the actual central probability (solid red line).

nu = seq(4.0001, 4.9999, .0001)
sigma = sqrt(nu/(nu-2))
kurt = 6/(nu-4) + 3
Cent.Prob = pt(sigma, nu) - pt(-sigma, nu)

Upper.Bound = rep(1, length(nu))
Lower.Bound = 1 - 1/kurt
plot(nu, Cent.Prob, ylim = c(.6,1), type="l", col="red", 
   ylab="Central Probability", xlab = "degrees of freedom")
points(nu, Upper.Bound, type="l", lty=2)
points(nu, Lower.Bound, type="l", lty=2)

Again, there is a spectacular failure of @whuber's claim, in that the claim implies the central probability must be essentially $1.0$ (for $\nu \approx 4$), when in fact it is far less (around $0.77$).

Thus, @whuber's claim is false: The central probability need not lie in @whuber's stated range. In fact, as my Counterexample #1 shows, the central probability need not increase at all with larger kurtosis.

Here are two results that shed additional light on the relation of kurtosis to the center.

Theorem 1. Consider a random variable $X$ (includes data via the empirical distribution) that has, wlog, mean = 0, variance = 1, and finite fourth moment. Now, create a new random variable $X'$ by replacing the mass/density of $p_X$ within $0 \pm 1$ arbitrarily, but maintaining $E(X')=0$ and $Var(X')=1.$ Then the difference between the maximum and minimum kurtosis statistics over all such replacements is less than 0.25.

Theorem 2. Consider a random variable $X$ as in Theorem 1. Now, create a new random variable $X'$ by replacing the mass/density of $p_X$ outside of $0 \pm 1$ arbitrarily, but maintaining $E(X')=0$ and $Var(X')=1$ in such replacements. Then the difference between the maximum and minimum kurtosis statistics over all such replacements is unbounded (i.e., infinite).

Thus, the effect of moving mass near the center has at most a very small effect on kurtosis, while the effect of moving mass in the tails has an infinite effect.

While one is trying to prove a theorem that proves that the center somehow is related to kurtosis, it is very helpful to know in advance what counterexamples may exist to such a theorem.

Good counterexamples are given here.

"Counterexample #1" shows a family of distributions in which the kurtosis increases to infinity, while the mass inside $\mu \pm \sigma$ stays a constant 0.5.

"Counterexample #2" shows a family of distributions where the mass within $\mu \pm \sigma$ increases to 1.0, yet the kurtosis decreases to its minimum.

So the often-made assertion that kurtosis measures “concentration of mass in the center” is obviously wrong.

Many people think that higher kurtosis implies “more probability in the tails.” This is not true either: Counterexample #1 shows that you can have higher kurtosis with less tail probability when the tails extend.

Instead, kurtosis precisely measures tail leverage. See

How the kurtosis value can determine the unhealthy event

and

In comparison with a standard gaussian random variable, does a distribution with heavy tails have higher kurtosis? .

Best Answer

Related Solutions

Solved – Adjusting for variance-sample size dependence

More academic-sounding term for high-variance

Related Question