Central limit theorem and the Pareto distribution

Tags: central limit theorem, fat-tails, intuition, pareto-distribution, variance

Can somebody please provide a simple (lay person) explanation of the relationship between Pareto distributions and the Central Limit Theorem (e.g. does it apply? Why/ why not?)?
I am trying to understand the following statement:

"the Central Limit Theorem doesn’t work with every distribution. This is due to one sneaky fact — sample means are clustered around the mean of the underlying distribution if it exists. But how can a distribution have no mean? Well, one common distribution that has no mean is the Pareto distribution. If you tried to calculate it using the usual methods, it would diverge to infinity."

Best Answer

The statement is not true in general -- the Pareto distribution does have a finite mean if its shape parameter ($\alpha$ at the link) is greater than 1.

When both the mean and the variance exist ($\alpha>2$), the usual forms of the central limit theorem (e.g. classical, Lyapunov, Lindeberg) will apply.
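To see where these thresholds on $\alpha$ come from, here is a short worked calculation. It assumes the usual type I parameterization with scale $x_m>0$ and density $f(x)=\alpha x_m^\alpha/x^{\alpha+1}$ for $x\ge x_m$ (the symbol $x_m$ is my notation, not something used above):

$$E(X^k)=\int_{x_m}^\infty x^k\,\frac{\alpha x_m^\alpha}{x^{\alpha+1}}\,dx=\alpha x_m^\alpha\int_{x_m}^\infty x^{k-\alpha-1}\,dx=\frac{\alpha\,x_m^k}{\alpha-k}\qquad\text{if } k<\alpha,$$

and the integral diverges when $k\ge\alpha$. Taking $k=1$ gives a finite mean $\alpha x_m/(\alpha-1)$ only for $\alpha>1$, taking $k=2$ gives a finite second moment (hence variance) only for $\alpha>2$, and taking $k=3$ gives a finite third moment only for $\alpha>3$.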

See the description of the classical central limit theorem here

The quote is kind of weird, because the central limit theorem (in any of the mentioned forms) doesn't apply to the sample mean itself, but to a standardized mean (and if we try to apply it to something whose mean and variance are not finite we'd need to very carefully explain what we're actually talking about, since the numerator and denominator involve things which don't have finite limits).

Nevertheless (in spite of not quite being correctly expressed for talking about central limit theorems) it does have something of an underlying point -- if the shape parameter is small enough, the sample mean won't converge to the population mean (the weak law of large numbers doesn't hold, since the integral defining the mean is not finite).
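A small simulation can make this concrete. The following is only a sketch under my own assumptions: a type I Pareto with scale $x_m=1$, inverse-transform sampling, and the helper names `pareto_sample` and `running_mean`, none of which come from the quoted text. It tracks the running sample mean for $\alpha=3$ (finite mean $1.5$) and for $\alpha=0.8$ (no finite mean).

```python
import numpy as np


def pareto_sample(alpha, size, rng, x_m=1.0):
    """Draw type I Pareto variates by inverse-transform sampling."""
    # 1 - U lies in (0, 1], so the negative power below is always finite.
    u = 1.0 - rng.random(size)
    return x_m * u ** (-1.0 / alpha)


def running_mean(x):
    """Sample mean of the first k observations, for k = 1..len(x)."""
    return np.cumsum(x) / np.arange(1, len(x) + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # seed chosen arbitrarily
    n = 200_000
    for alpha in (3.0, 0.8):
        means = running_mean(pareto_sample(alpha, n, rng))
        # Look at the running mean after 100, 10,000 and n draws.
        checkpoints = [means[k - 1] for k in (100, 10_000, n)]
        print(alpha, np.round(checkpoints, 3))
```

For $\alpha=3$ the running mean should settle near $1.5$. For $\alpha=0.8$ it tends to keep drifting upward and jumps whenever a very large observation arrives, so there is nothing for it to settle around.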


As kjetil rightly points out in comments, if we're to avoid the rate of convergence being terrible (i.e. to be able to use it in practice), we need some kind of bound on "how far away"/"how quickly" the approximation kicks in. It's no use having an adequate approximation for $n> 10^{10^{100}}$ (say) if we want some practical use from a normal approximation.

The central limit theorem is about the destination but tells us nothing about how fast we get there; there are, however, results like the Berry-Esseen theorem which do bound the rate (in a particular sense). In the case of Berry-Esseen, it bounds the largest distance between the distribution function of the standardized mean and the standard normal cdf in terms of the third absolute central moment, $E(|X-\mu|^3)$, which is finite exactly when $E(|X|^3)$ is.
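For reference, the i.i.d. form of the Berry-Esseen bound can be written as follows. The notation is mine: $\bar X_n$ is the sample mean of $n$ observations, $\mu$ and $\sigma$ are the population mean and standard deviation, $\Phi$ is the standard normal cdf, and $\rho$ is the third absolute central moment, which is finite exactly when $E(|X|^3)$ is.

$$\sup_x\left|\,P\!\left(\frac{\sqrt{n}\,(\bar X_n-\mu)}{\sigma}\le x\right)-\Phi(x)\right|\;\le\;\frac{C\,\rho}{\sigma^3\sqrt{n}},\qquad \rho=E\left(|X-\mu|^3\right)$$

Here $C$ is an absolute constant that does not depend on the distribution; the best known values are below $0.5$. The bound shrinks like $1/\sqrt{n}$, but it can start out very large when $\rho/\sigma^3$ is large, as it is for heavy-tailed distributions.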

So in the case of the Pareto, if $\alpha\gt 3$, we can at least get some bound on just how bad the approximation might be at some $n$, and how quickly we're getting there. (On the other hand, bounding the difference in cdfs isn't necessarily an especially "practical" thing to bound -- what you're interested in may not relate especially well to a bound on the difference in tail area.) Nevertheless, it is something (and in at least some situations a cdf bound is more directly useful).
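To get a rough feel for how slowly the approximation can kick in even when $\alpha>3$, one could estimate the cdf distance by simulation. This is a sketch under my own assumptions (type I Pareto with scale $x_m=1$, $\alpha=4$, and the hypothetical helper names below); it estimates the largest gap between the empirical cdf of the standardized mean and the standard normal cdf, so the result carries Monte Carlo error of order $1/\sqrt{\text{reps}}$.

```python
import math

import numpy as np


def pareto_moments(alpha, x_m=1.0):
    """Mean and standard deviation of a type I Pareto; requires alpha > 2."""
    mean = alpha * x_m / (alpha - 1.0)
    var = alpha * x_m**2 / ((alpha - 1.0) ** 2 * (alpha - 2.0))
    return mean, math.sqrt(var)


def kolmogorov_distance_to_normal(alpha, n, reps, rng, x_m=1.0):
    """Monte Carlo estimate of sup_x |F_n(x) - Phi(x)| for the standardized mean."""
    mu, sigma = pareto_moments(alpha, x_m)
    # Inverse-transform sampling: reps independent samples of size n.
    u = 1.0 - rng.random((reps, n))
    x = x_m * u ** (-1.0 / alpha)
    # Standardized sample means, sorted so the empirical cdf is easy to form.
    z = np.sort(math.sqrt(n) * (x.mean(axis=1) - mu) / sigma)
    phi = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    upper = np.arange(1, reps + 1) / reps  # empirical cdf at each sorted point
    lower = np.arange(0, reps) / reps      # empirical cdf just below each point
    return float(max(np.max(upper - phi), np.max(phi - lower)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # seed chosen arbitrarily
    for n in (1, 10, 100, 1000):
        print(n, round(kolmogorov_distance_to_normal(4.0, n, 2000, rng), 3))
```

The estimated distance should fall as $n$ grows, but because the Pareto is so skewed the decrease is gradual, which is the practical concern about the rate of convergence.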
