Probability – Understanding Pareto Distribution

probability, probability distributions, probability theory

According to Wolfram, the Pareto distribution is given by the probability density function $\frac{ab^a}{x^{a + 1}}$ and cumulative distribution function $1 - (\frac{b}{x})^a$. I am trying to understand the meaning of the parameters. One example on YouTube, from a psychology course, tracks people's balances over time when they start with $\$10$ and repeatedly bet $\$1$ on each of a long series of coin flips. First the distribution turns Gaussian, then it becomes Pareto as people start to go bankrupt, thus remaining at $\$0$ forever.

However, it is not obvious to me that other canonical Pareto examples such as societal wealth distribution have this "lowest outcome sticks forever" feature, analogous to bankruptcy in the above experiment. Does $b$ in the above formulas represent a "point of no return," a minimum $x$ value at which the possibility for further mobility along the $x$-axis ends, ultimately causing the distribution to end up Pareto instead of Gaussian, or is this feature incidental to the particular experiment described above?

Best Answer

The Pareto distribution is a heavy-tailed distribution commonly used to describe both human and natural phenomena. Initially developed to describe the distribution of incomes and other financial variables, it is typically used to model scenarios where large elements are rare and small ones are common. Examples include city populations and sizes, moon craters, earthquake magnitudes, hard disk drive error rates, lengths of words in different languages, frequencies of personal names, numbers of citations received by papers, hits on web pages, and so on. Most of the fame of this distribution comes from one of its applications, the Pareto principle (also known as the 80/20 rule), which states that, for many events, roughly 80% of the effects come from 20% of the causes. This principle, an observation rather than a scientific law, has been the subject of much criticism.

The OP's interesting question describes a typical application of this principle: when people's balances are tracked over time in a coin-flip betting design, the distribution is initially Gaussian, but soon changes and becomes Paretian. To clarify what this means, some details on the mathematical properties of the distribution are needed.


The original assumption was that the probability that a subject's income is greater than $x$ is given by the following power law (often called the "tail function"):

$$P(X> x)=\left(\frac mx\right)^a$$

Here $m>0$ is the minimal possible income, $a>0$, and $x\geq m$. Note that $m$ corresponds to $b$ in the formula given in the OP. From this, we have that the CDF is

$$ {\displaystyle F(x)={\begin{cases}1-\left({\frac {m}{x}}\right)^{a }&x\geq m\\0&x<m \end{cases}}} $$

By taking the derivative, we obtain that the PDF is

$$\displaystyle f(x)= \begin{cases} \frac{a m^a}{x^{a+1}} & x \geq m \\ 0 & x <m \end{cases} $$
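
As a quick check of this step, using only the symbols already defined above, differentiating the CDF for $x \geq m$ gives

$$\frac{d}{dx}\left[1-m^a x^{-a}\right] = a\,m^a x^{-a-1} = \frac{a m^a}{x^{a+1}},$$

and the resulting density integrates to one over its support:

$$\int_m^\infty \frac{a m^a}{x^{a+1}}\,dx = \left[-\left(\frac{m}{x}\right)^a\right]_m^\infty = 0-(-1) = 1.$$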

Often, $m$ is called the "scale" parameter and $a$ is called the "shape" parameter (or tail index), as they affect these features of the function, respectively. The curve is typically J-shaped with a long right tail; it decreases faster as $a$ increases and approaches the horizontal axis asymptotically. When the function is plotted on a log-log graph, it reduces to a straight line with negative slope. The function described above, which is the most commonly used Pareto distribution, is often called Pareto type-1, as it is part of a large family of distributions with a definite hierarchical order. Although Pareto distributions are continuous, some discrete versions of the type-1 are of particular importance and are related to other distributions such as the Zipf and the zeta distributions.
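
The roles of $m$ and $a$ can be checked numerically. The following is a minimal sketch, not taken from any source cited here: it draws samples by inverting the CDF above, compares the empirical tail with $(m/x)^a$, and computes the slope of the tail function on log-log axes, which should equal $-a$. The function names (`tail`, `sample_pareto`, `empirical_tail`, `loglog_slope`) and the values $m=1$, $a=2$ are illustrative assumptions.

```python
import math
import random


def tail(x, m, a):
    """Theoretical tail function P(X > x) of a Pareto type-1 distribution."""
    return 1.0 if x < m else (m / x) ** a


def sample_pareto(n, m, a, seed=0):
    """Draw n values by inverting the CDF: x = m / u**(1/a), u uniform on (0, 1]."""
    rng = random.Random(seed)
    return [m / (1.0 - rng.random()) ** (1.0 / a) for _ in range(n)]


def empirical_tail(samples, x):
    """Fraction of the samples strictly greater than x."""
    return sum(1 for s in samples if s > x) / len(samples)


def loglog_slope(x1, x2, m, a):
    """Slope of log P(X > x) against log x between two points x1, x2 >= m."""
    rise = math.log(tail(x2, m, a)) - math.log(tail(x1, m, a))
    run = math.log(x2) - math.log(x1)
    return rise / run


if __name__ == "__main__":
    m, a = 1.0, 2.0  # illustrative values, not from the original post
    xs = sample_pareto(100_000, m, a)
    for x in (1.5, 2.0, 4.0, 8.0):
        print(x, empirical_tail(xs, x), tail(x, m, a))
    print("log-log slope:", loglog_slope(2.0, 8.0, m, a))
```

Changing `m` only shifts where the support starts, while changing `a` changes how steeply the tail falls, which matches the "scale" and "shape" labels.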

The change from a Gaussian to a Pareto distribution in the experiment described in the OP does not reflect some critical value being reached. Rather, it results from a smooth, progressive modification of the distribution, as clearly illustrated in this brief simulation. It should also be pointed out that this gradual transformation is not governed by rigid statistical laws but, as noted above, reflects empirical observation. The value of the scale parameter $m$ ($b$ in the OP) does not represent any "point of no return". It is simply a hypothetical minimum $x$ value. Sometimes it is exactly defined (for example, in the betting experiment cited above, we have $m=0$, corresponding to bankruptcy); in other cases it has to be determined. Finding an adequate minimum value can be quite difficult and can be approached in various ways, which is a potential weakness of the Pareto distribution. As such, the $m$ value does not preclude mobility along the $x$-axis over the whole range $[m,\infty)$; the distribution simply states that most of the elements considered (incomes, city sizes, craters, earthquakes, or anything else describable by a Pareto model) tend to lie toward the left end of that range.
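
For readers who want to experiment with the betting setup themselves, here is a minimal sketch of it. It is not the simulation referred to above, and it makes no claim about which distribution the balances end up following; it only shows how the share of players stuck at $\$0$ grows gradually with the number of flips. The names `simulate_balances` and `bankrupt_fraction`, the player count, and the flip counts are illustrative assumptions, and the code assumes the starting balance is a whole multiple of the stake.

```python
import random


def simulate_balances(n_players, n_flips, start=10, stake=1, seed=0):
    """Final balances after repeated fair coin-flip bets; a balance of 0 is absorbing.

    Assumes `start` is a whole multiple of `stake`, so a balance hits 0 exactly.
    """
    rng = random.Random(seed)
    balances = []
    for _ in range(n_players):
        balance = start
        for _ in range(n_flips):
            if balance == 0:
                break  # bankrupt players stay at 0 forever
            balance += stake if rng.random() < 0.5 else -stake
        balances.append(balance)
    return balances


def bankrupt_fraction(balances):
    """Share of players whose final balance is exactly 0."""
    return sum(1 for b in balances if b == 0) / len(balances)


if __name__ == "__main__":
    for n_flips in (10, 100, 400, 1600):
        final = simulate_balances(n_players=1000, n_flips=n_flips)
        print(n_flips, "flips: bankrupt share =", bankrupt_fraction(final),
              "max balance =", max(final))
```

Running it for increasing flip counts shows the bankrupt share rising steadily rather than jumping at a particular threshold, which is consistent with the point that no critical value is involved.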
