Probability – Understanding Pareto Distribution

probability, probability distributions, probability theory

According to Wolfram, the Pareto distribution is given by the probability density function $\frac{ab^a}{x^{a + 1}}$ and cumulative distribution function $1 - (\frac{b}{x})^a$. I am trying to understand the meaning of the parameters. One example on YouTube from a psychology course tracks people's balances over time when they start with $\$10$ and repeatedly bet $\$1$ on each of a long series of coin flips. At first the distribution looks Gaussian, then it becomes Pareto as people start to go bankrupt and remain at $\$0$ forever.

However, it is not obvious to me that other canonical Pareto examples such as societal wealth distribution have this "lowest outcome sticks forever" feature, analogous to bankruptcy in the above experiment. Does $b$ in the above formulas represent a "point of no return," a minimum $x$ value at which the possibility for further mobility along the $x$-axis ends, ultimately causing the distribution to end up Pareto instead of Gaussian, or is this feature incidental to the particular experiment described above?

Best Answer

The Pareto distribution is a heavy-tailed distribution commonly used to describe both human and natural phenomena. Initially developed to describe the distribution of incomes and other financial variables, it is typically used to model scenarios where large elements are rare and small ones are common. Examples include city populations and sizes, moon craters, earthquake magnitudes, hard disk drive error rates, lengths of words in different languages, frequencies of personal names, numbers of citations received by papers, hits on web pages, and so on. Most of the fame of this distribution comes from one of its applications, called Pareto's principle (also known as the 80/20 rule), which states that, for many events, roughly 80% of the effects come from 20% of the causes. This principle, which is an observation rather than a scientific law, has been the subject of much criticism. The OP's question describes a typical application of this principle: when people's balances are tracked over time in a coin-flip betting design, the distribution is initially Gaussian, but soon changes and becomes Paretian. To clarify what this means, some details on the mathematical properties of the distribution are needed.

The original assumption was that the probability that a subject's income is greater than $x$ follows a power law (often called the "tail function"):

$$P(X> x)=\left(\frac mx\right)^a$$

Here $m>0$ is the minimal possible income, $a>0$, and $x\geq m$. Note that $m$ corresponds to $b$ in the formula given in the OP. From this, we have that the CDF is

$$ {\displaystyle F(x)={\begin{cases}1-\left({\frac {m}{x}}\right)^{a }&x\geq m\\0&x<m \end{cases}}} $$

By taking the derivative, we obtain that the PDF is

$$\displaystyle f(x)= \begin{cases} \frac{a m^a}{x^{a+1}} & x \geq m \\ 0 & x <m \end{cases} $$
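The relationship between the tail function, the CDF and the PDF can be checked numerically. The following is only a minimal sketch: the function names and the values $m=1$, $a=2$ are illustrative assumptions, not part of the original answer. It compares a finite-difference derivative of the CDF with the PDF, and draws samples by inverting the CDF.

```python
import random

def pareto_cdf(x, m, a):
    """F(x) = 1 - (m/x)^a for x >= m, and 0 below m."""
    return 1.0 - (m / x) ** a if x >= m else 0.0

def pareto_pdf(x, m, a):
    """f(x) = a m^a / x^(a+1) for x >= m, and 0 below m."""
    return a * m ** a / x ** (a + 1) if x >= m else 0.0

def pareto_sample(m, a, rng):
    """Inverse-transform sampling: solve u = F(x) for x."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is never zero
    return m / (1.0 - u) ** (1.0 / a)

m, a = 1.0, 2.0  # illustrative parameter values (assumption)
x, h = 3.0, 1e-6

# A central difference of the CDF should match the PDF at x.
numeric_pdf = (pareto_cdf(x + h, m, a) - pareto_cdf(x - h, m, a)) / (2 * h)
print(numeric_pdf, pareto_pdf(x, m, a))

# The fraction of samples above x should approach the tail (m/x)^a.
rng = random.Random(0)
draws = [pareto_sample(m, a, rng) for _ in range(100_000)]
empirical_tail = sum(d > x for d in draws) / len(draws)
print(empirical_tail, (m / x) ** a)
```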

Often, $m$ is called the "scale" parameter and $a$ is called the "shape" parameter (or tail index), as they affect these features of the function, respectively. The density is typically J-shaped with a long right tail: it starts at its maximum $a/m$ at $x=m$ and decreases faster as $a$ increases, approaching the horizontal axis asymptotically. When the function is plotted on a log-log graph, it reduces to a straight line with negative slope ($-(a+1)$ for the PDF and $-a$ for the tail function). The function described above, which is the most commonly used Pareto distribution, is often called Pareto type-1, as it is part of a large family of distributions with a definite hierarchical order. Although Pareto distributions are continuous, some discrete versions of the type-1 have a particular importance and are related to other distributions such as the Zipf and the zeta distributions.
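The straight-line behaviour on a log-log plot can be verified directly from the formulas. This is a small sketch; the helper `loglog_slope` and the values $m=1$, $a=1.5$ are assumptions chosen for illustration. It measures the slope of $\log f$ against $\log x$ between two points.

```python
import math

def pareto_tail(x, m, a):
    """Tail function P(X > x) = (m/x)^a, valid for x >= m."""
    return (m / x) ** a

def pareto_pdf(x, m, a):
    """Density a m^a / x^(a+1), valid for x >= m."""
    return a * m ** a / x ** (a + 1)

def loglog_slope(f, x1, x2):
    """Slope of log f(x) versus log x between x1 and x2."""
    return (math.log(f(x2)) - math.log(f(x1))) / (math.log(x2) - math.log(x1))

m, a = 1.0, 1.5  # illustrative parameter values (assumption)
tail_slope = loglog_slope(lambda x: pareto_tail(x, m, a), 2.0, 50.0)
pdf_slope = loglog_slope(lambda x: pareto_pdf(x, m, a), 2.0, 50.0)
print(tail_slope, pdf_slope)  # close to -a and -(a + 1)
```

Because the slope is the same for any pair of points above $m$, a larger $a$ simply tilts the line more steeply, which is the "faster decrease" described above.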

The change from a Gaussian to a Pareto distribution in the experiment reported in the OP does not represent the effect of some critical value being reached. Rather, it results from a smooth, progressive modification of the distribution, as a brief simulation of the experiment illustrates. This gradual transformation is not governed by a rigid statistical law; as noted above, it reflects empirical observation. The value of the scale parameter $m$ ($b$ in the OP) does not represent any "point of no return". It is simply a hypothetical minimum $x$ value that is sometimes exactly defined (for example, in the betting experiment cited above the floor is the bankruptcy level $\$0$, although the type-1 formulas above require $m>0$, so the floor must be shifted to a positive value before they apply) and that in other cases has to be determined (finding an adequate minimum value can be quite difficult and can be approached in various ways; this is a potential weakness of the Pareto distribution). As such, the value of $m$ does not preclude mobility along the $x$-axis over the whole range $[m,\infty)$. It simply reflects the fact that most of the elements considered in the distribution (incomes, city sizes, craters, earthquakes or anything else describable by a Pareto model) tend to lie toward the left of the range, near $m$.
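The gradual nature of the change can be explored with a small simulation of the betting experiment. This is a sketch under stated assumptions, not the simulation the answer originally linked to: a fair coin, 1000 players, a starting balance of 10, and bankrupt players who stop betting. The function name and defaults are hypothetical.

```python
import random

def simulate_balances(n_players, n_flips, start=10, seed=0):
    """Each player bets 1 unit per fair coin flip until reaching 0."""
    rng = random.Random(seed)
    balances = [start] * n_players
    for _ in range(n_flips):
        for i, b in enumerate(balances):
            if b > 0:  # bankrupt players stay at 0 forever
                balances[i] = b + (1 if rng.random() < 0.5 else -1)
    return balances

# Fraction of bankrupt players after increasing numbers of flips.
for flips in (10, 50, 200, 1000):
    final = simulate_balances(1000, flips)
    bankrupt = sum(b == 0 for b in final) / len(final)
    print(flips, round(bankrupt, 3))
```

The bankrupt fraction grows steadily with the number of flips instead of jumping at a threshold, which is the sense in which no critical value is involved.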

Related Solutions

[Math] Understanding the Pareto distribution as applied to wealth

What you're looking for is called Zipf's law. This law says that many distribution curves in which the data values are placed in rank order on the horizontal axis by frequency (or, equivalently, percent) follow a power law. The most famous use of Zipf's law is to describe the frequency of word usage in any given language, although the Wikipedia article specifically mentions income rankings as you ask for.

It can be thought of as a discrete version of the Pareto distribution, so you're right about that. Added: This is because Zipf's law is the discrete power law distribution, and Pareto is the continuous power law distribution.

(Update, in response to the OP's request for more on the relationship between Zipf and Pareto.)

I'm going to do this in the general case. The argument will also be for numbers and amounts, rather than probabilities, with the understanding that we can convert the functions involved to pdfs or pmfs by scaling by the appropriate constants.

Suppose we have the density function $p(x)$ for dollars (although it could be any good) allocated among people in a group, so that $\int_a^b p(x) dx$ gives the number of people in the group who have between $a$ and $b$ dollars. Now, rank the people in the group by wealth, and let $z(y)$ denote the wealth that the person ranked $y$ has. The question then is, "What is the relationship between $p(x)$ and $z(y)$?"

Consider the number of people who have at least $M$ dollars. Using $p(x)$, that is given by $\int_M^{\infty} p(x) dx$. But this is also $R$, where $R$ is the largest rank of a person who has at least $M$ dollars. (In other words, if the 34th person has at least $M$ but the 35th does not, then $R = 34$.) So $z(R+1) < M \leq z(R)$. If the population is large enough, we can say $z(R) = M$ without losing much accuracy. Thus $R = z^{-1}(M)$. So there's our relationship (approximately): $$\int_M^{\infty} p(x) dx = z^{-1}(M).$$ Thus the ranking function $z(y)$ is the inverse of the wealth tail cumulative distribution function $\int_M^{\infty} p(x) dx$.

How does this relate to power laws? Well, in this special case, if $p(x) = \frac{C}{x^{\alpha+1}}$ (i.e., a Pareto distribution) for some $\alpha > 0$ and constant $C$, then we have $$z^{-1}(M) = \int_M^{\infty} \frac{C}{x^{\alpha+1}} dx = \frac{C}{\alpha M^{\alpha}}.$$ Setting $y = z^{-1}(M)$ and solving for $M = z(y)$ gives $$z(y) = \frac{K}{y^{\frac{1}{\alpha}}}, \qquad K = \left(\frac{C}{\alpha}\right)^{\frac{1}{\alpha}}.$$ Thus $z(y)$ is also a power law. Thus a Pareto (power law) distribution function for some good produces a power law ranking function for people with that good (i.e., Zipf).
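The rank relationship can be checked empirically. In this sketch (the sample size, seed, and rank window are arbitrary assumptions) wealth values are drawn from a Pareto distribution with $\alpha = 2$, sorted in decreasing order, and the slope of log wealth against log rank is fitted by least squares. The argument above predicts a slope close to $-1/\alpha$.

```python
import math
import random

alpha, m = 2.0, 1.0  # illustrative Pareto parameters (assumption)
rng = random.Random(1)

# Inverse-transform samples, sorted so that wealth[0] is the richest person.
wealth = sorted(
    (m / (1.0 - rng.random()) ** (1.0 / alpha) for _ in range(50_000)),
    reverse=True,
)

# Fit log z(y) = log K + slope * log y over ranks 10..5000
# (the very top ranks are skipped because they are noisy).
points = [(math.log(r), math.log(wealth[r - 1])) for r in range(10, 5001)]
mean_x = sum(px for px, _ in points) / len(points)
mean_y = sum(py for _, py in points) / len(points)
slope = (
    sum((px - mean_x) * (py - mean_y) for px, py in points)
    / sum((px - mean_x) ** 2 for px, _ in points)
)
print(slope, -1.0 / alpha)
```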

[Math] Discrete Pareto Distribution

A discrete Pareto distribution is normally known as the Riemann Zeta distribution which is ill-suited to your problem because it is unbounded above … whereas you seek a finite upper bound of $n=10000$.

Your other suggestion of a Zipf distribution seems more appropriate as it has a finite upper bound at $n$ elements, so let us use that. I am using the mathStatica/Mathematica combo here …

So, let $X \sim \text{Zipf}(n,a)$ with pmf $f(x)$:

    f = 1/(HarmonicNumber[n, a] x^a);
    domain[f] = {x, 1, n} && {n > 0, a > 0} && {Discrete};

The cdf, $P(X \leq x)$, is:

Prob[f]

You wish to find the value of the parameter $a$ such that $P(X \leq 2000) = 0.8$, given $n=10000$. Solving this numerically yields $a \approx 0.949588$.

So that's it … a $\text{Zipf}(n,a)$ pmf with $n = 10000$ and $a = 0.949588$, which will yield you:

$P(X \leq 10000) = 1$
$P(X \leq 2000) = 0.8$
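For readers without mathStatica, the same numerical solve can be sketched in plain Python. This is an independent illustration rather than the method used above: it sums the generalized harmonic numbers directly and finds $a$ by bisection, relying on the assumption that $P(X \leq 2000)$ increases with $a$ on the search interval. All function names are hypothetical.

```python
def harmonic(n, a):
    """Generalized harmonic number H(n, a) = sum of k^(-a) for k = 1..n."""
    return sum(k ** -a for k in range(1, n + 1))

def zipf_cdf(x, n, a):
    """P(X <= x) for X ~ Zipf(n, a) with pmf 1 / (H(n, a) x^a) on 1..n."""
    if x < 1:
        return 0.0
    return harmonic(min(int(x), n), a) / harmonic(n, a)

def solve_shape(x, target, n, lo=0.01, hi=3.0, tol=1e-9):
    """Bisection for the a that gives zipf_cdf(x, n, a) == target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if zipf_cdf(x, n, mid) < target:
            lo = mid  # too little mass below x: a must be larger
        else:
            hi = mid
    return (lo + hi) / 2.0

a_hat = solve_shape(2000, 0.8, 10000)
print(a_hat)  # compare with the value 0.949588 reported above
print(zipf_cdf(2000, 10000, a_hat), zipf_cdf(10000, 10000, a_hat))
```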

Estoup
Finally, depending on how important it is that $P(X \leq 2000) = 0.8$, you might want to consider the simplified case of $a = 1$ which is known as an Estoup distribution with pmf:

 f  =  1/(HarmonicNumber[n] x)

and which will yield:

$P(X \leq 10000) = 1$
$P(X \leq 2000) = 0.835584$

I'm struggling to plot that cdf in wolfram mathematica... do you know the syntax for that?

Sure. Here is the cdf in InputForm ... just copy and paste:

F = Piecewise[{{1, x >= n}, {(-HurwitzZeta[a, 1 + Floor[x]] + Zeta[a])/HarmonicNumber[n, a], 1 <= x < n}}, 0]

If $n$ is small, you can see that this has a stepped structure:

Plot[F /. {n -> 14, a -> 2}, {x, 0, 14}, PlotRange -> All]

And, for your case, with very large n, it appears continuous, but is of course actually discrete, if you were to zoom in:

Plot[F /. {n -> 10000, a -> 0.949588}, {x, -10, 11000}, PlotRange -> All]
