[Math] Understanding the Pareto distribution as applied to wealth

mathematical modeling, probability, probability distributions, statistics

The Pareto distribution is used to answer the question: given a particular person X, what is the pdf of that person's wealth?

I would like to explore the reciprocal question: given the total amount of wealth in a population, what portion does a random person have? I conjecture that this is simply a constant times the Pareto distribution.

More interestingly: what is the shape of the distribution curve if the richest person is placed at the 0 point on the x-axis, the next richest person to the right, and so on? We would see a monotonically decreasing curve. But what is its shape? What is its derivative?

It's quite likely that I'm not phrasing that question properly. Let me ask a more basic question: what is the appropriate terminology to explore the question? Given a probability distribution applied many times over, what is the shape of the resultant allocation curve?

Best Answer

What you're looking for is called Zipf's law. This law says that many distribution curves in which the data values are placed in rank order on the horizontal axis by frequency (or, equivalently, percent) follow a power law. The most famous use of Zipf's law is to describe the frequency of word usage in any given language, although the Wikipedia article specifically mentions income rankings as you ask for.

It can be thought of as a discrete version of the Pareto distribution, so you're right about that. Added: This is because Zipf's law is the discrete power law distribution, and Pareto is the continuous power law distribution.


(Update, in response to the OP's request for more on the relationship between Zipf and Pareto.)

I'm going to do this in the general case. The argument will also be for numbers and amounts, rather than probabilities, with the understanding that we can convert the functions involved to pdfs or pmfs by scaling by the appropriate constants.

Suppose we have the density function $p(x)$ for dollars (although it could be any good) allocated among people in a group, so that $\int_a^b p(x) dx$ gives the number of people in the group who have between $a$ and $b$ dollars. Now, rank the people in the group by wealth, and let $z(y)$ denote the wealth that the person ranked $y$ has. The question then is, "What is the relationship between $p(x)$ and $z(y)$?"

Consider the number of people who have at least $M$ dollars. Using $p(x)$, that is given by $\int_M^{\infty} p(x) dx$. But this is also $R$, where $R$ is the largest rank of a person who has at least $M$ dollars. (In other words, if the 34th person has at least $M$ but the 35th does not, then $R = 34$.) So $z(R+1) < M \leq z(R)$. If the population is large enough, we can say $z(R) = M$ without losing much accuracy. Thus $R = z^{-1}(M)$. So there's our relationship (approximately): $$\int_M^{\infty} p(x) dx = z^{-1}(M).$$ Thus the ranking function $z(y)$ is the inverse of the wealth tail cumulative distribution function $\int_M^{\infty} p(x) dx$, viewed as a function of $M$.
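The counting argument above can be checked on simulated data. The following is only a minimal sketch: the tail exponent `alpha`, the population size, and the helper names `rank_function` and `tail_count` are assumptions made for illustration, not part of the original argument. It draws Pareto-distributed wealth, ranks people from richest to poorest, and confirms that counting the people with at least $z(R)$ dollars gives back the rank $R$.

```python
import random

def rank_function(wealth):
    """Return a list z where z[y - 1] is the wealth of the person ranked y (1 = richest)."""
    return sorted(wealth, reverse=True)

def tail_count(wealth, M):
    """Number of people who hold at least M dollars."""
    return sum(1 for w in wealth if w >= M)

random.seed(0)   # fixed seed so the run is repeatable
alpha = 1.5      # hypothetical tail exponent, chosen only for illustration
wealth = [random.paretovariate(alpha) for _ in range(10_000)]

z = rank_function(wealth)

# For each rank R, set M = z(R) and count the people with at least M dollars.
# Barring exact ties in wealth, the count is R, i.e. tail_count inverts z.
for R in (1, 34, 500, 10_000):
    M = z[R - 1]
    print(R, tail_count(wealth, M))
```

With a finite sample the relationship is exact at the sampled wealth values; the approximation in the text only enters when $M$ falls strictly between two consecutive people's wealth.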

How does this relate to power laws? Well, in this special case, if $p(x) = \frac{C}{x^{\alpha+1}}$ (i.e., a Pareto distribution) for some $\alpha > 0$ and constant $C$, then we have $$z^{-1}(M) = \int_M^{\infty} \frac{C}{x^{\alpha+1}} dx = \frac{C}{\alpha M^{\alpha}}.$$ Setting $y = \frac{C}{\alpha M^{\alpha}}$ and solving for $M = z(y)$ gives, with the constant $K = (C/\alpha)^{1/\alpha}$, $$z(y) = \frac{K}{y^{\frac{1}{\alpha}}}.$$ Thus $z(y)$ is also a power law: a Pareto (power law) distribution function for some good produces a power law ranking function for people with that good (i.e., Zipf).
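As a quick numerical check of the inversion, here is a minimal sketch under assumed values: `alpha = 2.0` and `C = 5000.0` are hypothetical numbers chosen for illustration. It uses the tail count $\frac{C}{\alpha M^{\alpha}}$ and the constant $K = (C/\alpha)^{1/\alpha}$ obtained by solving $y = \frac{C}{\alpha M^{\alpha}}$ for $M$, then confirms that the two functions are inverses and that the ranking function has slope $-1/\alpha$ on a log-log plot.

```python
import math

alpha = 2.0   # hypothetical Pareto exponent, for illustration only
C = 5000.0    # hypothetical scale constant, for illustration only

def tail_count(M):
    """People with at least M dollars: integral of C / x**(alpha + 1) from M to infinity."""
    return C / (alpha * M ** alpha)

# Solving y = C / (alpha * M**alpha) for M gives M = K / y**(1 / alpha).
K = (C / alpha) ** (1.0 / alpha)

def z(y):
    """Wealth of the person ranked y (the ranking function)."""
    return K / y ** (1.0 / alpha)

# tail_count and z should be inverses of each other.
for y in (1, 10, 100):
    assert math.isclose(tail_count(z(y)), y, rel_tol=1e-9)

# On log-log axes the ranking function is a straight line with slope -1/alpha.
slope = (math.log(z(100)) - math.log(z(10))) / (math.log(100) - math.log(10))
print(K, slope)
```

With these assumed values the richest person holds $z(1) = K = 50$ dollars, and a rank 100 times larger corresponds to a wealth 10 times smaller, which is the $y^{-1/\alpha}$ decay with $\alpha = 2$.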
