Solved – Connection between power law and Zipf’s law

distributions, fitting, power law, zipf

I am trying to better understand the connection between the power law distribution and Zipf's distribution (law). There is a neat explanation in [1].

The article suggests that, since the power law function can be derived from Pareto's law, and given the relationship between Pareto's law and Zipf's law, the power law parameter alpha is 1 + 1/b. From my understanding, this would mean that we can directly determine the b parameter of Zipf's law simply from the power law alpha parameter. For example, an alpha of 2 would lead to a b of 1.

Is this true? Can I estimate alpha, e.g., using the methods of Clauset et al. [2] on my data, and then directly determine the Zipf parameter b from that relation? This would allow me to use the exact methods of Clauset et al. instead of non-exact methods like fitting a straight line on a log-log plot. I would also avoid having to produce rankings, etc.

[1] http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html

[2] http://arxiv.org/abs/0706.1062

Best Answer

Zipf's law is generally understood to simply be a power-law distribution with integer values, that is, a probability distribution with the form

$p(x) \propto x^{-\alpha}$ for $x\geq x_{\min}>0$, $\alpha>1$ and $x\in \mathbb{N}_{>0}$

where $x_{\min}$ is the smallest value for which the power law holds, and is generally 1 for Zipf's Law (although not always; there is some ambiguity in the literature as to whether the term Zipf's Law is reserved for the $x_{\min}=1$ case or whether it can be used for $x_{\min}>1$).

However, power-law distributions have the special property that the complementary cumulative distribution function (ccdf) also has a power-law form, $P(x) \propto x^{-\beta}$, but now with $\beta>0$ (and $\beta=\alpha-1$). This creates some ambiguity about what exactly people mean when they state that they estimated such-and-such a parameter for Zipf's Law. Do they mean $\alpha$ or $\beta$? It's important to be clear about which one you are reporting. So long as you say whether the parameter you estimate is the pdf or ccdf parameter, you should be fine.
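The relation $\beta=\alpha-1$ can be checked with a short calculation. This is only a sketch under the continuous approximation (treating $x$ as real-valued, with a normalizing constant $C$ introduced here for illustration); for integer-valued $x$ the same exponent holds asymptotically for large $x$:

$$P(x) = \int_x^{\infty} C\,t^{-\alpha}\,dt = \frac{C}{\alpha-1}\,x^{-(\alpha-1)} \propto x^{-\beta}, \qquad \beta = \alpha - 1 .$$

The integral converges only when $\alpha>1$, which matches the condition on $\alpha$ stated above and gives $\beta>0$.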

Another small point: when people talk about Pareto distributions and data, they often talk about "rank-frequency" plots. These are the same thing as the ccdf (a point we discuss a little more in our SIAM Review paper that you link to), just with the axes reversed. That means you can easily transform an exponent $b$ that someone has estimated from a rank-frequency plot (the form Lada Adamic's article relates to the Pareto form) into a ccdf exponent by taking the reciprocal, $\beta=1/b$, and then into a regular pdf exponent by adding one, $\alpha=1+1/b$. But people don't really distinguish between Zipf and Pareto laws like that. Both are just power-law distributions, so it's better to just talk about $\alpha$.
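To make the conversion concrete, here is a minimal Python sketch. It assumes the relations quoted in the question and above, $\alpha = 1 + 1/b$ and $\beta = \alpha - 1$. The function names are hypothetical, chosen for illustration. The estimator shown is the closed-form continuous maximum-likelihood estimate for a known $x_{\min}$, which is only one ingredient of the Clauset et al. approach; it is not their full procedure, which also selects $x_{\min}$ and treats discrete data separately.

```python
import math
import random


def ccdf_beta_from_alpha(alpha):
    """ccdf exponent beta from pdf exponent alpha (beta = alpha - 1)."""
    if alpha <= 1:
        raise ValueError("alpha must be > 1")
    return alpha - 1.0


def zipf_b_from_alpha(alpha):
    """Rank-frequency exponent b from pdf exponent alpha (alpha = 1 + 1/b)."""
    return 1.0 / ccdf_beta_from_alpha(alpha)


def alpha_from_zipf_b(b):
    """pdf exponent alpha from rank-frequency exponent b."""
    if b <= 0:
        raise ValueError("b must be > 0")
    return 1.0 + 1.0 / b


def sample_power_law(n, alpha, x_min=1.0, rng=random):
    """Draw n continuous power-law samples by inverse-transform sampling."""
    beta = ccdf_beta_from_alpha(alpha)
    # 1 - u lies in (0, 1], so the power below is always finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / beta) for _ in range(n)]


def mle_alpha_continuous(xs, x_min=1.0):
    """Continuous MLE for alpha, assuming x_min is known (an assumption here)."""
    tail = [x for x in xs if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    random.seed(0)
    data = sample_power_law(50_000, alpha=2.5)
    alpha_hat = mle_alpha_continuous(data)
    print("estimated alpha:", alpha_hat)
    print("implied ccdf exponent beta:", ccdf_beta_from_alpha(alpha_hat))
    print("implied rank-frequency exponent b:", zipf_b_from_alpha(alpha_hat))
```

With this, the example from the question works out as `zipf_b_from_alpha(2.0)`, which returns `1.0`. Whichever exponent is reported, it helps to state which of $\alpha$, $\beta$ or $b$ it is.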