Accurately generating variates from a discrete power law distribution

discrete-data, power-law, random-generation

What are the best methods to accurately generate random integers distributed according to a power law? The probability of getting $k$ ($k=1,2,\ldots$) should be equal to $p_k = k^{-\gamma} / \zeta(\gamma)$ and the method should work well for any $\gamma > 1$.

I can see two naive approaches:

  1. Compute $p_k$ up to some large $k_\text{max}$ so that $\sum_{k=1}^{k_\text{max}} p_k$ is "close enough" to 1, then generate integers according to these probabilities. This just won't work if $\gamma$ is close to 1, as $k_\text{max}$ would need to be huge.

  2. Draw real numbers from a continuous power law distribution (an easier problem that I know how to solve; see the sketch just below) and round them to integers in some way. The precise probability of obtaining each integer this way can be computed analytically, so I could use rejection to correct those probabilities to $p_k$ (which I can also compute, provided I can evaluate the $\zeta$ function). This would be a bit hairy, though: I'd have to round in a way that yields each integer $k$ above some small cutoff with probability higher than $p_k$, and handle the $k$ below that cutoff separately.
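For reference, the continuous draw in approach 2 is just inverse-transform sampling. A minimal Python sketch (the function and parameter names are illustrative), for a density proportional to $x^{-(\alpha+1)}$ above $x_\text{min}$:

```python
import numpy as np

def continuous_power_law(alpha, x_min, size, rng=None):
    """Inverse-transform draws from f(x) = alpha * x_min**alpha / x**(alpha + 1), x >= x_min."""
    rng = np.random.default_rng(rng)
    u = rng.random(size)                        # uniform on [0, 1)
    return x_min * (1.0 - u) ** (-1.0 / alpha)  # F^{-1}(u); support [x_min, inf)
```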

Is there a better method that is also accurate (not approximate)?

Best Answer

I think (a slightly modified version of) method 2 is quite straightforward, actually.

Using the definition of the Pareto distribution function given on Wikipedia,

$$F_X(x) = \begin{cases}1-\left(\frac{x_\mathrm{m}}{x}\right)^\alpha & x \ge x_\mathrm{m}, \\0 & x < x_\mathrm{m},\end{cases}$$

if you take $x_\mathrm{m}=\frac{1}{2}$ and $\alpha=\gamma-1$, then the ratio of $p_x$ to $q_x=F_X(x+\frac{1}{2})-F_X(x-\frac{1}{2})$ stays bounded. (The exponent matters here: the rounded Pareto has $q_x\sim\alpha x_\mathrm{m}^\alpha x^{-(\alpha+1)}$ in the tail, so $\alpha=\gamma$ would let $p_x/q_x$ grow without bound, while $\alpha=\gamma-1$ matches the $x^{-\gamma}$ decay of the target.) Moreover, since the Pareto density is convex, $q_x \ge f_X(x)$, which gives $p_x/q_x \le 2^{\gamma-1}/\big((\gamma-1)\,\zeta(\gamma)\big)$ for every $x$, meaning you can just scale by that bound and use straight rejection sampling. It seems to be reasonably efficient.

To be more explicit: generate from a Pareto with $x_\mathrm{m}=\frac{1}{2}$ and $\alpha=\gamma-1$ and round to the nearest integer (rather than truncate); you can then use rejection sampling with $M = \frac{2^{\gamma-1}}{(\gamma-1)\,\zeta(\gamma)}$ -- each generated value of $x$ from that process is accepted with probability $\frac{p_x}{Mq_x}$, and the overall acceptance rate is $1/M$.

[Plot: the target pmf $p_x$ together with the scaled proposal $Mq_x$, which lies just above it.]

($M$ here was rounded up slightly since I'm lazy; the exact value for this case would give a marginally different fit, but not enough to look different in the plot. In fact the small image makes $M$ look a touch too small, when it's actually a fraction too large.)
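A minimal sketch of this sampler in Python, assuming NumPy and SciPy (the function name `sample_zeta` and the batched loop are illustrative choices of mine; $M$ is the bound given above):

```python
import numpy as np
from scipy.special import zeta

def sample_zeta(gamma, size, rng=None):
    """Exact zeta variates, P(K = k) = k**-gamma / zeta(gamma) for gamma > 1,
    by rejection from a rounded Pareto(x_m = 1/2, alpha = gamma - 1) proposal."""
    rng = np.random.default_rng(rng)
    alpha, xm = gamma - 1.0, 0.5
    M = 2.0 ** alpha / (alpha * zeta(gamma))    # bound on p_k / q_k derived above
    out = np.empty(size, dtype=np.int64)
    filled = 0
    while filled < size:
        m = size - filled
        x = xm * (1.0 - rng.random(m)) ** (-1.0 / alpha)  # inverse-CDF Pareto draw
        k = np.floor(x + 0.5).astype(np.int64)            # round to nearest integer
        # proposal pmf of the rounded value: q_k = F(k + 1/2) - F(k - 1/2)
        q = (xm / (k - 0.5)) ** alpha - (xm / (k + 0.5)) ** alpha
        p = k.astype(np.float64) ** -gamma / zeta(gamma)  # target pmf p_k
        acc = k[rng.random(m) < p / (M * q)]              # accept w.p. p_k / (M q_k)
        out[filled:filled + acc.size] = acc
        filled += acc.size
    return out
```

The overall acceptance rate is $1/M = (\gamma-1)\,\zeta(\gamma)/2^{\gamma-1}$, which comes to $\pi^2/12 \approx 0.82$ for $\gamma=2$.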

More careful tuning of $x_\mathrm{m}$ and $\alpha$ (say, taking $\alpha$ a little below $\gamma-1$) would probably boost efficiency further, but this approach does reasonably well in the cases I've played with.

If you can give some sense of the typical range of values of $\gamma$ I can take a closer look at efficiency there.


Method 1 can be adapted to be exact as well, by performing method 1 almost always and applying another method to deal with the tail. This can be done in ways that may be very fast.

For example, if you take an integer vector of length 256 and fill the first $\lfloor 256 p_1\rfloor$ values with 1, the next $\lfloor 256 p_2\rfloor$ values with 2, and so on until $256 p_i < 1$ -- that will use up almost the whole array. The remaining few cells then indicate a move to a second method, which deals with both the right tail and the tiny 'left-over' bits of probability from the left part.

The left remnant might then be handled by a number of approaches (even with, say, 'squaring the histogram' if it is automated, though it doesn't have to be that efficient), and the right tail can be done using something like the accept-reject approach above.

The basic algorithm involves generating an integer from 1 to 256, which requires only 8 bits from the RNG. If efficiency is paramount, bit operations can take those bits 'off the top', leaving the remainder of the uniform number (best kept as an unnormalized integer up to this point) available to deal with the left remnant and right tail when required.

Carefully implemented, this kind of thing can be very fast. You can use values of $2^k$ other than 256 (e.g. $2^{16}$ might be a possibility), and everything is notionally the same. If you take a very large table, however, there may not be enough bits left in the uniform for it to be suitable for generating the tail, and you'll need a second uniform value there (but it becomes very rarely needed, so it's not much of an issue).

In a zeta(3) example ($\gamma=3$), for instance, you'd have 212 1's, 26 2's, 7 3's, 3 4's and one 5, and the cells from 250-256 would deal with the remnant. Over 97% of the time you generate one of the tabled values (1-5).
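A sketch of the whole scheme in Python for that $\gamma=3$ case, assuming NumPy and SciPy (names like `slow_path` and the sequential-inversion remnant handler are my own choices, just one of the possible approaches mentioned above; a production version would want more than 24 bits of resolution in the slow path):

```python
import numpy as np
from scipy.special import zeta

GAMMA = 3.0
TABLE_BITS = 8
SIZE = 1 << TABLE_BITS                  # 256 table cells

def p(k):
    """Target pmf p_k = k**-GAMMA / zeta(GAMMA)."""
    return k ** -GAMMA / zeta(GAMMA)

# Fill floor(256 * p_k) cells with k while that count is at least 1.
# For GAMMA = 3 this gives {1: 212, 2: 26, 3: 7, 4: 3, 5: 1} -- 249 cells,
# leaving 7 'escape' cells that divert to the slow path.
table, counts, k = [], {}, 1
while (n_k := int(SIZE * p(k))) >= 1:
    table += [k] * n_k
    counts[k] = n_k
    k += 1
N_ESCAPE = SIZE - len(table)

def slow_path(u):
    """Exact remnant handling: sequential inversion over the left-over mass
    p_k - counts[k]/256 (which includes the whole right tail); u in [0, 7/256)."""
    acc, k = 0.0, 1
    while True:
        acc += p(k) - counts.get(k, 0) / SIZE
        if u < acc:
            return k
        k += 1

def draw(rng):
    u = int(rng.integers(0, 1 << 32))   # one 32-bit uniform integer
    idx = u >> (32 - TABLE_BITS)        # top 8 bits index the table...
    if idx < len(table):
        return table[idx]               # ~97% of draws end here when GAMMA = 3
    rest = u & ((1 << (32 - TABLE_BITS)) - 1)    # ...low 24 bits stay uniform
    return slow_path(rest / 2 ** (32 - TABLE_BITS) * N_ESCAPE / SIZE)
```

With `rng = np.random.default_rng()`, repeated calls to `draw(rng)` return exact zeta(3) variates, up to the 24-bit resolution of the slow-path uniform.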
