Sample from a custom heavy tailed (e.g. custom Cauchy) distribution

sampling, simulation

I would like to draw samples from a distribution whose pdf is proportional to $z \mapsto \frac{1}{1 + |z|^\gamma}$ ($z \in \mathbb{R}, \gamma > 1$). Notice that for $\gamma = 2$, this is proportional to the standard Cauchy distribution's pdf. To draw samples from that distribution, I followed this approach: https://towardsdatascience.com/random-sampling-using-scipy-and-numpy-part-i-f3ce8c78812e.
This is what I got so far:

import numpy as np
from typing import Callable
from scipy.interpolate import interp1d

def pdf(z: np.ndarray, gamma: float) -> np.ndarray:
    # Unnormalized density 1 / (1 + |z|^gamma).
    return 1 / (1 + np.power(np.abs(z), gamma))

def cdf(z: np.ndarray, gamma: float) -> np.ndarray:
    # Approximate the CDF by a normalized cumulative sum over the grid z.
    densities = pdf(z, gamma)
    cumulative_densities = np.cumsum(densities)
    return cumulative_densities / cumulative_densities[-1]

def inverse_cdf(z: np.ndarray, gamma: float) -> Callable[[np.ndarray], np.ndarray]:
    # Invert the CDF by interpolating z as a function of its CDF values.
    cdfs = cdf(z, gamma)
    return interp1d(cdfs, z, fill_value='extrapolate')

I would then use this code as follows:

z = np.linspace(-5, 5, 100001)
inv_cdf = inverse_cdf(z, gamma=2.0)
uniform_samples = np.random.default_rng().uniform(size = 100001)
custom_samples = inv_cdf(uniform_samples)

From what I understand, my choice of z (especially its bounds) directly limits what I can sample from my custom distribution. So in this case, I would not be able to sample numbers smaller than -5 or larger than 5. But if I draw that many samples from np.random.Generator.standard_cauchy, I will likely see values like -142605 or 127183.

How do I tackle this problem smartly? Simply widening the bounds of z seems awkward: any finite bound would still be too restrictive, and maintaining a reasonable grid resolution (i.e. the spacing between the values in z) over a wider range would demand too much in computational resources.

Best Answer

Because $Z$ is symmetric around $0$ and $|Z|$ has a generalized beta prime distribution, there is a simple algorithm to obtain random values of $Z$ efficiently:

  • Step 1: Draw a random value $Y$ from a Beta$\left(1 - \frac{1}{\gamma}, \frac{1}{\gamma}\right)$ distribution.

  • Step 2: Set $Z = \left(\frac{1}{Y}-1\right)^{1/\gamma}.$

  • Step 3: With probability $1/2,$ negate $Z.$
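The three steps above can be sketched in Python with NumPy (the function name and signature here are illustrative, not from the answer):

```python
import numpy as np

def sample_heavy_tailed(n: int, gamma: float, rng=None) -> np.ndarray:
    """Draw n samples with density proportional to 1 / (1 + |z|^gamma)."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: Y ~ Beta(1 - 1/gamma, 1/gamma).
    y = rng.beta(1 - 1/gamma, 1/gamma, size=n)
    # Step 2: Z = (1/Y - 1)^(1/gamma).
    z = (1/y - 1) ** (1/gamma)
    # Step 3: negate each value with probability 1/2.
    return z * rng.choice([-1.0, 1.0], size=n)
```

For $\gamma = 2$ the output should behave like a standard Cauchy sample.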

This is intended for smallish $\gamma,$ which are the ones with appreciable chances of yielding huge values of $|Z|.$ With larger values of $\gamma$ (perhaps $\gamma \gt 10$) other procedures become more attractive, such as rejection sampling from a Pareto or Student t distribution.
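As one concrete version of the rejection-sampling idea (my own sketch, not spelled out in the answer): for $\gamma \ge 2$ the target $1/(1+|z|^\gamma)$ satisfies $(1+z^2)/(1+|z|^\gamma) \le 2$ for all $z$, so twice the unnormalized Cauchy density (a Student t with 1 df) is a valid envelope:

```python
import numpy as np

def sample_rejection(n: int, gamma: float, rng=None) -> np.ndarray:
    """Rejection sampler for density proportional to 1/(1+|z|^gamma), gamma >= 2.

    Proposal: standard Cauchy. Envelope bound M = 2, since
    (1 + z^2) / (1 + |z|^gamma) <= 2 for all z when gamma >= 2.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty(0)
    while out.size < n:
        z = rng.standard_cauchy(size=2 * n)   # propose from Cauchy
        u = rng.uniform(size=z.size)
        # Accept with probability target(z) / (M * proposal(z)).
        accept = u < (1 + z**2) / (2 * (1 + np.abs(z)**gamma))
        out = np.concatenate([out, z[accept]])
    return out[:n]
```

At $\gamma = 2$ the acceptance probability is exactly $1/2$ everywhere and the output is Cauchy; for larger $\gamma$ the envelope stays valid while the Beta construction's exponent $1 - 1/\gamma$ approaches 1.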


To illustrate, here is a full implementation in R that generates a specified number of such random variates by means of the rbeta function to obtain realizations of $Y:$

n <- 1e7                        # number of samples
g <- 1.1                        # gamma
y <- rbeta(n, 1 - 1/g, 1/g)     # Step 1: Y ~ Beta(1 - 1/gamma, 1/gamma)
z <- (1/y - 1)^(1/g) * sample(c(-1,1), n, replace=TRUE)  # Steps 2 and 3

(This takes about two seconds to generate ten million values.)

Because double precision floats cannot exceed $10^{308}$ (or so), this can occasionally generate an infinite value of z for small $\gamma \approx 1.$ Regardless, the range often is huge. To study the output, then, I plotted a histogram of the logarithms of the absolute values of z. This one covers the full range of generated absolute values (from $0$ to $5\times 10^{77}$ in this instance).

[Figure: histogram of $\log|z|$ of the simulated values.]

Over this, in red, is a graph of the density of $\log|Z|$ as given by

$$f_{\log|Z|}(x;\gamma) = \frac{\gamma}{B\left(\frac{1}{\gamma}, 1 - \frac{1}{\gamma}\right)} \cdot \frac{e^x}{1 + e^{\gamma x}}.$$

This is derived from a density proportional to $1/(1 + |z|^\gamma)$ by substituting $z = e^x$ and multiplying by the Jacobian $e^x.$ You can see the agreement with the simulated values is excellent.
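As a quick numerical check of this density (a sketch in Python, reusing the Beta construction rather than the R code above): compare the empirical CDF of $\log|Z|$ at a few points against the integral of $f_{\log|Z|}.$

```python
import numpy as np
from math import gamma as gamma_fn
from scipy.integrate import quad

def log_abs_density(x, g):
    # f_{log|Z|}(x; gamma) = gamma / B(1/g, 1 - 1/g) * e^x / (1 + e^(g x)).
    B = gamma_fn(1/g) * gamma_fn(1 - 1/g)  # Beta(1/g, 1 - 1/g) via Gamma
    return g / B * np.exp(x) / (1 + np.exp(g * x))

g = 2.0
rng = np.random.default_rng(1)
# Steps 1-2 of the algorithm; the sign flip is irrelevant for |Z|.
y = rng.beta(1 - 1/g, 1/g, size=200000)
log_abs_z = np.log((1/y - 1) ** (1/g))

for x0 in (-1.0, 0.0, 1.0):
    theoretical, _ = quad(log_abs_density, -np.inf, x0, args=(g,))
    empirical = np.mean(log_abs_z < x0)
    # The two should agree to within Monte Carlo error.
```

For $\gamma = 2$ the density simplifies to $\operatorname{sech}(x)/\pi,$ so the CDF at $x_0 = 0$ should be $1/2.$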