Solved – Problems generating a sample from a custom distribution with log

density functionrsamplesampling

I'm trying to generate a sample from a family of distributions. In particular I would like to be able to obtain a sample from the survival function:

$$1-F(x) = c x^{-a} \log^b(x)$$
with a proper domain so that F is defined and it is a survival function, from different values of $c$, $b$ and $a$.

For example:
$$1-F(x) \approx \tfrac{\log(x)}{x^2}, x>\sqrt{e}$$

Roughly speaking I need a function $F$ such that the distribution is heavy tailed and behaves like $1-F(x) = c x^{-a} \log^b(x)$, as $x$ goes to $+ \infty$.

I've tried many ways, because I found several examples:

But anytime, even for the simplest function with $b=1$ and $a=2$ errors come out. I think it's because maybe, due to the log, some algorithms work with all real numbers and cannot be confined to positive numbers.

Which is the easiest way for a pdf/cdf with a log in it? Is it difficult due to the fact that it's hard to invert the function xlogx?

If I have to use some of the previous methods I can show you my achivements and where I got stuck!

EDIT 1
Thanks to whuber I succeded in building the code, here you are:

# Simulating data from G(x) = 1-F(x) = c * x^(-a) * (log(x))^b
# Case: b > 0
pxlog <- function(x, a=5, b=2, c=(a*exp(1)/b)^b) {((1-c*x^(-a)*(log(x))^b))}                 # G
dxlog <- function(x, a=5, b=2, c=(a*exp(1)/b)^b) {c*(x^(-1-a))*((log(x))^(-1+b))*(-b+a*log(x))}
qxlog <- function(y, a=5, b=2, c=(a*exp(1)/b)^b) {exp(-(b/a)*lambert_Wm1(-(a/b)*((1-y)/c)^(1/b)))}   # inversa di G

# Domain for the functions: x > exp(b/a)

# Generating Samples
rxlog <- function(n, a=5, b=2, c=(a*exp(1)/b)^b) qxlog(runif(n),a,b,c)

# Testing Samples
hist(rxlog(10000, 2, 10), breaks=50, freq = F, col="grey", label=F)
curve(dxlog(x, 2, 10), exp(10/2), add= TRUE, col="red")

The result is that it works… almost! If I use values of $b$ greater than $a$, or in general, if $b-a>-1$, fitting density/histogram creates problem.

For example $a=2, b=10$
enter image description here

or with $a=3, b=4$ (adjusting the breaks in hist)

enter image description here

Which is the problem? And second question, how can I include, if it's possible, the domain information about $F$ in the definition of $F$ itself?
Thanks!

EDIT 2 For $b>0$ the distribution I'm looking for can be assumed as the log gamma distribution. You can find it in the Actuar library in R. Testing "my" distribution (the one with the code in EDIT 1) with that one there are some differences… but I think that it's just because I've made some mistakes!

Best Answer

We can generate random variates from this distribution by inverting it.

This means solving $1 - F(x) = q$ (which lies between $0$ and $1$, implying $a\gt 0$) for $x$. Notice that $\log^b(x)$ will be undefined or negative (which won't work) unless $x \gt 1$. Leave aside the case $b=0$ for the moment. The solutions are slightly different depending on the sign of $b$. When $b \lt 0$, $$x=\exp(y)$$ and solve for $y \gt 0$:

$$q^{1/b} = (c \log^b(x) x^{-a})^{1/b} = c^{1/b} y\exp(-\frac{a}{b} y),$$

whence

$$-\frac{b}{a} W\left(-\frac{a}{b} \left(\frac{q}{c}\right)^{1/b}\right)= y.$$

$W$ is the primary branch of the Lambert $W$ function: when $w = u \exp(u)$ for $u \ge -1$, $W(w)=u$. The solid lines in the figure show this branch.

![Figure: Plots of the inverse of W and W itself

When $b\gt 0$, let $$x = \exp(-y)$$ and solve as before, yielding

$$\frac{b}{a} W\left(-\frac{a}{b} \left(\frac{q}{c}\right)^{1/b}\right)= y.$$

Because we are interested in the behavior as $x\to \infty$, which corresponds to $y\to -\infty$, the relevant branch is the one shown with the dashed lines in the figure. Notice that the argument of $W$ must lie in the interval $[-1/e, 0]$. This will happen only when $c$ is sufficiently large. Since $0 \le q \le 1$, this implies

$$c \ge \left(\frac{e a}{b}\right)^b.$$

Values along either branch of $W$ are readily computed using Newton-Raphson. Depending on the values of $a,b,$ and $c$ chosen, between one and a dozen iterations will be needed.

Finally, when $b=0$ the logarithmic term is $1$ and we can readily solve

$$x = \left(\frac{c}{q}\right)^{1/a} = \exp\left(-\frac{1}{a}\log\left(\frac{q}{c}\right)\right).$$

(In some sense the limiting value of $(q/c)^{1/b}/b$ gives the natural logarithm, as we would hope.)

In either case, to generate variates from this distribution, stipulate $a$, $b$, and $c$, then generate uniform variates in the range $[0,1]$ and substitute their values for $q$ in the appropriate formula.


Here are examples with $a=5$ and $b=\pm 2$ to illustrate. $10,000$ independent variates were drawn and summarized with histograms of $y$ and $x$. For negative $b$ (top), I chose a value of $c$ that gives a picture that is not terribly skewed. For positive $b$ (bottom), the most extreme possible value of $c$ was chosen. Shown for comparison are solid curves graphing the derivative of the distribution function $F$. The match is excellent in both cases.

Negative $b$

Figure: Negative b

Positive $b$

Figure: Positive b


Here is working code to compute $W$ in R, with an example showing its use. It is vectorized to perform Newton-Raphson steps in parallel for a large number of values of the argument, which is ideal for efficient generation of random variates.

(Mathematica, which generated the figures here, implements $W$ as ProductLog. The negative branch used here is the one with index $-1$ in Mathematica's numbering. It returns the same values in the examples given here, which are computed to at least 12 significant figures.)

W <- function(q, tol.x2=1e-24, tol.W2=1e-24, iter.max=15, verbose=FALSE) {
  #
  # Define the function and its derivative.
  #
  W.inverse <- function(z) z * exp(z)
  W.inverse.prime <- function(z) exp(z) + W.inverse(z)
  #
  # Functions to take one Newton-Raphson step.
  #
  NR <- function(x, f, f.prime) x - f(x) / f.prime(x)
  step <- function(x, q) NR(x, function(y) W.inverse(y) - q, W.inverse.prime)
  #
  # Pick a good starting value.  Use the principal branch for positive
  # arguments and its continuation (to large negative values) for 
  # negative arguments.
  #
  x.0 <- ifelse(q < 0, log(-q), log(q + 1))
  #
  # True zeros must be handled specially.
  #
  i.zero <- q == 0
  q.names <- q
  q[i.zero] <- NA
  #
  # Newton-Raphson iteration.
  #
  w.1 <- W.inverse(x.0)
  i <- 0
  if (verbose) x <- x.0
  if (any(!i.zero, na.rm=TRUE)) {
    while (i < iter.max) {
      i <- i + 1
      x.1 <- step(x.0, q)
      if (verbose) x <- rbind(x, x.1)
      if (mean((x.0/x.1 - 1)^2, na.rm=TRUE) <= tol.x2) break
      w.1 <- W.inverse(x.1)
      if (mean(((w.1 - q)/(w.1 + q))^2, na.rm=TRUE) <= tol.W2) break
      x.0 <- x.1
    }
  }
  x.0[i.zero] <- 0
  w.1[i.zero] <- 0
  rv <- list(W=x.0,                              # Values of Lambert W
             W.inverse=w.1,                      # W * exp(W)
             Iterations=i,
             Code=ifelse(i < iter.max, 0, -1),   # 0 for success
             Tolerances=c(x2=tol.x2, W2=tol.W2))
  names(rv$W) <- q.names                         # $
  if (verbose) {
    rownames(x) <- 1:nrow(x) - 1
    rv$Steps <- x                                # $
  }
  return (rv)
}
#
# Test on both positive and negative arguments
#
q <- rbind(Positive = 10^seq(-3, 3, length.out=7), 
           Negative = -exp(seq(-(1+1e-15), -600, length.out=7)))
for (i in 1:nrow(q)) {
  rv <- W(q[i, ], verbose=TRUE)
  cat(rv$Iterations, " steps:", rv$W, "\n")
  #print(rv$Steps, digits=13)                     # Shows the steps
  #print(rbind(q[i, ], rv$W.inverse), digits=12)  # Checks correctness
}