Solved – Problem with Pareto distribution and R

pareto-distributionr

I am trying to test this property of pareto distribution: Let f(x) be a pareto distribution

$$
f(x)=\alpha \frac{x_m^\alpha}{x^{\alpha+1}}
$$

so we have the cdf that is

$$
CDF(x)=\int_{x_m}^{x}\alpha \frac{t_m^\alpha}{t^{\alpha+1}}dt=1-\frac{x_m^\alpha}{x^\alpha}
$$

then the probability that $x>x_0$ is

$$
P(x>x_0)=1-CDF(x)=\frac{x_m^\alpha}{x^\alpha}
$$

and so we have

$$
\frac{P(x>x_0)}{f(x)}=\frac{x}{\alpha}
$$

Now i am trying to test it with R.

 library(PtProcess)
 dd<-rpareto(10000,1.5,0.01)
 cdf<-ecdf(dd)
 df<-density(dd)
 ff<-(1-cdf(df$x))/df$y

If i plot ff

 plot(df$x,ff)

I do not obtain the correct straight line. I guess that this is due at the way density() and ecdf() works. I need this form of the test (an a posteriori evaluation of fd and cdf) in order to perform the same test on a sample of data of unknown orgin. I guess that i need a way to binning the ecdf() function in the same way as hist() is the binning version of density.

So my question is:

Does there exist an equivalent binned function of ecdf() as hist() is the binned function of density()?
or can I simulate ecdf() with hist()?

Best Answer

By using ecdf and density, you're not actually doing the Pareto calculations, but instead using estimates based on a sample that are, by their non-parametric nature, not guaranteed (read: not going to) have the desired property.

Try the following:

x <- seq(0.1,10,by=0.1)
fx <- dpareto(x, 1.5, 0.05)
Fx <- ppareto(x, 1.5, 0.05)
plot((1-Fx)/fx ~ x)

You'll get the nice straight line out: enter image description here

Related Solutions

Solved – How to derive the $\alpha$ for the Pareto rule

The basic result is: $\alpha=\log_4 5=1.160964...$

The calculation comes from the Lorenz curve; specifically you're asking for the $\alpha$ for which $L(0.8)=0.2$.

$L$ is defined as

$$L(F)=\frac{\int_{x_\mathrm{m}}^{x(F)}xf(x)\,dx}{\int_{x_\mathrm{m}}^\infty xf(x)\,dx} =\frac{\int_0^F x(F')\,dF'}{\int_0^1 x(F')\,dF'}$$

where $x(F)$ is the inverse of the cdf. The denominator is the mean of the distribution.

For the Pareto distribution, the Lorenz curve is $L(F) = 1-(1-F)^{1-\frac{1}{\alpha}}$, from which we obtain the equation we need to solve:

$$0.2 = 1-(1-0.8)^{1-\frac{1}{\alpha}}.\,$$

Hence

$$1-\frac{1}{\alpha} = \log(0.8)/\log(0.2)$$

$$\alpha=\frac{1}{1-\log(0.8)/\log(0.2)}=1.160964\ldots$$

This is mentioned on the Wikipedia page for the Pareto principle, and on the page on the Pareto distribution

The details are given here.

Solved – Problems generating a sample from a custom distribution with log

We can generate random variates from this distribution by inverting it.

This means solving $1 - F(x) = q$ (which lies between $0$ and $1$, implying $a\gt 0$) for $x$. Notice that $\log^b(x)$ will be undefined or negative (which won't work) unless $x \gt 1$. Leave aside the case $b=0$ for the moment. The solutions are slightly different depending on the sign of $b$. When $b \lt 0$, $$x=\exp(y)$$ and solve for $y \gt 0$:

$$q^{1/b} = (c \log^b(x) x^{-a})^{1/b} = c^{1/b} y\exp(-\frac{a}{b} y),$$

whence

$$-\frac{b}{a} W\left(-\frac{a}{b} \left(\frac{q}{c}\right)^{1/b}\right)= y.$$

$W$ is the primary branch of the Lambert $W$ function: when $w = u \exp(u)$ for $u \ge -1$, $W(w)=u$. The solid lines in the figure show this branch.

When $b\gt 0$, let $$x = \exp(-y)$$ and solve as before, yielding

$$\frac{b}{a} W\left(-\frac{a}{b} \left(\frac{q}{c}\right)^{1/b}\right)= y.$$

Because we are interested in the behavior as $x\to \infty$, which corresponds to $y\to -\infty$, the relevant branch is the one shown with the dashed lines in the figure. Notice that the argument of $W$ must lie in the interval $[-1/e, 0]$. This will happen only when $c$ is sufficiently large. Since $0 \le q \le 1$, this implies

$$c \ge \left(\frac{e a}{b}\right)^b.$$

Values along either branch of $W$ are readily computed using Newton-Raphson. Depending on the values of $a,b,$ and $c$ chosen, between one and a dozen iterations will be needed.

Finally, when $b=0$ the logarithmic term is $1$ and we can readily solve

$$x = \left(\frac{c}{q}\right)^{1/a} = \exp\left(-\frac{1}{a}\log\left(\frac{q}{c}\right)\right).$$

(In some sense the limiting value of $(q/c)^{1/b}/b$ gives the natural logarithm, as we would hope.)

In either case, to generate variates from this distribution, stipulate $a$, $b$, and $c$, then generate uniform variates in the range $[0,1]$ and substitute their values for $q$ in the appropriate formula.

Here are examples with $a=5$ and $b=\pm 2$ to illustrate. $10,000$ independent variates were drawn and summarized with histograms of $y$ and $x$. For negative $b$ (top), I chose a value of $c$ that gives a picture that is not terribly skewed. For positive $b$ (bottom), the most extreme possible value of $c$ was chosen. Shown for comparison are solid curves graphing the derivative of the distribution function $F$. The match is excellent in both cases.

Negative $b$

Positive $b$

Here is working code to compute $W$ in R, with an example showing its use. It is vectorized to perform Newton-Raphson steps in parallel for a large number of values of the argument, which is ideal for efficient generation of random variates.

(Mathematica, which generated the figures here, implements $W$ as ProductLog. The negative branch used here is the one with index $-1$ in Mathematica's numbering. It returns the same values in the examples given here, which are computed to at least 12 significant figures.)

W <- function(q, tol.x2=1e-24, tol.W2=1e-24, iter.max=15, verbose=FALSE) {
  #
  # Define the function and its derivative.
  #
  W.inverse <- function(z) z * exp(z)
  W.inverse.prime <- function(z) exp(z) + W.inverse(z)
  #
  # Functions to take one Newton-Raphson step.
  #
  NR <- function(x, f, f.prime) x - f(x) / f.prime(x)
  step <- function(x, q) NR(x, function(y) W.inverse(y) - q, W.inverse.prime)
  #
  # Pick a good starting value.  Use the principal branch for positive
  # arguments and its continuation (to large negative values) for 
  # negative arguments.
  #
  x.0 <- ifelse(q < 0, log(-q), log(q + 1))
  #
  # True zeros must be handled specially.
  #
  i.zero <- q == 0
  q.names <- q
  q[i.zero] <- NA
  #
  # Newton-Raphson iteration.
  #
  w.1 <- W.inverse(x.0)
  i <- 0
  if (verbose) x <- x.0
  if (any(!i.zero, na.rm=TRUE)) {
    while (i < iter.max) {
      i <- i + 1
      x.1 <- step(x.0, q)
      if (verbose) x <- rbind(x, x.1)
      if (mean((x.0/x.1 - 1)^2, na.rm=TRUE) <= tol.x2) break
      w.1 <- W.inverse(x.1)
      if (mean(((w.1 - q)/(w.1 + q))^2, na.rm=TRUE) <= tol.W2) break
      x.0 <- x.1
    }
  }
  x.0[i.zero] <- 0
  w.1[i.zero] <- 0
  rv <- list(W=x.0,                              # Values of Lambert W
             W.inverse=w.1,                      # W * exp(W)
             Iterations=i,
             Code=ifelse(i < iter.max, 0, -1),   # 0 for success
             Tolerances=c(x2=tol.x2, W2=tol.W2))
  names(rv$W) <- q.names                         # $
  if (verbose) {
    rownames(x) <- 1:nrow(x) - 1
    rv$Steps <- x                                # $
  }
  return (rv)
}
#
# Test on both positive and negative arguments
#
q <- rbind(Positive = 10^seq(-3, 3, length.out=7), 
           Negative = -exp(seq(-(1+1e-15), -600, length.out=7)))
for (i in 1:nrow(q)) {
  rv <- W(q[i, ], verbose=TRUE)
  cat(rv$Iterations, " steps:", rv$W, "\n")
  #print(rv$Steps, digits=13)                     # Shows the steps
  #print(rbind(q[i, ], rv$W.inverse), digits=12)  # Checks correctness
}

Best Answer

Related Solutions

Solved – How to derive the $\alpha$ for the Pareto rule

Solved – Problems generating a sample from a custom distribution with log

Related Question