[Math] Discrete Pareto Distribution

probability distributions, statistics

What equation would describe a set of 10,000 bugs if you already know they follow a Pareto distribution?

In other words, 2,000 of the bugs would account for 80% of all your problems.

I'm struggling to make the Pareto and/or Zipf equation fit this model… It should be relatively simple.

I would like to make a simple "progress bar" type graph that will use this equation.

Best Answer

The discrete Pareto distribution is usually known as the (Riemann) zeta distribution. It is ill-suited to your problem because its support is unbounded above, whereas you need a finite upper bound of $n=10000$.
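For reference, the zeta distribution's pmf in its standard textbook form (stated here for illustration, it is not part of the original post) is:

$$f(x) = \frac{x^{-a}}{\zeta(a)}, \qquad x = 1, 2, 3, \ldots, \qquad a > 1$$

The support has no largest element, so it cannot describe exactly 10,000 bugs without truncation.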

Your other suggestion of a Zipf distribution seems more appropriate as it has a finite upper bound at $n$ elements, so let us use that. I am using the mathStatica/Mathematica combo here …

So, let $X \sim \text{Zipf}(n,a)$ with pmf $f(x)$:

       f  =  1/(HarmonicNumber[n, a] x^a);

domain[f] = {x, 1, n} && {n>0, a>0} && {Discrete};
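For readers without mathStatica, the same pmf can be sketched in plain Mathematica. The name `zipfPMF` is a hypothetical helper chosen for illustration, and the small values $n = 10$, $a = 2$ are only there to make the check fast:

```mathematica
(* Zipf(n, a) pmf on x = 1, ..., n; zipfPMF is an illustrative name *)
zipfPMF[x_, n_, a_] := 1/(HarmonicNumber[n, a] x^a)

(* Sanity check: the probabilities over the whole support should sum to 1 *)
Sum[zipfPMF[x, 10, 2], {x, 1, 10}]
```

The normalizing constant `HarmonicNumber[n, a]` is exactly $\sum_{x=1}^{n} x^{-a}$, which is why the sum collapses to 1.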

The cdf, $P(X \leq x)$, is:

Prob[f]

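$$P(X \leq x) = \begin{cases} 0 & x < 1 \\ \dfrac{\zeta(a) - \zeta(a,\, 1 + \lfloor x \rfloor)}{H_{n,a}} & 1 \leq x < n \\ 1 & x \geq n \end{cases}$$

where $H_{n,a}$ denotes `HarmonicNumber[n, a]` and $\zeta(a, q)$ is the Hurwitz zeta function.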

You wish to find the value of the parameter $a$ such that $P(X \leq 2000) = 0.8$, given $n=10000$. Solving this numerically yields $a \approx 0.949588$.

So that's it … a $\text{Zipf}(n,a)$ pmf with $n = 10000$ and $a = 0.949588$, which will yield you:

  • $P(X \leq 10000) = 1$

  • $P(X \leq 2000) = 0.8$
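The numerical solve can also be sketched in plain Mathematica, without mathStatica. This is a minimal sketch, not the answer's exact method: `zipfCDF` is a hypothetical helper, the cdf at an integer cutoff is written as a ratio of generalized harmonic numbers, and the two starting values 0.5 and 1.5 are assumptions that make `FindRoot` use a derivative-free secant search:

```mathematica
(* cdf at an integer cutoff k: ratio of generalized harmonic numbers.
   The NumericQ test keeps FindRoot from evaluating it symbolically. *)
zipfCDF[k_, n_, a_?NumericQ] := HarmonicNumber[k, a]/HarmonicNumber[n, a]

(* Solve P(X <= 2000) == 0.8 for a with n = 10000 *)
FindRoot[zipfCDF[2000, 10000, a] == 0.8, {a, 0.5, 1.5}]
```

The result should be close to the $a = 0.949588$ quoted above.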

Estoup

Finally, depending on how important it is that $P(X \leq 2000) = 0.8$, you might want to consider the simplified case of $a = 1$, which is known as an Estoup distribution with pmf:

 f  =  1/(HarmonicNumber[n] x)

and which will yield:

  • $P(X \leq 10000) = 1$

  • $P(X \leq 2000) = 0.835584$
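As a quick check of the second figure: at an integer cutoff the Estoup cdf is just a ratio of ordinary harmonic numbers, $H_{2000}/H_{10000}$. A one-line sketch in plain Mathematica (no mathStatica needed), which should reproduce the 0.835584 quoted above:

```mathematica
(* Estoup cdf at 2000 for n = 10000: ratio of harmonic numbers *)
N[HarmonicNumber[2000]/HarmonicNumber[10000]]
```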


I'm struggling to plot that cdf in Wolfram Mathematica... do you know the syntax for that?

Sure. Here is the cdf in InputForm ... just copy and paste:

F = Piecewise[{{1, x >= n}, {(-HurwitzZeta[a, 1 + Floor[x]] + Zeta[a])/HarmonicNumber[n, a], 1 <= x < n}}, 0]
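An equivalent way to write the same cdf, offered as a sketch rather than as the original author's form, uses the identity $H_{m,a} = \zeta(a) - \zeta(a, m + 1)$ (valid for $a \neq 1$) to replace the zeta terms with a generalized harmonic number. The name `Fh` is hypothetical:

```mathematica
(* Same cdf, with the numerator written as HarmonicNumber[Floor[x], a] *)
Fh = Piecewise[{{1, x >= n},
    {HarmonicNumber[Floor[x], a]/HarmonicNumber[n, a], 1 <= x < n}}, 0]
```

One reason to prefer this form is that it also evaluates at $a = 1$ (the Estoup case), where `Zeta[a]` has a pole.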

If $n$ is small, you can see that this has a stepped structure:

Plot[F /. {n -> 14, a -> 2}, {x, 0, 14}, PlotRange -> All]


For your case, with very large $n$, the cdf appears continuous, but it is still discrete, as you would see if you zoomed in:

Plot[F /. {n -> 10000, a -> 0.949588}, {x, -10, 11000}, PlotRange -> All]
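For the "progress bar" style graph mentioned in the question, one possible sketch plots the cumulative share of problems against the number of bugs fixed. It assumes the bugs are ranked from most to least troublesome and fixed in that order, which the question does not state; `share` and `pts` are hypothetical names:

```mathematica
(* Share of all problems covered by the k top-ranked bugs *)
share[k_, n_, a_] := HarmonicNumber[k, a]/HarmonicNumber[n, a]

(* Sample every 100th bug count to keep the plot light; start from {0, 0} *)
pts = Prepend[
   Table[{k, share[k, 10000, 0.949588]}, {k, 100, 10000, 100}], {0, 0}];

ListLinePlot[pts, AxesLabel -> {"bugs fixed", "share of problems"},
 PlotRange -> {0, 1}]
```

Passing `n` and `a` as arguments, rather than assigning them globally, keeps the symbolic `F` defined earlier usable with the replacement rules shown above.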

