What equation describes a set of 10,000 bugs if you already know they follow a Pareto distribution?
In other words, 2,000 of the bugs would account for 80% of all your problems.
I'm struggling to make the Pareto and/or Zipf equation fit this model… It should be relatively simple.
I would like to make a simple "progress bar" type graph that will use this equation.
Best Answer
A discrete Pareto distribution is normally known as the Riemann Zeta distribution, which is ill-suited to your problem because it is unbounded above … whereas you seek a finite upper bound of $n=10000$.
Your other suggestion of a Zipf distribution seems more appropriate as it has a finite upper bound at $n$ elements, so let us use that. I am using the mathStatica/Mathematica combo here …
So, let $X \sim \text{Zipf}(n,a)$ with pmf $f(x)$:
$$f(x) = \frac{x^{-a}}{\sum_{k=1}^{n} k^{-a}}, \qquad x = 1, 2, \dots, n$$
The cdf, $P(X \leq x)$, is:
$$P(X \leq x) = \frac{\sum_{k=1}^{x} k^{-a}}{\sum_{k=1}^{n} k^{-a}}, \qquad x = 1, 2, \dots, n$$
You wish to find the value of the parameter $a$ such that $P(X \leq 2000) = 0.8$, given $n=10000$. Solving this numerically yields $a \approx 0.949588$.
So that's it … a $\text{Zipf}(n,a)$ pmf with $n = 10000$ and $a = 0.949588$, which will yield you:
$P(X \leq 10000) = 1$
$P(X \leq 2000) = 0.8$
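For readers without Mathematica, the numerical solve can be sketched in plain Python. This is a minimal sketch, not the mathStatica code used above: it assumes the pmf is proportional to $k^{-a}$ on $k = 1,\dots,n$, and the names `zipf_cdf` and `solve_exponent` are made up for illustration. Bisection works here because, for a fixed $x$, the cdf grows as $a$ grows.

```python
def zipf_cdf(x, n, a):
    """P(X <= x) for a pmf proportional to k**(-a) on k = 1..n (assumed form)."""
    head = sum(k ** -a for k in range(1, x + 1))
    tail = sum(k ** -a for k in range(x + 1, n + 1))
    return head / (head + tail)


def solve_exponent(x, n, target, lo=0.0, hi=5.0, tol=1e-9):
    """Bisection on a: find a such that zipf_cdf(x, n, a) == target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if zipf_cdf(x, n, mid) < target:
            lo = mid   # too little mass on the top ranks, so raise a
        else:
            hi = mid
    return (lo + hi) / 2


a = solve_exponent(2000, 10000, 0.8)
print(round(a, 4))  # about 0.9496, in line with the value quoted above
```

The bracket `[0, 5]` is an arbitrary choice: at $a = 0$ the distribution is uniform (cdf 0.2 at $x = 2000$) and at $a = 5$ almost all mass sits on the first few ranks, so the target 0.8 lies in between.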
Estoup
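Finally, depending on how important it is that $P(X \leq 2000) = 0.8$, you might want to consider the simplified case of $a = 1$, which is known as an Estoup distribution, with pmf:
$$f(x) = \frac{1}{x \, H_n}, \qquad x = 1, 2, \dots, n, \qquad \text{where } H_n = \sum_{k=1}^{n} \frac{1}{k}$$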
and which will yield:
$P(X \leq 10000) = 1$
$P(X \leq 2000) = 0.835584$
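The Estoup figure is easy to check by hand, since with $a = 1$ the cdf is just a ratio of harmonic numbers, $H_x / H_n$. A small Python sketch (the function names are illustrative, not from any library):

```python
def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def estoup_cdf(x, n):
    """P(X <= x) when the pmf is proportional to 1/k on k = 1..n."""
    return harmonic(x) / harmonic(n)


print(round(estoup_cdf(2000, 10000), 6))  # 0.835584
```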
Sure. Here is the cdf in InputForm ... just copy and paste:
If $n$ is small, you can see that this has a stepped structure:
And for your case, with very large $n$, it appears continuous but is of course still discrete, as you would see if you zoomed in:
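For the "progress bar" use case in the question, one option is to precompute the whole cdf once and look values up. The sketch below is only an illustration under some assumptions: bugs are ranked from most to least impactful (rank 1 is the worst), the pmf is proportional to $k^{-a}$, and `zipf_cdf_table` and `progress_bar` are hypothetical helper names.

```python
from itertools import accumulate


def zipf_cdf_table(n, a):
    """Return [P(X <= 1), ..., P(X <= n)] for a pmf proportional to k**(-a)."""
    weights = [k ** -a for k in range(1, n + 1)]
    running = list(accumulate(weights))  # running sums of the weights
    total = running[-1]
    return [r / total for r in running]


def progress_bar(fixed, table, width=40):
    """Text bar for the share of problems covered by the `fixed` top-ranked bugs."""
    share = table[fixed - 1] if fixed > 0 else 0.0
    filled = round(share * width)
    return "[" + "#" * filled + "." * (width - filled) + f"] {share:.1%}"


small = zipf_cdf_table(5, 1.0)           # small n: the steps are clearly visible
table = zipf_cdf_table(10000, 0.949588)  # the fitted case from the answer

print([round(p, 3) for p in small])
print(progress_bar(2000, table))
```

Building the table costs one pass over the $n$ ranks, after which each progress update is a single list lookup, which is usually enough for redrawing a bar as bugs get closed.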