Desired specification for algorithm:
Given an input array of positive integers $x$, with length $n$, return an integer $i \in \{0, \dots, n-1 \}$ according to the probability distribution:
$$P(i)=\frac{x_i}{\operatorname{sum}(x)}$$
First, I should note that many libraries have built in functions that do that, such as the numpy.random.choice
function in Python's NumPy library.
numpy.random.choice(np.arange(len(x)), p=x/numpy.sum(x))
will meet the specification. The keyword argument p
is for passing in the desired probabilities.
To code an algorithm from 'scratch', given only some random number generator that can generate a random integer from some range, with uniform probability, one possible algorithm is:
Given an input array of positive integers $x$
Let $c$ be the array of cumulative sums.
Let $s=\operatorname{sum}(x)$.
Choose an integer $r$ with uniform probability from 0 to $s-1$, inclusive.
Let $i$ be the count of the entries in $c$ that are less than or equal to $r$.
Return $i$.
This would return an index $i$ from 0 to $\operatorname{length}(x)-1$ in such a way that the probability is given by: $$P(i)=\frac{x_i}{\operatorname{sum}(x)}$$
For example, if the input array is $x=[2,6,9,3,1,3]$, then $c=[2,8,17,20,21,24]$ and $s=24$.
The algorithm would choose an integer $r$ with uniform probability from 0 to 23, inclusive.
If $r$ is 0 or 1, then $i=0$. (probability 2/24)
If $r$ is 2,3,4,5,6 or 7, then $i=1$. (probability 6/24)
If $r$ is 8,9,10,11,12,13,14,15 or 16, then $i=2$. (probability 9/24)
If $r$ is 17,18 or 19, then $i=3$. (probability 3/24)
If $r$ is 20, then $i=4$. (probability 1/24)
If $r$ is 21,22 or 23, then $i=5$. (probability 3/24)
The intuition for the algorithm comes from the concept of inverse transform sampling in probability theory. Given a probability distribution in the form of a cumulative distribution function $F(x)$, you can generate random variables according to that distribution by generating a uniform random variable $u$ from 0 to 1, then choosing the smallest $x$ such that $F(x) \ge u$.
For this problem, in which the distribution is over a finite set of values, the probabilities are the values in the array $x$, divided by $\operatorname{sum}(x)$ so that the total probability is 1.
The cumulative distribution function is just the cumulative sum of those probabilities.
If we use our RNG to generate a random integer $r$ from 0 to $\operatorname{sum}(x)-1$, then it already chooses each integer in that range with probability $1/\operatorname{sum}(x)$, so we don't need to divide by $\operatorname{sum}(x)$. We just need to find the smallest index $i$ such that $F(x_i)\ge r/\operatorname{sum}(x)$
We can actually find that index by counting the number of entries in the $c$ which are greater than or equal to $r$. If $i$ entries in $c$ are greater than or equal to $r$, then $F(x_i)\ge r/\operatorname{sum}(x)$ but $F(x_{i-1})< r/\operatorname{sum}(x)$ (or $i=0$)
Best Answer
An algorithm is called algorithm because it always converges. You will see in this simple function $y$ is always decreasing and eventually obtains zero and then the algorithm terminates. In this case,
To analyse the complexity of the algorithm, note that y reaches its half each iteration so it belongs to $O(\log y)$.