Solved – How to calculate the probability for multiple trials with different probabilities

distributionsprobability

So for a bit of recreational spreadsheet creation I've been pondering probabilities.

Let's say I have two types of trial, A and B, with a 40% and 25% chance of success respectively. How can I calculate the probability, for instance, that I get at least two successes if I run trial A three times and B twice?

As I understand it, I can use a binomial distribution to calculate each one individually: 35.2% for at least two successes from A, or 6.25% from B. In addition, I would reckon the probability of getting at least one success in the combined set to be 87.85%, as I calculate a 12.15% chance of getting exactly zero successes on all five trials.

I'd greatly appreciate a little help working this out, because it's been bothering me for a couple of days.

Best Answer

Suppose you conduct the first trial $n$ times and (independently) second one $m$ times, and that the chances of success are $p$ and $q$ respectively. Let $A$ be the total number of successes in the first instance, $B$ the total in the second, and $X=A+B$ be the total number of successes. Obviously $X$ is an integer between $0$ and $m+n$ (inclusive). For any such integer $x,$ let's find an expression for the chance that $X=x.$

One such expression exploits the probability axiom that says the chance of an event is the sum of the chances of mutually disjoint events of which it is comprised. Here, the event $X=x$ is comprised of the events $A=a, B=x-a$ where $a$ ranges over all possible counts (of successes of $A$).

The independence of $A$ and $B$ implies the chance an event $A=a,B=x-a$ is the product of the component chances. Since $A$ and $B$ have Binomial distributions, we have immediately

$$\Pr(A=a,B=x-a) = \left(\binom{n}{a} p^a(1-p)^{n-a}\right)\left(\binom{m}{x-a} q^{x-a}(1-q)^{m-(x-a)}\right).$$

Summing these over $a$ and doing a little algebraic simplification yields

$$\eqalign{\Pr(X=x) &= \frac{(1-p)^n q^x}{(1-q)^{m-x}}\,\sum_{a=0}^x \binom{n}{a}\binom{m}{x-a} \left(\frac{p(1-q)}{(1-p)q}\right)^a \\ &= \phi^x \frac{(1-p)^n }{(1-q)^{m}}\,\sum_{a=0}^x \binom{n}{a}\binom{m}{x-a} t^a \\ &= \phi^x \frac{(1-p)^n }{(1-q)^{m}}\binom{m}{x}\,\sum_{a=0}^\infty \frac{(-n)^a (-x)^a }{a! (m-x+1)^a} (-1)^{2a}t^a \\ &= \phi^x \frac{(1-p)^n }{(1-q)^{m}}\binom{m}{x}\,_2F_1(-n,-x;m-x+1;t) }$$

where

$$\phi = \frac{q}{1-q}$$

is the odds for $B,$

$$t = \frac{p}{1-p}\,/\,\frac{q}{1-q}$$

is the odds ratio for $A$ relative to $B,$

$$(z)^s = z(z+1)\cdots(z+s-1)$$

is the "rising factorial" (or Pochhammer symbol), and $\,_2F_1$ is the Riemann hypergeometric function (which, in this case, obviously reduces to a polynomial in $t$ of degree no greater than $x$).

Find the chance of the event $X\ge x$ (as in the question) by summing over the individual possibilities of $x$ or, when $x$ is small, by computing the chance of its complement,

$$\Pr(X \ge x) = 1 - \Pr(X \lt x) = 1 - \sum_{y=0}^{x-1} \Pr(X = y).$$

For tiny values of $x$ this won't be too bad; for larger values, you will want a good software library for computing values of the hypergeometric function.

Remarks

Convolution of the two binomial distributions (using the Fast Fourier Transform) is an attractive option for precise calculation.

When both of $np+mq$ and $n(1-p)+m(1-q)$ are not small (exceeding $5$ is often considered ok), the Normal approximation to the Binomial distributions will give a good approximation. Specifically, the approximating Normal distribution will have mean

$$\mu= np + mq,$$

variance

$$\sigma^2 = np(1-p) + mq(1-q),$$

and the chance is therefore approximated (using a continuity correction) by

$$\Pr(X \ge x) \approx \Phi\left(\frac{\mu - x + 1/2}{\sigma}\right)$$

where $\Phi$ is the CDF of the standard Normal distribution. If you're brave, you can also approximate the individual probabilities as

$$\eqalign{ \Pr(X = x) &= \Pr(X \ge x) - \Pr(X \ge x+1) \\ &\approx \Phi\left(\frac{\mu-x+1/2}{\sigma}\right) - \Phi\left(\frac{\mu-x-1/2}{\sigma}\right).}$$

As an example, with $n=6,$ $p=0.40,$ $m=10,$ and $q=0.25$ (the chances in the question, with the minimal numbers of trials for the approximation to hold), a simulation of 100,000 values of $X$ (shown by the line heights) is pretty well reproduced by the approximation (shown by the dots):

This R code produced the figure.

n <- 6
m <- 10
p <- 0.4
q <- 0.25
#
# Simulate X.
#
n.sim <- 1e5
A <- rbinom(n.sim, n, p)
B <- rbinom(n.sim, m, q)
X <- A+B
#
# Plot the simulation.
#
plot(0:(n+m), tabulate(X+1, n+m+1)/n.sim, type="h", ylab="Relative frequency", xlab="x")
#
# Plot the Normal approximation.
#
f <- function(x, n, p, m, q) {
  mu <- n * p + m * q
  sigma <- sqrt(n * p * (1-p) + m * q * (1-q))
  pnorm((x + 1/2 - mu) / sigma) - pnorm((x-1 + 1/2 - mu) / sigma)
}
points(0:(n+m), f(0:(n+m), n, p, m, q), pch=21, bg="#e0000080")

Related Solutions

Solved – Calculate the cumulative probability of multiple different probabilities P(1..k) with n trials where n > k

In general, you have to compute these multinomial probabilities individually and add them.

Consider a die with $d$ sides, each appearing with probability $p_i, i=1, 2, \ldots, d$. The experiment consists of rolling this die $n$ times independently and recording the number of times $x_i$ that each face $i$ appears.

Suppose you want to find the chance that the $x_i$ simultaneously are one of the counts in a specified set $S_i$. For instance, the problem asks for the chance that $x_4\in S_4=\{2,3,\ldots,10\}$ (two or more fours), $x_5\in S_5=\{2,3,\ldots,10\}$ (two or more fives), $x_6\in S_6=\{1,2,\ldots,10\}$ (one or more sixes), and $x_1,x_2,x_3$ may have any value in $S_1=S_2=S_3=\{0,1,\ldots, 10\}$.

Let's abstractly name this chance $f(n, \mathbf{p}, \mathbf{S})$ (where $\mathbf p = (p_1,p_2, \ldots, p_d)$ and $\mathbf{S} = (S_1, S_2, \ldots, S_d)$).

The outcomes have a multinomial distribution. Thus

$$f(n, \mathbf{p}, \mathbf{S}) = \sum_{a_i\in S_i; a_1+\cdots+a_d=n} \binom{n}{a_1,a_2,\ldots,a_d}p_1^{a_1} p_2^{a_2} \cdots p_d^{a_d}$$

where the multinomial coefficient can be computed as

$$\binom{n}{a_1,a_2,\ldots,a_d} = \frac{n!}{a_1!a_2!\cdots a_d!}.$$

Evaluate this sum one index at a time. For instance, beginning with the last index $a_d$ we obtain

$$\eqalign{ f(n, \mathbf{p}, \mathbf{S}) &= \sum_{a_d\in S_d} \binom{n}{a_d} p_d^{a_d}\sum_{a_i\in S_i; a_1+\cdots+a_{d-1}=n-a_d} \binom{n-a_d}{a_1,a_2,\ldots,a_{d-1}}p_1^{a_1} p_2^{a_2} \cdots p_{d-1}^{a_{d-1}}\\ &= \sum_{a_d\in S_d} \binom{n}{a_d} p_d^{a_d} f(n-a_d, \mathbf{p}^\prime, \mathbf{S}^\prime) }$$

where the primes indicate the last element has been deleted from each vector. This is a recursive program. It gets started with the case $n=1$, where

$$f(n, p, S) = \left\{\matrix{p^n & \text{if } \in S\\ 0 & \text{otherwise.}}\right.$$

This is potentially a time-consuming calculation, because when the $S_i$ are numerous, there may be a great many cases to examine. For small $d$ and (depending on $d$) small $n$, it's practicable--the example in the question takes no measurable time. For large $d$, $n$, and numerous $S_i$, a cleverer algorithm is needed to screen out cases that contribute so little to the answer that their computation can be ignored.

Because the coefficients $\binom{n}{a_d}$ can be very large and the powers $p_d^{a_d}$ very small, it's a good idea to sum logarithms rather than perform these multiplications directly. That is the method implemented in the code below.

For the example described in the question, the result is $10\,593\,330/6^{10} \approx 0.1751943$. The following R code computed it. It includes a simulation afterwards to run the experiment a million times, and then compares the frequency with which $\mathbf x$ was found in $\mathbf S$ to the value previously computed. A Z-score is also returned: typically, a Z-score less than $2.5$ or so in size is significant confirmation, especially when the simulation is large enough to have observed this event many times. Here is its output:

Theory Simulated         Z 
0.1752    0.1752    0.0255

The simulation agreed almost exactly with the computed value.

# Return the logarithm of the chance that a die described by probability
# vector `p`, after `n` independent rolls, will exhibit face counts given
# by the numbers in the (nonempty) list `s`.
#
f <- function(n, p, s) {
  if (length(s)==1) return(ifelse(n %in% s[[1]], n * log(p[1]), -Inf)) # Base case

  a <- s[[1]]
  a <- a[0 <= a & a <= n]
  if (length(a)==0) return(-Inf) # There's nothing satisfying the criteria

  z <-  a * log(p[1]) + lchoose(n, a) + sapply(a, function(a) f(n-a, p[-1], s[-1]))
  return(log(sum(exp(z))))
}
#
# Examples and testing.
#
n <- 10
d <- 6
p <- rep(1/d, d)
s <- list(0:n, 0:n, 0:n, 2:n, 2:n, 1:n)
# s <- lapply(p, function(i) sample(0:n, ceiling(0.9*n))) # For testing
theory <- exp(f(n, p, s))
#
# Simulation.
#
n.sample <- 1e6
set.seed(17)
x <- rmultinom(n.sample, n, p)
e <- paste(sapply(1:length(s), function(i) paste0("x[", i, ", ] %in% s[[", i, "]]")), 
           collapse=" & ")
s <- eval(parse(text=paste0("mean(", e, ")")))
round(c(Theory=theory, Simulated=s, 
        Z=(s-theory) / sqrt(theory*(1-theory)/n.sample)), ceiling(log10(n.sample)/2+1))

Probability Concepts – Understanding ‘More Likely’ or ‘Less Likely’ in Probability

Is this a correct interpretation of what is meant by more or less likely in probability? Can someone verify this for me?

In short, no, this is not a correct interpretation. Properly used, the word "likely" should refer to the likelihood function: $$\mathcal L(\theta|x) = p(X=x|\theta)$$ where $X$ is a random variable with distribution $p$ which is parameterized by some parameter $\theta$. So the likelihood is a function of the distribution parameter $\theta$, not the random variable $X$ or the specific observation $x$.

For the specific example there is not a distribution with a parameter, so I think that the example is not using "likely" in its technical sense. I think that here they are using it simply as a synonym for "probable". So in that sense $A$ is more likely (meaning "probable") than any of the other values.

The issue that you are running into is not in the meaning of the word "likely" or "probable" but in the use of the words "more" or "less". The words "more" or "less" imply a comparison between two things. So saying that "$A$ is more probable" is incomplete unless you specify what it is more probable than.

$A$ is more probable than $B$. $A$ is more probable than $C$. $A$ is more probable than $D$. $A$ is less probable than ($B$ or $C$ or $D$). $A$ is less probable than (not $A$). All of those are standard probability statements and apply regardless of the number of trials or observations being made.

Best Answer

Remarks

Related Solutions

Solved – Calculate the cumulative probability of multiple different probabilities P(1..k) with n trials where n > k

Probability Concepts – Understanding ‘More Likely’ or ‘Less Likely’ in Probability

Related Question