[Math] What distribution models number of trials needed for given number of successes and success rate

binomial distribution, probability distributions, statistics

Case scenario: a retro-virus infects a healthy cell. The virus programs the cell to brew little viruses, at a rate of 0.5 per second, until finally the cell bursts when the number of viruses inside it reaches 5. How to model this?

In the Binomial distribution, the random variable is the number of successes obtained in a fixed number of coin tosses, with a given probability of success per toss.

I want a distribution whose random variable is the number of trials (coin tosses) needed to obtain a given number of successes, with a given probability of success per trial.

I am not even sure how I would write down the probability mass function.

Is there such a distribution? Nothing rings a bell here: https://en.wikipedia.org/wiki/Category:Discrete_distributions

There are related questions to this one, such as "How many trials until I reach my desired outcome", but no one mentioned a named distribution, or whether one exists.

Just to make it clear, the random generator for a random variable of such a distribution would look like this in R:

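rmy <- function(s, p) {
    # s: required number of successes; p: success probability per trial
    i <- n <- 0
    while (i < s) {
        i <- i + rbinom(1, 1, p)  # one Bernoulli trial: adds 1 on success
        n <- n + 1                # count every trial, success or failure
    }
    n  # total number of trials needed
}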

Thank you! PS: sorry if the text was a little flowery, but it helps me think, since I am a junior mathematician, hehe.

Best Answer

I figured this one out. :)

I can model it using a Negative Binomial: https://en.wikipedia.org/wiki/Negative_binomial_distribution

First, let us change the values of my case scenario, just to make it clearer. "Case scenario: a retro-virus infects a healthy cell. The virus programs the cell to brew little viruses, at a rate of 0.2 per second, until finally the cell bursts when the number of viruses inside it reaches 5. How to model this?"

We can model the number of failures $Y$ as $Y\sim\mathcal{NB}(5,0.2)$. That answers the question of how many failed trials occur before we reach 5 successes, when each trial succeeds with probability 0.2. But we do not want failed trials, we want total trials, and total trials = failed trials + successful trials. The number of successful trials is known to be 5, so our random variable $X$ is $X = 5 + Y$, with $Y\sim\mathcal{NB}(5,0.2)$.
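For reference, the probability mass function asked about in the question can be written out explicitly. This is the standard result for the number of trials needed to reach a target number of successes; the symbols $r$ (required successes), $p$ (success probability per trial) and $n$ (total trials) are notation introduced here, not taken from the post:

$$P(X=n)=\binom{n-1}{r-1}\,p^{r}\,(1-p)^{n-r},\qquad n=r,\,r+1,\,r+2,\dots$$

The reasoning is that the $n$-th trial must be a success, and exactly $r-1$ successes must fall among the first $n-1$ trials. With $r=5$ and $p=0.2$ this gives, for example, $P(X=5)=0.2^{5}=0.00032$, and the usual moments are $E[X]=r/p=25$ and $\operatorname{Var}(X)=r(1-p)/p^{2}=100$.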

In fact, we can compare the random generator function I proposed in the question with the negative binomial random generator (with this adjustment):

par(mfrow=c(1,2))
hist(sapply(1:1e5, function(x) rmy(5, 0.2)))
hist(5+rnbinom(1e5, 5, 0.2))

[Figure: testing distribution]
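Beyond eyeballing histograms, the agreement can also be checked numerically. The following is only a rough sketch: it compares simulated frequencies with the shifted negative binomial pmf from `dnbinom`, which in R counts failures before the target number of successes. The helper name `dmy`, the seed and the cut-off of 60 trials are arbitrary choices made for this illustration.

```r
# Sketch: compare simulated frequencies of the total trial count with the
# shifted negative binomial pmf. `dmy` is a hypothetical helper name.
set.seed(1)  # arbitrary seed, only for reproducibility
s <- 5; p <- 0.2

# pmf of the total number of trials n: n - s failures before the s-th success
dmy <- function(n, s, p) dnbinom(n - s, size = s, prob = p)

x <- s + rnbinom(1e5, size = s, prob = p)  # simulated total trials
n_vals <- s:60                             # range of trial counts to compare
empirical <- as.numeric(table(factor(x, levels = n_vals))) / length(x)
theoretical <- dmy(n_vals, s, p)

# Largest absolute gap between simulated and exact probabilities
max_gap <- max(abs(empirical - theoretical))
print(max_gap)
```

A small `max_gap` suggests the two agree up to sampling noise; it is not a formal goodness-of-fit test.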

The outputs of mean, sd and summary are consistent between the two samples as well.
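A minimal sketch of that check, assuming the same values of 5 successes and probability 0.2: it redefines `rmy` so the snippet is self-contained, and compares both samples against the closed-form mean $s/p$ and standard deviation $\sqrt{s(1-p)}/p$. The seed and sample size are arbitrary choices for the illustration.

```r
# Sketch: check mean and sd of both generators against the closed-form values.
set.seed(42)  # arbitrary seed, only for reproducibility
s <- 5; p <- 0.2

# Generator from the question: count Bernoulli trials until s successes
rmy <- function(s, p) {
    i <- n <- 0
    while (i < s) {
        i <- i + rbinom(1, 1, p)
        n <- n + 1
    }
    n
}

x_my <- replicate(1e4, rmy(s, p))             # direct simulation
x_nb <- s + rnbinom(1e4, size = s, prob = p)  # shifted negative binomial

theo_mean <- s / p                  # expected total trials
theo_sd   <- sqrt(s * (1 - p)) / p  # standard deviation of total trials

print(rbind(
    rmy     = c(mean = mean(x_my), sd = sd(x_my)),
    rnbinom = c(mean = mean(x_nb), sd = sd(x_nb)),
    theory  = c(mean = theo_mean,  sd = theo_sd)
))
```

The three rows should be close to each other, with small differences due to sampling noise.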
