The technique for finding the probability of a probability in a binomial distribution

binomial distribution

I know this is a tired question, but I have seen no clear answers.

I saw a related post but didn't see how it applied to the general case. In a 3b1b video, it is suggested that, given some known number of trials and successes, the probability of the probability in a binomial distribution can be calculated (or perhaps inferred, or estimated?). Using Bayes' Theorem I run into difficulty; the theorem relates the probabilities of two variables, but there are three variables here: the probability of a success (which we want to find), the number of trials, and the number of successes.

E.g. if a car factory makes $100$ cars and finds $2$ defects, what is the most likely probability of a defect? How do we find the expected value of this thing?

By hand one could go through many test probabilities and see which has $2$ defects as its most likely outcome but that feels wrong – and would only work in the expected value case, anyway. The general probability of a probability is a mystery to me.

Many thanks to anyone who can even tell me what this sort of process is even called.

Best Answer

Let $p$ be the probability of success in one trial; this quantity is unknown to you. Given the outcome of $n$ trials ($n$ is non-random and fixed), you are interested in understanding

what does this information tell me about $p$?

In statistics there are broadly two frameworks for answering this question.

Frequentist statistics

In frequentist statistics, $p$ is a fixed, non-random, unknown quantity. Note that because this is a non-random quantity, it does not make sense to ask things like "what is the probability that $p$ is larger than $0.6$?"

One ubiquitous approach for answering the above question is maximum likelihood. This is almost what you described here, but not quite.

By hand one could go through many test probabilities and see which has 2 defects as its most likely outcome.

In your binomial scenario, if $X$ is the number of successes you observed in $n$ trials, and you observed $X=x$, then (as you described) you can compute $L(p) := \binom{n}{x} p^x(1-p)^{n-x}$ for many values of $p$ and find the $p$ that makes this quantity large. The function $L$ is called the likelihood function, and this procedure of finding the $p$ that makes $L$ the largest is called maximum likelihood estimation. It is important to note that $L$ is a function of $p$ (with $x$ fixed at whatever number of successes you observed), rather than a function of $x$ with $p$ fixed (this would be the PMF of the binomial distribution).
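As a sketch, the grid-search version of this procedure can be carried out directly (the numbers $n = 100$ and $x = 2$ below are the car-factory figures from the question; the grid resolution is an arbitrary choice):

```python
from math import comb

def likelihood(p, n, x):
    # L(p) = C(n, x) * p^x * (1 - p)^(n - x), viewed as a function of p
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, x = 100, 2  # 100 cars, 2 defects

# Evaluate L(p) on a fine grid and take the argmax.
grid = [i / 10000 for i in range(1, 10000)]
p_hat = max(grid, key=lambda p: likelihood(p, n, x))
print(p_hat)  # 0.02, matching the closed-form MLE x/n
```

For the binomial model the maximizer also has a closed form, $\hat p = x/n = 2/100$, which is what the grid search recovers.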

What you proposed is to find $p$ such that $2$ is the mode of the $\text{Binomial}(n, p)$ distribution (i.e. $P(X=2) \ge P(X=k)$ for all $k \ne 2$). This is not quite the same as the above.
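One way to see the difference: the mode-matching criterion does not pin down a single $p$ at all. A quick sketch (the grid of candidate $p$ values is an arbitrary illustrative choice) shows a whole interval of $p$ values makes $2$ the mode of $\text{Binomial}(100, p)$:

```python
from math import comb

def pmf(k, n, p):
    # P(X = k) for X ~ Binomial(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 100
# On a coarse grid, collect every p for which k = 2 is the most likely outcome.
ps_with_mode_2 = [
    p
    for p in (i / 1000 for i in range(1, 200))
    if max(range(n + 1), key=lambda k: pmf(k, n, p)) == 2
]
print(min(ps_with_mode_2), max(ps_with_mode_2))  # roughly 0.020 to 0.029
```

Every $p$ in roughly $[2/101,\, 3/101)$ has mode $2$, so "which $p$ makes $2$ the most likely outcome" has many answers, whereas the likelihood $L(p)$ has a single maximizer.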

Bayesian statistics

In Bayesian statistics, $p$ is a random quantity with its own distribution, called a prior distribution. In your binomial example, $p$ could follow a $\text{Uniform}[0, 1]$ distribution. It could also follow some other distribution, depending on your prior belief in what $p$ could be. The prior distribution must be chosen by you, but as you get more and more data, the influence of this initial choice becomes less and less as you rely more and more on the data.
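To illustrate the prior's influence fading, here is a sketch using a standard fact not discussed above: a Beta prior is conjugate to the binomial, so a $\text{Beta}(a, b)$ prior combined with $x$ successes in $n$ trials gives a $\text{Beta}(a + x,\, b + n - x)$ posterior, whose mean has a closed form. The particular priors and sample sizes below are arbitrary illustrative choices:

```python
# Posterior mean of p under a Beta(a, b) prior after x successes in n trials;
# Beta is conjugate to the binomial, so the posterior is Beta(a + x, b + n - x).
def posterior_mean(a, b, n, x):
    return (a + x) / (a + b + n)

# Compare a uniform prior, Beta(1, 1), with a prior skewed toward large p,
# Beta(10, 1), while the observed defect rate stays fixed at 2%.
for n in (50, 100, 10000):
    x = n // 50
    print(n, posterior_mean(1, 1, n, x), posterior_mean(10, 1, n, x))
```

At $n = 50$ the two priors give quite different answers; by $n = 10000$ both posterior means are close to the observed rate $0.02$, regardless of the initial choice.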

Since $p$ is random in this framework, it now makes sense to ask questions like "what is the probability that $p$ is larger than $0.6$?"

Now that $p$ is random, the model is that $X$ conditioned on $p$ is binomial, e.g. $P(X=x \mid p=0.2) = \binom{n}{x} (0.2)^x (0.8)^{n-x}$.

To understand what your data $X=x$ says about the unknown $p$, you can consider the posterior distribution of $p$, that is, the conditional distribution of $p$ given $X=x$. This can be computed using Bayes's Rule. Here is an example of the posterior probability that $p=0.4$ in the case where $p$ follows a discrete distribution.

$$f(0.4) = P(p=0.4 \mid X=x) = \frac{P(X=x \mid p=0.4) P(p=0.4)}{\sum_{p'} P(X=x \mid p=p') P(p=p')}.$$
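For concreteness, here is a sketch of that computation with a small discrete prior; the four candidate values of $p$ and the uniform prior weights are illustrative choices, not anything from the question:

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x | p) for X ~ Binomial(n, p)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, x = 100, 2
# A discrete prior: p is one of four candidate values, each with prior mass 1/4.
candidates = [0.01, 0.02, 0.05, 0.10]
prior = {p: 1 / len(candidates) for p in candidates}

# Bayes's Rule: posterior(p) is proportional to P(X = x | p) * prior(p),
# normalized by the sum over all candidates (the denominator above).
unnormalized = {p: binom_pmf(x, n, p) * prior[p] for p in candidates}
total = sum(unnormalized.values())
posterior = {p: w / total for p, w in unnormalized.items()}
print(posterior)  # p = 0.02 carries the largest posterior mass
```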

One Bayesian approach to answer the question is Maximum A Posteriori estimation, which chooses $p$ to maximize the above function $f$.
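As a sketch of MAP estimation in the continuous case (again using the Beta–binomial conjugacy fact, which is an addition beyond what is stated above): under a uniform $\text{Beta}(1, 1)$ prior the posterior mode coincides with the maximum likelihood estimate $x/n$:

```python
# Posterior under a Beta(a, b) prior is Beta(a + x, b + n - x); its mode,
# (a + x - 1) / (a + b + n - 2), valid when both posterior parameters
# exceed 1, is the MAP estimate.
def map_estimate(a, b, n, x):
    return (a + x - 1) / (a + b + n - 2)

print(map_estimate(1, 1, 100, 2))  # 0.02, equal to the MLE x/n under a uniform prior
```

With a non-uniform prior the MAP estimate and the MLE differ, and the gap shrinks as $n$ grows, in line with the remark above that the data eventually dominates the prior.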
