[Math] How many trials are needed to determine an event probability to a given degree of accuracy

probability, statistics

For an event that occurs with a fixed but unknown probability, it is sometimes desirable to estimate that probability empirically. The question that arises is how many trials are necessary. The answer depends on how accurately the probability needs to be known: it seems rather obvious that fewer trials are needed to determine the probability to within 1% than to within 0.1%.

To give an example: Say you have an unbalanced coin, and you want to determine the probability of the coin producing heads.

You flip it 10 times and get 6 heads. That's 60%.

You flip it 100 times and get 47 heads. That's 47%.

You flip it 1000 times and get 488 heads. That's 48.8%.
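To make the convergence concrete, here is a minimal Python sketch (not part of the original question) that repeats this experiment with an assumed true bias of `p_true = 0.49`, a made-up value used only for illustration:

```python
# Minimal sketch: estimate an "unknown" heads probability from more and more flips.
# The true bias p_true = 0.49 is a hypothetical value chosen for illustration.
import random

p_true = 0.49
random.seed(1)

for n in (10, 100, 1000, 10000):
    heads = sum(random.random() < p_true for _ in range(n))
    print(f"{n:>6} flips: {heads} heads -> estimate {heads / n:.3f}")
```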

How many times would you need to flip the coin to determine the probability of getting heads to within 1% accuracy, or to within 0.1% accuracy?

Is there a statistical formula that can calculate this?

Best Answer

Start by looking at a bell curve.
[Figure: bell curve with standard deviations marked]

A bell curve arises when you run many trials, each consisting of some fixed number of coin flips (or, more generally, when you observe how often an event occurs across many trials), and plot the distribution of the results.

Let's say you perform 500 trials with 1000 coin flips in each trial. The center of the bell curve represents the average number of times that heads occurred over all trials. Call this $\mu$. Divide this by the number of coin flips in each trial to get the probability (p).

But wait, we don't know how many coin flips to do yet. Go back to the bell curve. On either side of $\mu$, the graph is broken into sections. Each section represents 1 standard deviation ($\sigma$).

What does the standard deviation tell us? For any given trial of n coin flips, our result has about a 68% chance of being within 1 $\sigma$ of the actual $\mu$, about a 95% chance of being within 2 $\sigma$ of the actual $\mu$, and about a 99.7% chance of being within 3 $\sigma$ of the actual $\mu$.
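As a sanity check, here is a small Python sketch of the setup described above (500 trials of 1000 flips each, assuming a fair coin with p = 0.5); it counts how many trials land within 1, 2, and 3 standard deviations of the center:

```python
# Sketch: 500 trials of 1000 flips each, assuming a fair coin (p = 0.5).
# Checks that the spread of per-trial head counts matches
# sigma = sqrt(n * p * (1 - p)) and the 68/95/99.7 rule.
import math
import random

random.seed(1)
p, n_flips, n_trials = 0.5, 1000, 500

# Head count for each trial
counts = [sum(random.random() < p for _ in range(n_flips)) for _ in range(n_trials)]

mu = n_flips * p                          # expected center of the bell curve (500 heads)
sigma = math.sqrt(n_flips * p * (1 - p))  # theoretical standard deviation (~15.8)

for k in (1, 2, 3):
    within = sum(abs(c - mu) <= k * sigma for c in counts) / n_trials
    print(f"within {k} sigma of mu: {within:.1%} of trials")
```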

There is a formula that can be used to calculate $\sigma$.
$\sigma = \sqrt{n * p * (1 - p)}$

where n is the number of coin flips and p is the probability of getting heads.

If you do 100 flips of a fair coin where p = 0.5,
$\sigma = \sqrt{100 * 0.5 * (1 - 0.5)} = 5$

What does that number 5 represent? For n = 100 and p = 0.5, $\mu = 100 * 0.5 = 50$. That means one can be about 68% certain that for any trial of 100 flips, the number of heads will be 50 +/- 5. 95% certainty is 2 $\sigma$, or 50 +/- 10, while 99.7% certainty is 3 $\sigma$, or 50 +/- 15.
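In code, the same arithmetic looks like this (a tiny sketch, assuming n = 100 and p = 0.5 as above):

```python
# Sigma for 100 flips of a fair coin, and the 1-, 2-, and 3-sigma
# intervals around the expected 50 heads.
import math

n, p = 100, 0.5
mu = n * p                          # expected number of heads: 50
sigma = math.sqrt(n * p * (1 - p))  # = 5 for n = 100, p = 0.5

for k, certainty in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    print(f"{certainty} certainty: {mu:.0f} +/- {k * sigma:.0f} heads")
```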

How does this help establish the accuracy of event probability?

$\sigma = 5$ for n = 100. This means that there is a 68% chance that 100 coin flips will yield a head count within 5 of the expected 50, i.e., a proportion within 5 percentage points of the actual probability. So the accuracy (A) of the result is within 5% with 68% certainty.

The equation for determining A becomes:
$A = \frac{\sigma}{n}$

but we know that
$\sigma = \sqrt{n * p * (1 - p)}$

so

$A = \frac{\sqrt{n * p * (1 - p)}}{n}$

or

$A^2 = \frac{n * p * (1 - p)}{n^2}$

Solving for n:

$n = \frac{p * (1 - p)}{A^2}$

Going back to the unfair coin in the question, we use
p = 0.488
A = 0.01 for 1% accuracy
Then n ≈ 2498.6, so about 2,500 coin flips would be needed to determine the probability of getting heads to within 1% accuracy with 68% certainty.
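Here is a short sketch of that calculation (the helper name `flips_needed` is illustrative, not from the original answer); the second call also covers the 0.1% case asked about in the question:

```python
# n = k^2 * p * (1 - p) / A^2, where k is the number of sigmas
# (k = 1 corresponds to ~68% certainty).
def flips_needed(p, A, k=1):
    return k * k * p * (1 - p) / (A * A)

p = 0.488  # estimated probability of heads from the question

print(flips_needed(p, 0.01))   # ~2498.6  -> about 2,500 flips for 1% accuracy
print(flips_needed(p, 0.001))  # ~249,856 flips for 0.1% accuracy
```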

What if you wanted a higher level of certainty? For 95% certainty (2 $\sigma$), the equation becomes:
$n = \frac{4 * p * (1 - p)}{A^2}$

and for 99.7% certainty (3 $\sigma$), the equation is:
$n = \frac{9 * p * (1 - p)}{A^2}$
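The same calculation at the higher certainty levels, as a self-contained sketch (here k denotes the number of standard deviations, a notational assumption of this sketch):

```python
# Flips needed for 1% accuracy at 68%, 95%, and 99.7% certainty,
# using n = k^2 * p * (1 - p) / A^2 with p = 0.488 from the question.
p, A = 0.488, 0.01

for k, certainty in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    n = k * k * p * (1 - p) / (A * A)
    print(f"{certainty} certainty (k = {k} sigma): about {n:,.0f} flips")
```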