Solved – Error for combining multiple binomial distributions

Tags: binomial-distribution, central-limit-theorem, error-propagation, experiment-design, hypothesis-testing

This problem is somewhat involved and I have a partial solution, so bear with me. I will illustrate the problem with an example. Let's say we have two processes and we want to know which has the higher success rate, so we run two sets of trials: process 1 gives 53 successes out of 606 trials and process 2 gives 32 out of 595. So the rates are

$p_1$ is 0.0875 with a 95% CI of [0.0662, 0.1128]

$p_2$ is 0.0538 with a 95% CI of [0.0371, 0.0751]

It would seem that $p_1$ has the higher success rate. However, each of these is only a single measurement of the rate, and in general these experiments are run on different samples. To get the actual success rate we would need to perform multiple experiments of this type and then calculate the mean of the $p_1$s and the $p_2$s. My question is: how do I calculate the error on this mean? Is it calculated from the distribution of the $p_1$s and $p_2$s, or do I need to incorporate the binomial confidence limits of each individual experiment in some way? I am happy to stay in a regime where a nearly Gaussian approximation holds, as with the numbers given above. Is this the sort of situation where one would argue from the central limit theorem that the distributions of the $p_1$s and $p_2$s are all that matter?

There is a standard way to combine several experiments if each measurement can be assumed to have a standard error. If the set of measurements is $a_i$ and the set of associated errors is $\sigma_i$, then the estimate for the true $a$, with accuracy $\sigma$, is given by the following:

$a = \frac{\sum_i a_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}$

$\frac{1}{\sigma^2} = \sum_i \frac{1}{\sigma_i^2}$
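As a concrete illustration, here is a minimal Python sketch of this inverse-variance weighting (the function name `combine_measurements` is my own, not from the post):

```python
import math

def combine_measurements(values, sigmas):
    """Inverse-variance weighted combination of measurements.

    values -- the individual measurements a_i
    sigmas -- their standard errors sigma_i
    Returns the combined estimate a and its error sigma.
    """
    weights = [1.0 / s**2 for s in sigmas]          # w_i = 1/sigma_i^2
    a = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    sigma = math.sqrt(1.0 / sum(weights))           # 1/sigma^2 = sum of w_i
    return a, sigma

# Sanity check: with equal errors this reduces to the plain mean,
# with combined error sigma/sqrt(n)
a, sigma = combine_measurements([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

With equal $\sigma_i$ this collapses to the ordinary average, which is a quick way to check the formula.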

This just amounts to an inverse-variance weighted mean. The reason it does not work in general for a binomial experiment is that there is no sensible way to assign a $\sigma$ to each measurement. In low-statistics cases, like 1 success out of 4 trials, the error is highly asymmetric.

However, in a high-statistics situation like the one above, we can use something like the Wald interval, whose half-width supplies a per-measurement $\sigma$. The measurements can then be combined confidently, but only when each individual measurement is similar and derived from high statistics.
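For reference, here is a sketch of the Wald estimate for a single binomial measurement, which supplies the per-measurement $\sigma = \sqrt{\hat p(1-\hat p)/n}$ just described (the function name is my own):

```python
import math

def wald_interval(successes, trials, z=1.96):
    """Wald point estimate, standard error, and approximate 95% interval
    for a binomial success rate (valid only in the high-statistics regime)."""
    p = successes / trials
    sigma = math.sqrt(p * (1.0 - p) / trials)   # Gaussian-approximation error
    return p, sigma, (p - z * sigma, p + z * sigma)

# The two experiments from the question:
pa, sa, ci_a = wald_interval(53, 606)   # 53/606 ~ 0.0875
pb, sb, ci_b = wald_interval(32, 595)   # 32/595 ~ 0.0538
```

The Wald bounds come out close to, but not identical to, the exact intervals quoted in the question, as expected for an approximation.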

Any help is appreciated.

Best Answer

Can you assume that the success probability is equal across the $n_1 = 606$ or $n_2 = 595$ process realizations, and that the realizations are independent? If not, you'll have a hard time doing any statistics at all.

If yes (and this has to be methodologically supported by the experiment), you can model each realization $X_{ik}$ as a Bernoulli experiment with success probability $p_i$. In other words, you can consider $X_{ik}$ as a random variable with $P(X_{ik} = 1) = p_i$ and $P(X_{ik} = 0) = 1 - p_i$.

Now observe that $Var(X_{ik}) = p_i(1-p_i) < \infty$, so since the realizations are independent, you can apply the central limit theorem to $\hat{p_i} = \frac{1}{n_i}\sum_{k=1}^{n_i}X_{ik}$.
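A quick Monte Carlo check of this, using only the standard library (a sketch: the rate 0.0875 and $n = 606$ are taken from the question; the number of repetitions and seed are illustrative):

```python
import math
import random
import statistics

random.seed(1)
p_true, n, n_experiments = 0.0875, 606, 2000

# Repeat the whole n-trial experiment many times and record each p-hat
p_hats = [sum(random.random() < p_true for _ in range(n)) / n
          for _ in range(n_experiments)]

# The CLT predicts p-hat is approximately normal with this standard deviation
sigma_clt = math.sqrt(p_true * (1.0 - p_true) / n)
```

The sample mean of the simulated `p_hats` lands near 0.0875 and their spread near `sigma_clt`, which is the Gaussian regime the question hopes to stay in.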

As you wished, this boils your situation down to calculating a (Wald-)confidence interval for $p_1 - p_2$: The variance $Var(\hat{p_1} - \hat{p_2})=\sum_{i=1}^2\frac{p_i}{n_i}(1-p_i)$ can be estimated by plugging in $\hat{p_i}$ for $p_i$. So with $z \approx 1.96$ you get $$\hat{p_1} - \hat{p_2} \pm z \sqrt{\sum_{i=1}^2\frac{\hat{p}_i}{n_i}(1-\hat{p}_i)}$$ as confidence bounds for the difference.
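Plugging the question's numbers into this formula looks like the following (a sketch; `diff_ci` is my own name):

```python
import math

def diff_ci(k1, n1, k2, n2, z=1.96):
    """Wald confidence interval for the difference of two binomial rates."""
    p1, p2 = k1 / n1, k2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # sqrt of summed variances
    return p1 - p2, (p1 - p2 - z * se, p1 - p2 + z * se)

diff, (lo, hi) = diff_ci(53, 606, 32, 595)
# The interval excludes zero, so the two rates differ at the 95% level
```

Here the lower bound is a little above zero, so with these counts the difference between the two processes is (just barely) significant.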

Now if you have many such success rates and want to know which is best among all of them, you'll need simultaneous confidence intervals. (I'll elaborate on this if you need.)