Binomial Distributions – Testing Statistical Differences Between Two Binomial Distributions

bernoulli-distribution, binomial distribution, statistical significance

I have three groups of data, each with a binomial distribution (i.e. each group has elements that are either success or failure). I do not have a predicted probability of success, and can only rely on the observed success rate of each group as an approximation of its true success rate. I have only found this question, which is close but does not seem to deal with exactly this scenario.

To simplify the test, let's say that I have 2 groups (3 can be extended from this base case).

| Group | Trials $n_i$ | Successes $k_i$ | Percentage $p_i$ |
|---|---|---|---|
| Group 1 | 2455 | 1556 | 63.4% |
| Group 2 | 2730 | 1671 | 61.2% |

I don't have an expected success probability, only what I know from the samples.

The success rates of the two samples are fairly close. However, my sample sizes are also quite large. If I check the CDF of the binomial distribution to see how different one sample is from the other (treating one sample's rate as the null hypothesis), I get a very small probability that the other result could be achieved.

In Excel:

1-BINOM.DIST(1556,2455,61.2%,TRUE) = 0.012

However, this does not take into account the sampling variance of the rate used as the reference; it just assumes that observed rate is the true probability.
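For readers without Excel, the calculation above can be reproduced roughly as follows. This is a minimal sketch using only the Python standard library; the function name `binom_upper_tail` is made up for illustration, and the sum is done in log space because the individual terms underflow for counts this large. The result should land near the 0.012 reported above.

```python
import math

def binom_upper_tail(k, n, p):
    """P(X > k) for X ~ Binomial(n, p), i.e. 1 - CDF(k).

    Each term is computed in log space (via lgamma) because p**i
    underflows to zero for counts in the thousands.
    """
    log_p = math.log(p)
    log_q = math.log1p(-p)
    total = 0.0
    for i in range(k + 1, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_pmf)
    return total

# Same inputs as the Excel formula: 1 - BINOM.DIST(1556, 2455, 61.2%, TRUE)
tail = binom_upper_tail(1556, 2455, 0.612)
print(f"P(X > 1556) = {tail:.4f}")
```

Like the Excel version, this treats 61.2% as a known probability rather than an estimate, which is exactly the limitation described above.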

Is there a better way to test if these two samples of data are actually statistically different from one another?

Best Answer

The solution is a simple google away: http://en.wikipedia.org/wiki/Statistical_hypothesis_testing

So you would like to test the following null hypothesis against the given alternative

$H_0:p_1=p_2$ versus $H_A:p_1\neq p_2$

So you just need to calculate the test statistic which is

$$z=\frac{\hat p_1-\hat p_2}{\sqrt{\hat p(1-\hat p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}$$

where $\hat p=\frac{n_1\hat p_1+n_2\hat p_2}{n_1+n_2}$.

So now, in your problem, $\hat p_1=.634$, $\hat p_2=.612$, $n_1=2455$ and $n_2=2730.$

Once you calculate the test statistic, you just need the corresponding critical value to compare it to. For example, if you are testing this hypothesis at the 5% significance level (95% confidence), then you need to compare the absolute value of your test statistic against the critical value $z_{\alpha/2}=1.96$ (for this two-tailed test).

Now, if $|z|>z_{\alpha/2}$ then you may reject the null hypothesis; otherwise you fail to reject it. For the data above, $z\approx 1.61<1.96$, so the null hypothesis is not rejected at the 5% level.
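The pooled two-proportion z-test described above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function name `two_proportion_z_test` is made up, it takes the raw counts $k_i$ and $n_i$ from the question's table rather than the rounded percentages, and the two-sided p-value comes from the normal approximation via `math.erfc`.

```python
import math

def two_proportion_z_test(k1, n1, k2, n2):
    """Pooled z-test of H0: p1 == p2 against HA: p1 != p2.

    Returns the z statistic and the two-sided p-value under the
    normal approximation.
    """
    p1_hat = k1 / n1
    p2_hat = k2 / n2
    # Pooled estimate: (n1*p1_hat + n2*p2_hat) / (n1 + n2) = (k1 + k2) / (n1 + n2)
    pooled = (k1 + k2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / std_err
    # Two-sided tail probability: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p_value = two_proportion_z_test(1556, 2455, 1671, 2730)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
print("reject H0 at 5%" if abs(z) > 1.96 else "fail to reject H0 at 5%")
```

Unlike the one-sample binomial check in the question, this accounts for sampling variability in both groups, which is why the evidence against equal proportions comes out weaker.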

This solution works when you are comparing two groups, but it does not generalize to the case where you want to compare 3 groups.

You could, however, use a chi-squared test to test whether all three groups have equal proportions, as suggested by @Eric in his comment above: "Does this question help? stats.stackexchange.com/questions/25299/ …"
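As a rough sketch of that chi-squared approach, the Pearson statistic for equal proportions across any number of groups can be computed by hand. The function name `chi2_equal_proportions` is hypothetical, and only the two groups from the question are used here because no third group's data is given; a third group would simply be appended to both lists. The critical values 3.841 (1 degree of freedom) and 5.991 (2 degrees of freedom) are the standard 5% chi-squared cutoffs.

```python
def chi2_equal_proportions(successes, trials):
    """Pearson chi-squared statistic for H0: all groups share one success rate.

    Returns the statistic and its degrees of freedom (groups - 1).
    """
    pooled = sum(successes) / sum(trials)
    stat = 0.0
    for k, n in zip(successes, trials):
        expected_success = n * pooled
        expected_failure = n * (1 - pooled)
        stat += (k - expected_success) ** 2 / expected_success
        stat += ((n - k) - expected_failure) ** 2 / expected_failure
    return stat, len(trials) - 1

# 5% critical values of the chi-squared distribution for 1 and 2 degrees of freedom
CRITICAL_5PCT = {1: 3.841, 2: 5.991}

stat, dof = chi2_equal_proportions([1556, 1671], [2455, 2730])
print(f"chi2 = {stat:.3f}, dof = {dof}")
print("reject H0 at 5%" if stat > CRITICAL_5PCT[dof] else "fail to reject H0 at 5%")
```

With exactly two groups, this statistic equals the square of the pooled $z$ statistic, so the two tests agree; the chi-squared form is the one that extends to three or more groups.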
