Finding statistical significance between two binary tests

binary data, statistical significance

I have two sets of data. The first has a sample size of 82, with 53 "hits" and 29 "misses". The second has a sample size of 105, with 67 "hits" and 38 "misses".

Given that the second set of data is a control, is there a way to show whether or not the results of the first data set are statistically significant?

Thanks in advance. It's been a while since I took a stats course, and googling for the answer left me confused.

Best Answer

I would start by stating things more formally, e.g. (with no prior information):

The null hypothesis is that these two samples come from the same population. The alternative is that they come from different populations, one of which really has a higher proportion of "hits".

A simple and traditional approach would be Fisher's exact test.

The Wikipedia article on Fisher's exact test shows how to calculate it by hand for data like yours.
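For reference, this is the formula the calculation rests on. Label the cells a and b (the first group's hits and misses) and c and d (the control's hits and misses); these letters are my own labels, not the question's. With the row and column totals held fixed, the probability of one particular table is hypergeometric:

```latex
p = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}}
  = \frac{\binom{82}{53}\binom{105}{67}}{\binom{187}{120}},
\qquad a = 53,\; b = 29,\; c = 67,\; d = 38,\; n = a+b+c+d = 187.
```

The usual two-sided p-value then adds up this probability over every table with the same totals that is no more likely than the observed one.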

If you are using R, you could do something like this:

> cm1 <- matrix(c(53, 67, 29, 38),
                nrow = 2,
                dimnames = list(c("first", "control"), c("hits", "misses"))) 
> addmargins(as.matrix(cm1), c(1,2))
        hits misses Sum
first     53     29  82
control   67     38 105
Sum      120     67 187
> fisher.test(cm1)

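This gives p = 1, i.e. under the null hypothesis a table at least as extreme as yours is entirely unsurprising. So the data give no evidence that the first group differs from the control (though a large p-value is not proof that the two are identical).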
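To see where p = 1 comes from, here is a sketch that repeats the calculation by hand in base R. It assumes the usual two-sided rule, which as far as I know is what `fisher.test` uses for a 2x2 table: with the row and column totals fixed, the number of hits in the first group is hypergeometric, and the p-value is the total probability of all tables no more likely than the observed one. The variable names (`k`, `m`, `n`, `support`, `probs`, `p_obs`) are just my own.

```r
# Table from the answer: rows = groups, columns = outcomes
cm1 <- matrix(c(53, 67, 29, 38), nrow = 2,
              dimnames = list(c("first", "control"), c("hits", "misses")))

k <- sum(cm1["first", ])   # 82: size of the first group
m <- sum(cm1[, "hits"])    # 120: total hits in both groups
n <- sum(cm1[, "misses"])  # 67: total misses in both groups

# Under the null, with all margins fixed, the hits in the first group follow a
# hypergeometric distribution; list every possible count and its probability.
support <- max(0, k - n):min(k, m)
probs   <- dhyper(support, m = m, n = n, k = k)
p_obs   <- dhyper(cm1["first", "hits"], m = m, n = n, k = k)

# The observed count is the single most probable value
support[which.max(probs)]  # 53

# Two-sided p-value: total probability of tables no more likely than the
# observed one (small tolerance so floating-point ties are counted)
p_two_sided <- min(1, sum(probs[probs <= p_obs * (1 + 1e-7)]))
p_two_sided
fisher.test(cm1)$p.value   # for comparison
```

Because 53 is the most likely count, every possible table is at most as probable as the observed one, so the sum covers the whole distribution and comes to 1.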

You can also do

fisher.test(cm1, alternative="greater")

meaning:

The null hypothesis is that these two samples come from the same population. The alternative is that the first comes from a different population, one with a higher proportion of "hits" than the second.

... but it's still p=0.52.
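The one-sided p-value is simply the upper tail of the same hypergeometric distribution, which base R's `phyper` gives directly. A minimal sketch, assuming "greater" here means more hits in the first group than expected under the null (the variable names are mine):

```r
# Margins of the table from the question
k <- 82    # size of the first group
m <- 120   # total hits (both groups)
n <- 67    # total misses (both groups)
x <- 53    # observed hits in the first group

# One-sided p-value P(X >= 53) under the null, X ~ hypergeometric.
# phyper(q, ..., lower.tail = FALSE) returns P(X > q), hence q = x - 1.
p_greater <- phyper(x - 1, m = m, n = n, k = k, lower.tail = FALSE)
p_greater

# Cross-check against fisher.test's one-sided result
cm1 <- matrix(c(53, 67, 29, 38), nrow = 2)
isTRUE(all.equal(p_greater, fisher.test(cm1, alternative = "greater")$p.value))
```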

There are other methods, but most would accept that there is unlikely to be any great difference between 53/82 ≈ 65% and 67/105 ≈ 64%.
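One of those other methods, swapped in here for illustration rather than taken from the answer above, is the chi-squared test for equality of two proportions, available in base R as `prop.test` (with Yates' continuity correction by default). It is a large-sample approximation rather than an exact test, but its confidence interval for the difference in proportions gives a sense of how large a difference the data are compatible with:

```r
# Hits and group sizes from the question (first group, then control)
res <- prop.test(x = c(53, 67), n = c(82, 105))

res$estimate   # sample proportions: 53/82 and 67/105
res$conf.int   # 95% CI for the difference in proportions (first minus control)
res$p.value
```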
