Solved – Chi square test when sample sizes are different

chi-squared-test

I have two samples from two different language corpora: sample one contains 82 verbs and sample two contains 89 verbs. I want to compare the frequency of a particular verb type, let's call them oral verbs, across the two samples and see whether it differs significantly (I would have used another verb type, in which I don't expect differences, as a comparison group for a four-cell chi-square test). Originally I wanted to do a chi-square test, but then I realized that wouldn't be possible given the different sample sizes. Which test might I be able to apply?
Thank you!

Best Answer

You can use a chi-squared test in your example even though the sample sizes differ. Your "another verb type" would be the verbs that are not oral verbs, i.e. all the other verbs.
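As a rough sketch of why unequal sample sizes are not a problem (the symbols $O_{ij}$, $E_{ij}$, $r_i$, $c_j$ and $n$ are notation introduced here for illustration, not taken from the question): write $O_{ij}$ for the observed counts, $r_i$ for the row totals (the two sample sizes), $c_j$ for the column totals and $n$ for the grand total. Under the hypothesis that both samples have the same proportion of oral verbs, the expected counts and the Pearson statistic (before any continuity correction) are

$$E_{ij} = \frac{r_i \, c_j}{n}, \qquad X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Each expected count is scaled by its own row total, so the test compares proportions rather than raw counts, and nothing in it requires the rows to have equal totals.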

Suppose in your example, $10$ of the $82$ verbs in sample one were oral verbs and $72$ were not, while $20$ of the $89$ verbs in sample two were oral verbs and $69$ were not. Then the table for your four-cell chi-squared test could look like

             oral  other | total
sample one     10     72 |    82
sample two     20     69 |    89
             ----  ----- | -----
total          30    141 |   171

and in R you might get

chisq.test(rbind(c(10, 72), c(20, 69)))

#     Pearson's Chi-squared test with Yates' continuity correction
#
# data:  rbind(c(10, 72), c(20, 69))
# X-squared = 2.4459, df = 1, p-value = 0.1178

so this example would not be statistically significant at the usual 5% level (p ≈ 0.12).
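To make the calculation less of a black box, here is a minimal sketch that rebuilds the same statistic by hand and cross-checks it against `chisq.test()` and `prop.test()`. It reuses the made-up counts from the example above, not real corpus data; the variable names (`observed`, `expected`, `x2_yates` and so on) are just illustrative choices.

```r
# Observed counts: rows are the two samples, columns are oral / other verbs.
# These are the illustrative counts from the example, not real corpus data.
observed <- rbind(sample1 = c(oral = 10, other = 72),
                  sample2 = c(oral = 20, other = 69))

# Expected counts if both samples had the same proportion of oral verbs:
# (row total * column total) / grand total. The differing sample sizes
# enter here through the row totals.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic with Yates' continuity correction, computed by hand
x2_yates <- sum((abs(observed - expected) - 0.5)^2 / expected)
p_yates  <- pchisq(x2_yates, df = 1, lower.tail = FALSE)

# The same test via chisq.test(), and phrased as a comparison of
# two proportions via prop.test() (both apply the correction by default)
res_chisq <- chisq.test(observed)
res_prop  <- prop.test(x = observed[, "oral"], n = rowSums(observed))

print(round(expected, 2))
print(c(by_hand    = x2_yates,
        chisq.test = unname(res_chisq$statistic),
        prop.test  = unname(res_prop$statistic)))
print(p_yates)
```

All three statistics should agree with the X-squared = 2.4459 reported above. For a 2x2 table, the chi-squared test and the two-sample test of proportions are the same test, which is another way to see that different sample sizes are expected rather than forbidden.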