Solved – Chi-square with unbalanced design

chi-squared-test

I am a Ph.D. candidate in Applied Linguistics. I am currently working on a study of English teachers' provision of corrective feedback on English L2 learners' errors. I have divided the teacher participants into two groups.

I want to juxtapose the groups in terms of the number (frequency) of corrections they supply in response to the learners' errors. I suppose I should apply a chi-square test.
However, as the group sizes are different (10 teachers in group A and 15 teachers in group B), I am uncertain whether the chi-square test is applicable.

I want to know whether there is a statistical test that can handle frequency data without being affected by unequal group sizes.

Best Answer

Chi-square makes no assumptions about equality of group sizes.

The correction rates for the two groups can be compared (and indeed, different amounts of work per teacher within each group can be dealt with by the use of exposures, so if the A group marked twice as much work each as the B group that would also be fine).

Am I right to assume the groups are looking at the work of different students, rather than the same pool of students being marked twice?

I'd be inclined to use Poisson regression (where, for example, the model can be elaborated relatively easily, if required), but if you condition on the total number of corrections it would become a binomial test of a known proportion, which can also be done as a chi-square.
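As a rough sketch of the Poisson approach with exposures: with only a group indicator in the model, the Poisson regression coefficient is the log of the ratio of the two rates, and its standard error is sqrt(1/count_A + 1/count_B), so the Wald test can be done by hand. The function name `compare_poisson_rates` is my own, the totals are the made-up ones from the worked example further down, and taking "number of teachers" as the exposure assumes each teacher marked the same amount of work.

```python
import math

def compare_poisson_rates(count_a, exposure_a, count_b, exposure_b):
    """Wald test for equal Poisson rates in two groups.

    Matches a Poisson regression with one group indicator and a
    log-exposure offset (hypothetical helper, for illustration only).
    """
    rate_a = count_a / exposure_a
    rate_b = count_b / exposure_b
    log_ratio = math.log(rate_a / rate_b)
    # Standard error of the log rate ratio under the Poisson model.
    std_err = math.sqrt(1 / count_a + 1 / count_b)
    z = log_ratio / std_err
    # Two-tailed p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a / rate_b, z, p_value

# Made-up totals: 451 corrections from 15 teachers, 394 from 10 teachers.
rate_ratio, z_stat, p_val = compare_poisson_rates(451, 15, 394, 10)
print(f"rate ratio = {rate_ratio:.3f}, z = {z_stat:.2f}, p = {p_val:.2g}")
```

If teachers marked different amounts of work, the exposures would be the amounts of work (for example, number of scripts marked) rather than the head counts.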

It would be good to explain what the underlying aim is more clearly, without using words like 'test', 'chisquare' or 'design' - you say 'juxtapose' - but that simply means to place unlike things together, which suggests you need a table. What do you want to find out about and why would hypothesis tests answer your underlying questions of interest?

---

Example of how to do the binomial / chi-square calculation:

Possible objection: Assumes the groups are internally homogeneous (i.e. there's no variability in the underlying rate of corrections within group - the observed variation is due to random variation around the shared level). (Other assumptions, like independence, are probably uncontroversial.)

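Say the correction counts - on the same set of items, but different students - are as follows (note that in this made-up data group A has 15 teachers and group B has 10, the reverse of the labels in the question):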

 A: 27 30 32 34 40 30 24 30 32 19 43 31 29 27 23    total: 451

 B: 32 50 43 37 39 39 38 47 31 38                   total: 394

If the rate of correction is the same for both groups, the total number of corrections should be proportional to the number of teachers.

That is, the sum of the A sample is expected to be a fraction 15/(10+15) (=60%) of the overall number of corrections. The total number of corrections across all teachers is 845.

The expected number of corrections in group A is 845 x 0.6 = 507, and in group B is 845 x 0.4 = 338.

The chi-square statistic (for my made-up data!) is

$$\frac{(451 - 507)^2}{507} + \frac{(394 - 338)^2}{338} = 15.46,$$

with 1 d.f.
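The same arithmetic can be checked in a few lines of Python. This is only a sketch using the standard library; the variable names are mine, and the p-value uses the fact that a chi-square with 1 d.f. is the square of a standard normal.

```python
import math

# Made-up correction counts from the example (one value per teacher).
group_a = [27, 30, 32, 34, 40, 30, 24, 30, 32, 19, 43, 31, 29, 27, 23]
group_b = [32, 50, 43, 37, 39, 39, 38, 47, 31, 38]

observed = [sum(group_a), sum(group_b)]   # group totals: 451 and 394
teachers = [len(group_a), len(group_b)]   # group sizes: 15 and 10
total = sum(observed)

# Under equal rates, each group's expected total is proportional to its size.
expected = [total * n / sum(teachers) for n in teachers]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For 1 d.f., P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"expected = {expected}")
print(f"chi2 = {chi2:.2f}, p = {p_value:.2g}")
```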

As a binomial, we just test that the A proportion is 60%:

Under the null hypothesis, the total count in A is binomial(n=845, p=0.6); for a two-tailed test, we could use the normal approximation for the binomial proportion and get:

$Z = \frac{451/845 - 0.6}{\sqrt{0.6 (1-0.6)/845}} = -3.932$

(the square of this Z is the chi-square value above; its two-tailed p-value is the same as the p-value for the chi-square)
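A quick numerical check of the Z statistic and its square, again as a standard-library sketch with variable names of my own choosing:

```python
import math

n_total = 845   # total corrections across both groups
count_a = 451   # corrections made by group A
p_null = 0.6    # A's share of the teachers, 15 / (10 + 15)

p_hat = count_a / n_total
std_err = math.sqrt(p_null * (1 - p_null) / n_total)
z = (p_hat - p_null) / std_err

# Two-tailed p-value from the standard normal: P(|Z| > |z|).
p_two_tailed = math.erfc(abs(z) / math.sqrt(2))

print(f"Z = {z:.3f}, Z^2 = {z * z:.2f}, p = {p_two_tailed:.2g}")
```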

The exact binomial calculation is also quite readily done, but I won't labor the point.
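For anyone who does want the exact calculation, here is one way it might be sketched. The two-sided convention used here (sum the probabilities of all outcomes no more likely than the observed one) is one common choice, not the only one, and the helper names are hypothetical.

```python
import math

def binom_log_pmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability P(X = k), computed via log-gamma."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log(1 - p)

def exact_binomial_p(k_obs: int, n: int, p: float) -> float:
    """Two-sided exact p-value: total probability of outcomes
    that are no more likely than the observed count."""
    log_obs = binom_log_pmf(k_obs, n, p)
    slack = 1e-9  # guards against floating-point ties
    total = 0.0
    for k in range(n + 1):
        log_pk = binom_log_pmf(k, n, p)
        if log_pk <= log_obs + slack:
            total += math.exp(log_pk)
    return min(1.0, total)

# 451 of the 845 corrections came from group A; the null share is 0.6.
p_exact = exact_binomial_p(451, 845, 0.6)
print(f"exact two-sided p = {p_exact:.2g}")
```

Working on the log scale avoids overflow in the binomial coefficients when n is in the hundreds.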

---

A more complicated - but more defensible - analysis would be to fit a mixed logistic model, where 'teacher' is a random effect. This would allow for the fact that teachers have individual variation in their correction rate.
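Before going as far as a mixed model, a rough check of the homogeneity assumption is possible. The sketch below is not the mixed model itself; it computes a Pearson dispersion ratio within each group, assuming counts within a group would be Poisson with a common mean if teachers did not differ. Values well above 1 would hint at teacher-to-teacher variation. The function name is my own.

```python
def dispersion_ratio(counts):
    """Pearson chi-square of the counts around their common mean,
    divided by its degrees of freedom (n - 1).

    Roughly 1 if the counts behave like a Poisson sample with a shared rate.
    """
    n = len(counts)
    mean = sum(counts) / n
    pearson = sum((x - mean) ** 2 / mean for x in counts)
    return pearson / (n - 1)

# Made-up correction counts from the example (one value per teacher).
group_a = [27, 30, 32, 34, 40, 30, 24, 30, 32, 19, 43, 31, 29, 27, 23]
group_b = [32, 50, 43, 37, 39, 39, 38, 47, 31, 38]

ratio_a = dispersion_ratio(group_a)
ratio_b = dispersion_ratio(group_b)
print(f"dispersion ratio A = {ratio_a:.2f}, B = {ratio_b:.2f}")
```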
