Solved – Does two-way ANOVA still work with unequal sample sizes?

anova, sample-size, two-way, variance

I want to test whether installing a new technology (which comes in two versions) can reduce water waste. I'll test three conditions: no installation (control), installation of the new technology (version 1), and installation of the new technology (version 2). For cost reasons, I can only test this technology on a total of 210 participants: 90 in city $c_1$, 64 in city $c_2$, 45 in city $c_3$, and 11 in city $c_4$. So I have two factors: the technology (3 levels) and the geographic location (4 levels), which is why I want to use a two-way ANOVA. Under these conditions, I should have $3 \times 4 = 12$ groups. However, since the sample sizes differ across geographic locations, will this affect any of my results?
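To make the 12-group layout concrete, here is a minimal sketch of the cell sizes. It assumes (the question does not say this) that the 210 participants are split as evenly as possible across the three arms within each city; the names `arms` and `split_evenly` are illustrative only.

```python
# Per-city totals taken from the question.
city_sizes = {"c1": 90, "c2": 64, "c3": 45, "c4": 11}

# Assumption: each city's participants are divided as evenly as
# possible across the three arms (control, version 1, version 2).
arms = ["control", "version_1", "version_2"]


def split_evenly(n, k):
    """Split n participants into k groups whose sizes differ by at most 1."""
    base, extra = divmod(n, k)
    return [base + 1 if i < extra else base for i in range(k)]


cell_sizes = {
    city: dict(zip(arms, split_evenly(n, len(arms))))
    for city, n in city_sizes.items()
}

for city, cells in cell_sizes.items():
    print(city, cells)

all_cells = [n for cells in cell_sizes.values() for n in cells.values()]
print("groups:", len(all_cells))
print("total participants:", sum(all_cells))
print("largest / smallest cell:", max(all_cells) / min(all_cells))
```

Under this assumed allocation, the cells in $c_1$ hold 30 participants each while the cells in $c_4$ hold only 3 or 4, so the imbalance across the 12 groups is substantial.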

Edit: I guess one way to avoid unequal sample sizes is to ignore the geographic locations, as follows: treat the 210 participants as one single test group and form another group of 210 participants (for control), allocated in the same way as the test group, i.e., also 90 in $c_1$, 64 in $c_2$, 45 in $c_3$, and 11 in $c_4$. By doing this, is the geographic location no longer a problem? If so, I guess I could run a one-way ANOVA instead.

A "similar" question was asked in "Two-way ANOVA with unequal sample size, but equal variances", but it assumed equal variances and still has no accepted answer.

Best Answer

I found a great article here that tackles this issue in a fairly straightforward way. Here's a quote from the article that I think sums up where you're going to have problems:

The main practical issue in one-way ANOVA is that unequal sample sizes affect the robustness of the equal variance assumption.

ANOVA is considered robust to moderate departures from this assumption. But that’s not true when the sample sizes are very different. According to Keppel (1993), there is no good rule of thumb for how unequal the sample sizes need to be for heterogeneity of variance to be a problem.

So if you have equal variances in your groups and unequal sample sizes, no problem. If you have unequal variances and equal sample sizes, no problem.

The only problem is if you have unequal variances and unequal sample sizes.
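The three cases in the quote can be illustrated with a little arithmetic. The sketch below is a simplified two-group analogue, not the full ANOVA: it compares the variance of the difference in means that a pooled-variance test assumes with the true variance. The variance values (1 and 4) are made up for illustration, and population variances stand in for the expected sample variances.

```python
def assumed_over_true(n1, var1, n2, var2):
    """Ratio of the pooled-variance estimate of Var(mean1 - mean2)
    to its true value. A ratio of 1 means the pooled test is calibrated;
    below 1 the standard error is understated (too many false positives);
    above 1 it is overstated (the test is conservative)."""
    true_var = var1 / n1 + var2 / n2
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    assumed_var = pooled * (1 / n1 + 1 / n2)
    return assumed_var / true_var


# Group sizes 90 and 11 mirror the largest and smallest cities.
equal_var_unequal_n = assumed_over_true(90, 1.0, 11, 1.0)
unequal_var_equal_n = assumed_over_true(50, 1.0, 50, 4.0)
small_group_noisier = assumed_over_true(90, 1.0, 11, 4.0)
large_group_noisier = assumed_over_true(90, 4.0, 11, 1.0)

print("equal variances, unequal n:", round(equal_var_unequal_n, 3))
print("unequal variances, equal n:", round(unequal_var_equal_n, 3))
print("unequal both, small group noisier:", round(small_group_noisier, 3))
print("unequal both, large group noisier:", round(large_group_noisier, 3))
```

In the first two scenarios the ratio is 1, matching the "no problem" cases. When both sizes and variances differ, the ratio moves away from 1, and the direction depends on which group is noisier: a small, high-variance group makes the test too liberal, while a large, high-variance group makes it too conservative.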

Check out the full article for more detail; I think it will answer your question. One thing I'll add is that, in my experience, the "upper bound" for an acceptable deviation from equal sample sizes is about half. So if city 1 has 90 measurements, red flags go up when I see other cities with fewer than 45. One reason this red flag goes up is that it's very difficult to know what the variance of a group of 11 measurements is. So not only do we have unequal sample sizes, but we are also unsure whether the variances are equal.
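As a rough sketch, the "about half" rule of thumb above can be applied to the city totals from the question. The function name `flag_small_groups` and the `min_ratio` parameter are illustrative, and the 0.5 threshold is the personal heuristic described here, not an established standard.

```python
# Per-city totals taken from the question.
city_sizes = {"c1": 90, "c2": 64, "c3": 45, "c4": 11}


def flag_small_groups(sizes, min_ratio=0.5):
    """Return the groups whose size is below min_ratio times the largest group."""
    largest = max(sizes.values())
    return [name for name, n in sizes.items() if n < min_ratio * largest]


flagged = flag_small_groups(city_sizes)
print("flagged:", flagged)
```

With these numbers only $c_4$ is flagged; $c_3$ (45) sits exactly at half of $c_1$ (90), so it is borderline rather than clearly fine.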

With all that said, if I were you, I would run the experiment and perform an ANOVA using just cities 1-3, since (assuming the variances are more or less equal) you've got large enough and similar enough sample sizes that the ANOVA should be acceptable. In city 4, go ahead and run the experiment, but know that you probably can't run an ANOVA on it. However, this is real life and not a statistics course, so you may still find useful information. Maybe you can group city 4 with city 3 because they're close by; maybe the results in city 4 are very clearly different from those in the other cities and, based on your expert opinion, you believe you can go ahead and take whatever action comes from this study.

Unfortunately, real world statistics is sometimes messy, and you may need to rely on your subjective judgement in some cases.
