Power analysis for post hoc tests of ANOVA with many groups

Tags: anova, group-differences, post-hoc, statistical-power

In both G*Power and the R pwr package, the estimated sample size required per group decreases as the number of groups increases. This seems counter-intuitive.

For a toy example, assume that I have two groups with a meaningful difference in their mean estimates (Group A and Group B). If I add several additional groups whose means are identical to the grand mean of the first two (Groups C1, C2, C3, …), the power analysis suggests that smaller samples from Group A and Group B are needed, even though adding groups should, intuitively, weaken my ability to detect the difference between those two. At an extreme, if I enter 1500 levels of a single factor (effect size f = 0.25, power 1 − β = 0.8, α = 0.05), both programs effectively tell me to use group sizes of 2-3.
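For reference, this behaviour is easy to reproduce in R with the pwr package (the k values below are just illustrative):

```r
# Required sample size per group for a fixed effect size f = 0.25,
# alpha = 0.05, target power = 0.8, as the number of groups k grows
library(pwr)

for (k in c(2, 5, 20, 100, 1500)) {
  res <- pwr.anova.test(k = k, f = 0.25, sig.level = 0.05, power = 0.8)
  cat(sprintf("k = %4d groups: n per group = %.1f\n", k, res$n))
}
```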

My understanding is that the power analyses from both programs assess the power of the overall ANOVA. Thus, in the toy example, I'm more likely to pick up a difference between Group A or B and one of the C groups. However, this seems to be a result of the increased number of comparisons and the likelihood that some of the C-group samples have outlying mean estimates. That doesn't seem like the type of difference I want to pick up.

What is the recommended approach in these circumstances, or is the "low" sample size per group correct? Since I'm concerned with post-hoc comparisons of the group means, are there a priori power analyses available for those tests?

Best Answer

Because there will be many more error degrees of freedom, you should see an increase in $A$ vs $B$ rejections as well as in $A$ (or $B$) vs $C_i$ rejections: an observed difference of a given number of standard errors is much less likely to be an artifact of noise in the estimate of the standard deviation.

For example, imagine that the common error variance is $\sigma^2 = 1$.

Then the distribution of the estimate of $\sigma^2$ is quite skewed (and spread out) when there are just the $A$ and $B$ groups, but as you add more $C$ groups you get a much more precise estimate of the variance, which on average improves your ability to tell $A$ and $B$ apart:

[Figure: sampling densities of the estimate of $\sigma^2$. With only $A$ and $B$ (green) the density is strongly skewed, with a heavy left tail below 1; as more $C$ groups are added, the density concentrates tightly around the true value 1.]

(This assumes half the groups have 2 observations and half have 3 observations)
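Here is a small R sketch of the two extremes of that picture, using the fact that with $\sigma^2 = 1$ the pooled variance estimate is distributed as $\chi^2_\nu/\nu$ ($\nu = 3$ error df with just $A$ and $B$; $\nu = 150$ with 98 additional $C$ groups, under the group sizes above):

```r
# Sampling density of the pooled variance estimate when sigma^2 = 1:
# s^2 ~ chi^2_nu / nu, so its density at y is nu * dchisq(nu * y, nu).
# nu = 3: groups A and B only; nu = 150: A, B, plus 98 C groups.
curve(3 * dchisq(3 * x, df = 3), from = 0, to = 3, n = 500,
      col = "green4", ylim = c(0, 3.6),
      xlab = expression(hat(sigma)^2), ylab = "density")
curve(150 * dchisq(150 * x, df = 150), add = TRUE, col = "blue")
abline(v = 1, lty = 2)  # true variance
legend("topright", legend = c("3 error df", "150 error df"),
       col = c("green4", "blue"), lwd = 1)
```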

That bulge in the left tail of the green density, below 1, means that you quite often get large $F$'s when $H_0$ is true (because you're more often dividing by a small number). As a result, you need a big $F$ to be confident that it's not just random variation.

That's why the 5% critical value for an $F(1,3)$ (i.e. the $A$ vs $B$ comparison alone, with 3 error df) is 10.13, while that for an $F(1,150)$ (i.e. still only comparing $A$ vs $B$, but with 98 "C" groups helping to determine $\sigma^2$) is 3.90.
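You can check these critical values directly in R:

```r
# 5% critical values for the A-vs-B contrast (1 numerator df), as the
# error degrees of freedom grow from 3 to 150
qf(0.95, df1 = 1, df2 = 3)    # 10.13
qf(0.95, df1 = 1, df2 = 150)  # 3.90
```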

That effect is part of why you don't need many observations per group.


You should further note that if the $C$ groups have population means intermediate between those of the $A$ and $B$ groups, then rejections due to $B$-$C$ and $A$-$C$ differences are correct rejections: those population means really do differ. You seem to think that shouldn't happen, but that's simply untrue; it ought to happen (though much less often for any particular $A$ vs $C_i$ or $B$ vs $C_i$ comparison than for $A$ vs $B$, since those differences are smaller).
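To put a rough number on "much less often": if each $C_i$ sits exactly at the grand mean (as in the question), the $A$-$C_i$ difference is half the $A$-$B$ difference. A simple two-sample calculation in R (with an illustrative $n$, unit standard deviation, and ignoring the extra pooled error df) shows the gap:

```r
# Power to detect a full-sized difference (A vs B) versus a half-sized
# difference (A vs a C group at the grand mean), two-sample t-test
power.t.test(n = 20, delta = 1.0, sd = 1)$power  # about 0.87
power.t.test(n = 20, delta = 0.5, sd = 1)$power  # about 0.33
```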


Simulation is a useful tool to see which rejections occur more often as you add groups.

I imagine that with many groups and only a few observations per group, A vs B rejections will eventually become a relatively small proportion of the total rejections, but it's only $C_j$ vs $C_k$ rejections that are incorrect decisions.
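Here is a minimal simulation sketch along those lines (assumed setup: $A = -0.5$, $B = +0.5$, all $C$ groups exactly at the grand mean $0$, $\sigma = 1$, 3 observations per group; unadjusted pairwise $t$-tests on the pooled error are used so that the degrees-of-freedom effect is visible, and a multiplicity adjustment such as Tukey's HSD would shift the balance):

```r
# Simulate one-way ANOVA data where A and B truly differ and every C group
# sits at the grand mean; tally pairwise rejections by type as C's are added
set.seed(1)

sim_rejections <- function(n_c, n_per_group = 3, delta = 1,
                           n_sims = 1000, alpha = 0.05) {
  k      <- 2 + n_c                      # group 1 = A, group 2 = B, rest = C's
  means  <- c(-delta / 2, delta / 2, rep(0, n_c))
  groups <- factor(rep(seq_len(k), each = n_per_group))
  counts <- c(A_vs_B = 0, AB_vs_C = 0, C_vs_C = 0)
  for (s in seq_len(n_sims)) {
    y   <- rnorm(k * n_per_group, mean = rep(means, each = n_per_group))
    pm  <- pairwise.t.test(y, groups, p.adjust.method = "none",
                           pool.sd = TRUE)$p.value
    idx <- which(pm < alpha, arr.ind = TRUE)          # rejected pairs
    i   <- as.integer(rownames(pm)[idx[, 1]])
    j   <- as.integer(colnames(pm)[idx[, 2]])
    counts["A_vs_B"]  <- counts["A_vs_B"]  + sum(pmax(i, j) == 2)
    counts["AB_vs_C"] <- counts["AB_vs_C"] + sum(pmin(i, j) <= 2 & pmax(i, j) > 2)
    counts["C_vs_C"]  <- counts["C_vs_C"]  + sum(pmin(i, j) > 2)  # wrong rejections
  }
  counts / n_sims   # average number of rejections per dataset, by type
}

for (n_c in c(0, 3, 10, 25)) {
  cat(sprintf("%2d C groups:", n_c)); print(sim_rejections(n_c))
}
```

You should see the $A$ vs $B$ rejection rate climb with the added error df, the $A$/$B$ vs $C$ count grow both for that reason and because there are more such pairs, and the $C_j$ vs $C_k$ count grow only because the number of those (truly null) pairs grows, each at roughly the nominal rate.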
