Power Analysis – Conducting t-Test Power Analysis for Unequal Size Groups

It's usually straightforward to do a power analysis to compute the minimum sample size, especially in R, which is my preferred statistical computing environment.

However, I am being asked to conduct a power analysis that's a little different from anything I've done or can find a reference to online. I'm wondering if what I'm being asked for is even possible/valid.

The project basically has two unequal groups of states and the hypothesis is that these two groups are significantly different in terms of an outcome variable (which is the duration of phone calls to customers). The "control" group consists of 40 states and produced about 2,500 observations. The "test" group has about 10 states and 500 observations.

Initially, I computed the group means and the pooled standard deviation, which I used to calculate an effect size. Then I used the R package pwr and found that I needed a minimum sample size of about 135 observations per group, given a .05 significance level and .8 power.
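For reference, the equal-n calculation described here can be sketched without pwr. This is Python's standard library rather than R, and it uses the normal approximation $n \approx 2(z_{1-\alpha/2} + z_{1-\beta})^2/d^2$ per group, so it comes out slightly below the noncentral-t answer that pwr.t.test gives. The summary statistics in the usage lines are hypothetical placeholders, not the asker's data.

```python
from math import ceil, sqrt
from statistics import NormalDist

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return abs(mean1 - mean2) / sqrt(pooled_var)

def n_per_group(d, alpha=0.05, power=0.80):
    """Equal-n sample size per group for a two-sided test (normal approximation)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * z_sum**2 / d**2)

# Hypothetical summary statistics (e.g. call durations in minutes), NOT the asker's data
d = cohens_d(mean1=5.0, mean2=5.6, sd1=1.5, sd2=1.4, n1=2500, n2=500)
print(round(d, 3), n_per_group(d))
```

Because it ignores the t distribution's heavier tails, this sketch should be read as a lower bound on what pwr.t.test reports for the same d.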

However, they rejected my answer because they want one group to remain much bigger than the other, as it is now. They expect either two different minimum numbers of observations (one per group) or a minimum percentage of the population, in numbers of states or observations, that has to go into their "test" group.

I see power analyses for two-sample t-tests with unequal group sizes (e.g. the R function pwr.t2n.test), but they require me to specify at least one of the sample sizes, whereas they want me to give the minimal sample size for both groups (either as numbers or as percentages). This function also doesn't reflect the differences in standard deviations between the two groups.
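pwr.t2n.test fixes one n and solves for the other under a common standard deviation. The Python sketch below (normal approximation, hypothetical inputs) does the same with separate SDs, using the Welch-type standard error $\sqrt{s_1^2/n_1 + s_2^2/n_2}$: fix the control size $n_1$ and solve for the smallest test-group $n_2$. It also shows that there is a floor on $n_1$: below it, no $n_2$ is large enough, because the control group's sampling error alone already exceeds the allowed standard error.

```python
from math import ceil
from statistics import NormalDist

def min_n2(n1, delta, sd1, sd2, alpha=0.05, power=0.80):
    """Smallest n2 giving the target power for a fixed n1.
    Two-sided test, normal approximation, separate SDs (Welch-type SE).
    Returns None if no n2 is large enough."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    se2_max = (delta / z_sum) ** 2    # largest allowed squared SE of the mean difference
    budget = se2_max - sd1**2 / n1    # what remains for the test group's variance term
    if budget <= 0:
        return None
    return ceil(sd2**2 / budget)

# Hypothetical inputs: a 0.5-minute difference, SDs 1.5 (control) and 1.4 (test)
for n1 in (70, 100, 200, 500, 2500):
    print(n1, min_n2(n1, delta=0.5, sd1=1.5, sd2=1.4))
```

Tabulating min_n2 over several values of n1 gives the trade-off curve between the two group sizes rather than a single number.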

Is this possible, or do I just tell them that's not how it works (i.e., the best I can do is tell them that, given one of the sample sizes and a pooled standard deviation, the second group has to be at least a certain size)?

Best Answer

You can do sample size calculations for unequal sample sizes.

For example, you can fix the n's in some ratio (perhaps in proportion to the populations).
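With a fixed ratio $r = n_{\text{control}}/n_{\text{test}}$ and a common SD, the variance of the mean difference is proportional to $1/n_{\text{test}} + 1/n_{\text{control}} = (1 + 1/r)/n_{\text{test}}$, so the required test-group size is $(1 + 1/r)(z_{1-\alpha/2} + z_{1-\beta})^2/d^2$. A Python sketch under the normal approximation; the effect size $d = 0.35$ is hypothetical, and the 5:1 ratio mirrors the 2,500 vs 500 observations in the question.

```python
from math import ceil
from statistics import NormalDist

def sizes_for_ratio(d, ratio, alpha=0.05, power=0.80):
    """(n_test, n_control) when n_control = ratio * n_test.
    Common SD, two-sided test, normal approximation."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    n_test = (1 + 1 / ratio) * z_sum**2 / d**2
    return ceil(n_test), ceil(ratio * n_test)

# Hypothetical effect size d = 0.35
print(sizes_for_ratio(0.35, ratio=1))  # equal split
print(sizes_for_ratio(0.35, ratio=5))  # 5:1, like 2,500 vs 500
```

The test group's share of the total is then $1/(1 + r)$, which is one way to express the answer as a percentage. With these inputs the 5:1 design needs 77 + 385 = 462 observations against 129 + 129 = 258 for the equal split.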

It's then possible to do power calculations (at the least you can simulate to obtain the power under any particular set of circumstances, whether or not you are able to do the algebra).
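A Monte Carlo sketch of that simulation approach, in Python: draw normal data for two groups of the given sizes and SDs, apply a large-sample z test with the Welch standard error, and count rejections. Normality and the z cutoff are assumptions; for small groups a t-based test would be more accurate. The sizes 385 and 77 with a true difference of 0.35 SD are hypothetical, chosen because the normal-approximation formula puts them near 80% power.

```python
import random
from statistics import NormalDist

def mean_and_var(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)

def simulated_power(n1, n2, delta, sd1, sd2, alpha=0.05, reps=1000, seed=1):
    """Monte Carlo power of a two-sided large-sample z test that uses the
    Welch standard error. Data are drawn from normal distributions."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        m1, v1 = mean_and_var([rng.gauss(0.0, sd1) for _ in range(n1)])
        m2, v2 = mean_and_var([rng.gauss(delta, sd2) for _ in range(n2)])
        se = (v1 / n1 + v2 / n2) ** 0.5
        if abs(m2 - m1) / se > crit:
            rejections += 1
    return rejections / reps

# Hypothetical design: 385 control and 77 test observations, true difference 0.35 SD
print(simulated_power(385, 77, delta=0.35, sd1=1.0, sd2=1.0))
```

Setting delta to 0 checks that the rejection rate stays near alpha, a useful sanity test before trusting any power estimate.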

The catch is that unequal allocation is relatively inefficient at detecting differences: for the same total number of observations, an unequal split has less power than an equal one.

Imagine you had a total sample of $n=n_1 + n_2$, with equal variance in the population and close to equal sample variance, and that your choice was between a 50-50 split and a 90-10 split ($n_1 = 0.5n$ vs $n_1=0.9n$).

The two-sample t-statistic is:

$t = \frac{\bar {X}_1 - \bar{X}_2}{s_{\text{pooled}} \cdot \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$

The impact of the sample size is in the term $1/{\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$.

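With the 50-50 split, the term $\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$ equals $2/\sqrt{n}$, versus about $3.33/\sqrt{n}$ for the 90-10 split, so the even split is like having a 40% smaller standard deviation: at a given $n_1+n_2$ you can pick up an effect about 40% smaller.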
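The arithmetic behind the 40% figure, as a quick Python check. Squaring the inverse ratio also expresses the inefficiency in observations: since the standard error scales as $1/\sqrt{n}$ at a fixed split, a 90-10 split needs about $(1/0.6)^2 \approx 2.78$ times the total sample size to match the precision of a 50-50 split.

```python
from math import sqrt

def se_factor(n_total, frac1):
    """sqrt(1/n1 + 1/n2) for a split of n_total with n1 = frac1 * n_total."""
    n1 = frac1 * n_total
    n2 = n_total - n1
    return sqrt(1 / n1 + 1 / n2)

n = 1000                      # any total gives the same ratios
even = se_factor(n, 0.5)      # 2 / sqrt(n)
skewed = se_factor(n, 0.9)    # about 3.33 / sqrt(n)
print(f"SE ratio, 50-50 vs 90-10: {even / skewed:.2f}")                        # 0.60
print(f"Total needed by 90-10 relative to 50-50: {(skewed / even) ** 2:.2f}")  # 2.78
```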

If the combined sample size is not an effective constraint, however, this calculation may be pointless. It matters when every observation carries the same marginal cost, which is not always the case.

Related Solutions

Solved – Power analysis for post hoc test of ANOVA with many groups

Because there will be many more error degrees of freedom, you should see an increase in the $A$ vs $B$ rejections as well as in the $A$ or $B$ vs $C_i$ rejections: an observed difference of a given number of standard errors is much less likely to be due to noise in the estimate of the standard deviation.

For example, imagine that the common error variance is $\sigma^2=1$.

Then the distribution of the estimate of $\sigma^2$ is quite skewed (and spread out) when there are just $A$ and $B$, but as you add more $C$ groups you get a much more precise estimate of the variance, and this will, on average, improve your ability to tell $A$ and $B$ apart:


(This assumes half the groups have 2 observations and half have 3 observations)
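That behaviour can be approximated with a short Python simulation, assuming $\sigma^2 = 1$, normal errors, and group sizes alternating 2, 3, 2, 3, ... as stated above. It is a sketch of the idea, not the code that produced the original plot. With 2 groups the pooled estimate has 3 degrees of freedom; with 100 groups it has 150.

```python
import random
from statistics import pstdev

def pooled_variance_draws(n_groups, reps=2000, seed=0):
    """Draws of the pooled within-group variance estimate when sigma^2 = 1.
    Group sizes alternate 2, 3, 2, 3, ... (half the groups each)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        ss, df = 0.0, 0
        for g in range(n_groups):
            size = 2 if g % 2 == 0 else 3
            obs = [rng.gauss(0.0, 1.0) for _ in range(size)]
            m = sum(obs) / size
            ss += sum((v - m) ** 2 for v in obs)   # within-group sum of squares
            df += size - 1                         # within-group degrees of freedom
        draws.append(ss / df)
    return draws

for k in (2, 100):
    est = pooled_variance_draws(k)
    below_half = sum(v < 0.5 for v in est) / len(est)
    print(f"{k:3d} groups: sd of estimate = {pstdev(est):.3f}, "
          f"P(estimate < 0.5) = {below_half:.3f}")
```

The share of draws below 0.5 is the "bulge in the left tail": those are the cases where an F or t statistic gets inflated by a small denominator.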

That bulge in the left tail of the green density below 1 means you get large F statistics quite often when $H_0$ is true (because you're dividing by a small number more often). As a result, you need a big F to be confident that it's not just random variation.

That's why the 5% critical value for an F(2,3) (i.e. the A vs B alone comparison) is 9.55, while that for an F(2,150) (i.e. only considering A vs B with 98 "C" groups helping to determine $\sigma^2$) is 3.06.
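Those two critical values can be checked without a statistics library. For numerator df = 2, the F distribution with denominator df $\nu$ has the closed-form survival function $P(F > x) = (1 + 2x/\nu)^{-\nu/2}$, which inverts directly. A minimal Python check:

```python
def f2_critical(df2, alpha=0.05):
    """Upper-alpha critical value of F(2, df2).
    Solves (1 + 2x/df2)^(-df2/2) = alpha for x."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

print(round(f2_critical(3), 2))    # 9.55
print(round(f2_critical(150), 2))  # 3.06
```

The closed form only holds for a numerator df of exactly 2; other numerator df values need a general F quantile function.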

That effect is part of why you don't need many observations per group.

You should further note that if the $C$ groups have population means intermediate between those of the $A$ and $B$ groups, then you should reject the null because of $B-C$ and $A-C$ differences. You seem to think that shouldn't happen. That's simply untrue. It ought to happen (though much less often for any particular $A-C_i$ or $B-C_i$ than for $A-B$).

Simulation is a useful tool to see which rejections occur more often as you add groups.

I imagine that with many groups and only a few observations per group, A vs B rejections will eventually become a relatively small proportion of the total rejections, but it's only $C_j$ vs $C_k$ rejections that are incorrect decisions.

Solved – R power and sample size estimation

Given my comments under your post above:

It sounds to me like you are analyzing a 2 x 2 contingency table: Group A vs. Group B x Success vs. Failure. From such a table you can easily calculate an odds ratio (OR); see metafor::escalc() for good documentation on getting an OR from a 2 x 2 contingency table.
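For a 2 x 2 table the odds ratio is $ad/(bc)$, and an OR together with a reference success rate determines the other group's rate. A minimal Python sketch (the counts are hypothetical); the second function shows where the p2 = 0.391304 used with power.prop.test further down comes from.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table laid out as
                 success  failure
       group A      a        b
       group B      c        d
    """
    return (a * d) / (b * c)

def p_from_odds_ratio(p_ref, odds_ratio_value):
    """Success probability implied by a reference probability p_ref and an OR."""
    odds = odds_ratio_value * p_ref / (1 - p_ref)
    return odds / (1 + odds)

# Hypothetical counts: 39 of 100 successes in group A, 30 of 100 in group B
print(round(odds_ratio(39, 61, 30, 70), 3))
print(round(p_from_odds_ratio(0.30, 1.5), 6))   # 0.391304
```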

I have used epiR::epi.ccsize() to do power analyses for odds ratios before in working with epidemiologists. It is geared toward epidemiologists, but the statistics are the same, and the code is very simple.

Let's say we are expecting an odds ratio of 1.5, where there is a 30% success rate in the control group and there is a 2:1 ratio of participants in the control versus experimental group (i.e., what you describe in your post), and we want 95% power:

```r
library(epiR)
epi.ccsize(OR = 1.50, p0 = 0.30, n = NA, power = 0.95, r = 2)
```

Which gives us a list:

```
$n.total
[1] 1578

$n.case
[1] 526

$n.control
[1] 1052
```

Translating from epidemiologist-centric language, you need 526 experimental and 1052 control participants to get 95% power in that situation.

It might also be tempting to try stats::power.prop.test(), but I'm not sure how to handle your 2:1 ratio with that function. For example, this response says that you just need to make sure your smallest group reaches the threshold given by power.prop.test(), but I find that estimate unnecessarily high:

```r
power.prop.test(p1 = .30, p2 = .391304, power = .95)  # these values for p1 and p2 give an OR of 1.50
```

```
     Two-sample comparison of proportions power calculation 

              n = 702.1545
             p1 = 0.3
             p2 = 0.391304
      sig.level = 0.05
          power = 0.95
    alternative = two.sided

NOTE: n is number in *each* group
```

This overestimate jibes well with the comment to the post I linked above, where user Underminer says:

"If you do a 95/5 split, then it'll just take longer to hit the minimum sample size for the variation that is getting the 5%." - while this is a conservative approach to at least satisfying the specified power of the test, you will in actuality be exceeding the specified power entered in power.prop.test if you have one "small" and one "large" group (e.g. n1 = 19746, n2 = 375174). A more exact method of meeting power requirements for unequal sample sizes would likely be desirable.
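The gap between the two approaches can be reproduced with the usual normal-approximation sample size formula for two proportions with $r$ controls per experimental subject (no continuity correction). This Python sketch appears to match both epi.ccsize's 526/1052 and power.prop.test's 702.15 for this example, but it is a cross-check, not epiR's or stats' own implementation.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_two_proportions(p_exp, p_ctrl, power, ratio=1.0, alpha=0.05):
    """Experimental-group size for a two-sided comparison of two proportions
    with `ratio` controls per experimental subject (normal approximation,
    no continuity correction). Returns the unrounded value."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    p_bar = (p_exp + ratio * p_ctrl) / (1 + ratio)   # pooled proportion under H0
    term_a = z_a * sqrt((1 + 1 / ratio) * p_bar * (1 - p_bar))
    term_b = z_b * sqrt(p_exp * (1 - p_exp) + p_ctrl * (1 - p_ctrl) / ratio)
    return (term_a + term_b) ** 2 / (p_exp - p_ctrl) ** 2

p_ctrl, p_exp = 0.30, 0.391304   # OR of about 1.5, as in the example above
n_exp = ceil(n_two_proportions(p_exp, p_ctrl, power=0.95, ratio=2))
print(n_exp, 2 * n_exp, 3 * n_exp)                             # 526 1052 1578
print(round(n_two_proportions(p_exp, p_ctrl, power=0.95), 2))  # about 702.15 per group
```

Sizing each group off power.prop.test in a 2:1 design means 702 experimental and 1,404 control subjects, 2,106 in total, against 1,578 from the ratio-aware calculation.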

Here's a relevant RPubs link using the pwr package that discusses unequal sample sizes. However, I find the epiR approach the most intuitive.
