From the problem you described, you want to set the death rate to 20% for the reference (control) group and effect.size = -3 (which sets the death rate in the treated group to 80%) in the LRPower() function:
```
LRPower(100, reference.group.incidence = 0.2, effect.size = -3, simulation.n = 5000)
[1] 0.9948
LRPower(40, reference.group.incidence = 0.2, effect.size = -3, simulation.n = 5000)
[1] 0.797
```
Thus, you need 20 control and 20 treated animals to distinguish an 80% death rate in the treated group from a 20% death rate in the control group with 79.7% power while holding the significance level at 0.05. In the LRPower() function, the default type I error is 0.05 and the default group.sample.size.ratio is 1.
For why you need to set effect.size = -3, see the methods document here: https://github.com/RongUTSW/Methods/blob/master/LRPowerSimulation.pdf
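If effect.size is a log odds ratio (an assumption on my part; the linked PDF gives the exact parameterization), you can sanity-check the implied treated-group rate in base R:

```r
# Sanity check, assuming effect.size is a log odds ratio (an assumption;
# confirm the parameterization against the linked methods PDF).
plogis(qlogis(0.2) + log(16))  # the log OR for 0.2 vs 0.8 is log(16) ~ 2.77; returns 0.8
plogis(qlogis(0.2) + 3)        # a magnitude-3 effect implies a treated rate of ~0.83
```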
The `power.t.test` function can only calculate power for the t-test.
If you don't know how to compute power for the other tests, you can use simulation - that is, simulate from some given distribution under the stated conditions.
You don't say what distribution you need to do it for; presumably the normal (but you should check carefully).
So you repeatedly simulate a pair of samples of size 10 with the given effect size and record whether each test rejects or not (or alternatively, record the p-values, which you later compare with the significance level).
You don't need to write functions to conduct each of the tests, since R already has functions that do all of those for you. I'd suggest writing a function that simulates a single pair of samples under the required conditions, calls each of the test functions, and gathers up only the information you need from each test (I would suggest the p-values), and then using `replicate` to call that function repeatedly and save the results.
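Here is a minimal sketch of that approach, assuming normal data and an effect size expressed as a shift in means (in SD units) - both assumptions you should check against your actual problem:

```r
# Simulate one pair of samples and return the p-values from all three tests.
sim_pvals <- function(n = 10, effect = 1) {
  x <- rnorm(n)                  # first sample
  y <- rnorm(n, mean = effect)   # second sample, shifted by the effect size
  c(pooled = t.test(x, y, var.equal = TRUE)$p.value,  # ordinary two-sample t-test
    welch  = t.test(x, y)$p.value,                    # Welch test (R's default)
    mw     = wilcox.test(x, y)$p.value)               # Mann-Whitney
}

set.seed(1)
pvals <- replicate(10000, sim_pvals())
rowMeans(pvals < 0.05)  # estimated power of each test at the nominal 5% level
```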
You may not be required to do so, but it also makes sense to compute the actual type I error rate - the rejection rate at effect size 0 - since neither the Mann-Whitney nor the Welch test will be carried out at exactly the nominal rate, but at some other rate (if you're actually testing at 3.6% instead of 5%, you would expect lower power, because the test is being conducted at a lower type I error rate).
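With the hypothetical sim_pvals() function from the sketch above, setting the effect size to 0 estimates those actual rates:

```r
set.seed(2)
null_pvals <- replicate(10000, sim_pvals(effect = 0))
rowMeans(null_pvals < 0.05)  # actual rejection rates under the null
```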
[For the tests to be genuinely comparable, you should conduct them at the same actual rate. Ideally, you would treat the impact on power and on significance level as separate issues, by finding the different actual significance levels and then carrying all of the tests out at as near to the same significance level as possible. This would involve either (a) carrying out the t-test at the actual level of the Mann-Whitney and then adjusting the Welch nominal level so that it had approximately the same significance level, or (b) using a randomized test to carry out the Mann-Whitney at exactly the 5% level and (again) adjusting the nominal level of the Welch test so its actual significance level is close to 5%. I expect you're not required to do this, though.]
I'd suggest a simulation size of at least 10000. You can calculate the standard error of the rejection rate estimate from the binomial distribution.
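For example, with an estimated rejection rate $\hat p$ from $N$ simulations, the standard error is $\sqrt{\hat p(1-\hat p)/N}$; at $\hat p = 0.8$ with $N = 10000$, that is $\sqrt{0.8 \times 0.2 / 10000} \approx 0.004$.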
First, I would suggest you consider carefully whether you have a really good reason to use different sample sizes. Especially if the smaller sample size is used for the group with the larger population variance, this is not an efficient design.
Second, you can use simulation to get the power for various scenarios. For example, if you use $n_1 = 20$, $\sigma_1 = 15$, $n_2 = 50$, $\sigma_2 = 10$, then you have about 75% power for detecting a difference $\delta = 10$ in population means with a Welch test at level 5%.
Because the P-value is taken directly from the `t.test` procedure in R, results should be accurate to 2 or 3 places, but this style of simulation runs slowly (maybe 2 or 3 minutes) with a million iterations. You might want to use 10,000 iterations if you are doing repeated runs for various sample sizes, and then use a larger number of iterations to verify the power of the final design.
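A sketch of this kind of simulation, with the parameter names and the arbitrary baseline mean being my own choices:

```r
# Welch-test power by simulation for the scenario above.
set.seed(3)
n1 <- 20; s1 <- 15   # Group 1: smaller sample, larger SD
n2 <- 50; s2 <- 10   # Group 2
delta <- 10          # difference in population means
B <- 10000           # iterations; raise toward 10^6 to verify the final design

pv <- replicate(B, t.test(rnorm(n1, 0, s1),
                          rnorm(n2, delta, s2))$p.value)  # Welch is t.test's default
mean(pv < 0.05)      # estimated power; about 0.75 for these settings
```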
Changing to $n_2 = 20$ gives power 67%, so the extra 30 observations in Group 2 are not 'buying' you as much as you might hope. By contrast, a balanced design with $n_1 = n_2 = 35$ gives about 90% power (with everything else the same).
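Rerunning the sketch above with `n2 <- 20`, or with `n1 <- n2 <- 35`, should reproduce these figures to within simulation error.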