Solved – power to detect effect sizes for the usual t, Welch Adjusted t, and Mann-Whitney U test

Tags: r, statistical-power, t-test, wilcoxon-mann-whitney-test

For class, I am given a series of sample sizes and variances for two groups and am asked to calculate the power to detect effect sizes with the usual t-test, the Welch adjusted t-test, and the Mann-Whitney U test.

I have to assume my two groups differ by an effect size of $0.25$, that each group has a sample size of $10$, and that the variances are equal.

The question asks me to compute the power for these three tests.

I am wondering whether I can use the power.t.test function for all three tests, or how to adjust my code to account for the different computational methods of the Welch adjusted t-test (var.equal=FALSE) and the Mann-Whitney U test.

This is my R code for the usual t-test:

N <- 10
power.t.test(n = N, delta = 0.25, sd = 1, sig.level = 0.05,
             type = "two.sample", alternative = "two.sided")

How do I get the power for the other two tests?

Any help or guidance is much appreciated 🙂

Best Answer

The power.t.test function only calculates power for the ordinary equal-variance t-test. If you can't compute power for the other tests directly, you'd use simulation - that is, simulate from some given distribution under the conditions given.

You don't say what distribution you need to do it for; presumably the normal (but you should check carefully).

So you repeatedly simulate a pair of samples of size 10 with the given effect size, and for each pair record whether each test rejects or not (or, alternatively, record the p-values, which you can later compare with the significance level).

You don't need to write functions to conduct each of the tests, since R already has functions that do all of those for you. I'd suggest writing a function that simulates a single pair of samples under the required conditions, calls each of the test functions, and gathers up only the information you need from each test (I'd suggest the p-values); then use replicate to call that function many times and save the results. A minimal sketch is given below.
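Here's one way that sketch might look, assuming normal data with equal variances, $n=10$ per group and an effect size of $0.25$ standard deviations (the function name sim_pvals and the seed are placeholders of my own, not part of the exercise):

set.seed(1)

# simulate one pair of samples and return the p-values of the three tests
sim_pvals <- function(n = 10, delta = 0.25, sd = 1) {
  x <- rnorm(n, mean = 0, sd = sd)
  y <- rnorm(n, mean = delta, sd = sd)
  c(t_equal = t.test(x, y, var.equal = TRUE)$p.value,
    t_welch = t.test(x, y, var.equal = FALSE)$p.value,
    mann_whitney = wilcox.test(x, y)$p.value)
}

nsim <- 10000
pvals <- replicate(nsim, sim_pvals())  # 3 x nsim matrix of p-values
rowMeans(pvals <= 0.05)                # estimated power of each test at the 5% level

The equal-variance row should land close to the power.t.test answer above, which is a handy check that the simulation is set up correctly.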

You may not be required to do so, but it makes sense to also compute the actual type I error rate - the rejection rate at an effect size of 0 - since neither the Mann-Whitney nor the Welch test will be carried out at exactly the nominal rate, but at some other rate (if you're actually testing at 3.6% instead of 5%, you would expect lower power, because the test is being conducted at a lower type I error rate).
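For instance, the same hypothetical sim_pvals function from the sketch above could be re-run with the effect size set to 0 to estimate the actual type I error rates:

null_pvals <- replicate(nsim, sim_pvals(delta = 0))
rowMeans(null_pvals <= 0.05)  # actual rejection rates under H0 at a nominal 5% level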

[For the tests to be genuinely comparable, you should conduct them at the same rate. Indeed, ideally, you would treat the impact on power and the impact on significance level as separate issues, by finding the different actual significance levels and then carrying all the tests out at as close to the same significance level as possible. This would involve either (a) carrying out the t-test at the actual level of the Mann-Whitney and then adjusting the nominal level of the Welch test so that its actual significance level was approximately the same, or (b) using a randomized test to carry out the Mann-Whitney at exactly the 5% level and (again) adjusting the nominal level of the Welch test so its actual significance level is close to 5%. I expect you're not required to do this, though.]
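If you did want to attempt something like option (a), a rough sketch might use pwilcox for the exact null distribution of the rank-sum statistic, plus the null_pvals matrix from the hypothetical simulation above; this is only meant to illustrate the idea, not a required part of the exercise:

m <- n <- 10
lower_tail <- pwilcox(0:(m * n), m, n)   # P(W <= w) for w = 0, ..., m*n
# largest attainable two-sided Mann-Whitney level at or below 5%
mw_level <- max(2 * lower_tail[2 * lower_tail <= 0.05])
# nominal level for the Welch test whose simulated actual level is about mw_level
welch_alpha <- quantile(null_pvals["t_welch", ], probs = mw_level)

The t-test and the Mann-Whitney would then both be carried out at mw_level, and the Welch test at welch_alpha.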

I'd suggest a simulation size of at least 10000. You can calculate the standard error of the rejection rate estimate from the binomial distribution.
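As a rough illustration of that standard error calculation (the numbers here are placeholders, not results):

p_hat <- 0.40   # a hypothetical estimated rejection rate
nsim <- 10000
sqrt(p_hat * (1 - p_hat) / nsim)  # binomial standard error, about 0.005 here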