I usually turn to simulation for power calculations for the Wilcoxon signed-rank test. I use my own function for this. Use it at your own risk, as I don't know that anyone has ever validated it.
You can read the function directly using
source("https://raw.githubusercontent.com/nutterb/StudyPlanning/master/R/sim_wilcoxon.R")
or install the package (I haven't been actively developing it for a couple years) using:
devtools::install_github("nutterb/StudyPlanning")
To make it work, you'll need to estimate the distribution of each group in your sample. In the example below, I've assumed one group follows a Poisson distribution with a mean of 2.1, and the other follows a Poisson distribution with a mean of 3.53. I've also assumed equal sample sizes. This yields an estimated power of 0.444.
set.seed(123)
sim_wilcoxon(n=22,                   # total sample size
             weights=list(c(1, 1)),  # equal sample size per group
             rpois(lambda=2.1),      # distribution of first sample
             rpois(lambda=3.53),     # distribution of second sample
             nsim=1000)
  n_total n1 n2   k alpha power nsim pop1_param  pop2_param pop1_dist pop2_dist
1      22 11 11 0.5  0.05 0.444 1000 lambda=2.1 lambda=3.53     rpois     rpois
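The same idea can be sketched in base R without the package. This is a minimal sketch, not the `sim_wilcoxon()` implementation: it assumes the two groups are independent, so it uses the rank-sum form of `wilcox.test()`, and the helper name `sim_power_wilcox` is hypothetical.

```r
# Hypothetical helper: simulation-based power for a two-sample Wilcoxon test.
# Assumes independent, equal-sized groups drawn from Poisson distributions.
sim_power_wilcox <- function(n_per_group, lambda1, lambda2,
                             nsim = 1000, alpha = 0.05) {
  reject <- logical(nsim)
  for (i in seq_len(nsim)) {
    x <- rpois(n_per_group, lambda = lambda1)
    y <- rpois(n_per_group, lambda = lambda2)
    # exact = FALSE: count data contain ties, so use the normal approximation
    p <- wilcox.test(x, y, exact = FALSE)$p.value
    # isTRUE() treats an undefined p-value (all values identical) as no rejection
    reject[i] <- isTRUE(p < alpha)
  }
  mean(reject)  # proportion of simulated studies that reject the null
}

set.seed(123)
power_est <- sim_power_wilcox(n_per_group = 11, lambda1 = 2.1, lambda2 = 3.53)
power_est
```

The estimate should not be expected to match 0.444 exactly, since the random draws and the test settings may differ from those used inside the package function.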
Given my comments under your post above:
It sounds to me like you are analyzing a 2 x 2 contingency table: Group A vs. Group B x Success vs. Failure. With these, you can easily calculate an odds ratio (OR); see metafor::escalc() for good documentation on getting an OR from a 2 x 2 contingency table.
I have used epiR::epi.ccsize() to do power analyses for odds ratios when working with epidemiologists. It is geared toward epidemiologists, but the statistics are the same, and the code is very simple.
Let's say we are expecting an odds ratio of 1.5, where there is a 30% success rate in the control group and there is a 2:1 ratio of participants in the control versus experimental group (i.e., what you describe in your post), and we want 95% power:
epi.ccsize(OR=1.50, p0=.30, n=NA, power=.95, r=2)
Which gives us a list:
$n.total
[1] 1578
$n.case
[1] 526
$n.control
[1] 1052
Translating from epidemiologist-centric language, you need 526 experimental and 1052 controls to get 95% power in that situation.
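As a quick check of what these inputs imply, the odds ratio and the control success rate together determine the experimental success rate. A short sketch of that arithmetic (the variable names are illustrative, not part of epiR):

```r
p0 <- 0.30   # success rate in the control group
or <- 1.50   # expected odds ratio

odds0 <- p0 / (1 - p0)      # odds of success in the control group
odds1 <- or * odds0         # odds of success in the experimental group
p1 <- odds1 / (1 + odds1)   # success rate in the experimental group
round(p1, 6)
# [1] 0.391304
```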
It might also be tempting to try stats::power.prop.test(), but I'm not sure how to handle your 2:1 ratio using that function. For example, this response says that you just need to make sure your smallest group hits the threshold given by power.prop.test(), but I find that estimate unnecessarily high:
power.prop.test(p1=.30, p2=.391304, power=.95) # these values for p1 and p2 give OR of 1.50
     Two-sample comparison of proportions power calculation

              n = 702.1545
             p1 = 0.3
             p2 = 0.391304
      sig.level = 0.05
          power = 0.95
    alternative = two.sided

NOTE: n is number in *each* group
This overestimate jibes well with the comment to the post I linked above, where user Underminer says:
"If you do a 95/5 split, then it'll just take longer to hit the
minimum sample size for the variation that is getting the 5%." - while
this is a conservative approach to at least satisfying the specified
power of the test, you will in actuality be exceeding the specified
power entered in power.prop.test if you have one "small" and one
"large" group (e.g. n1 = 19746, n2 = 375174). A more exact method of
meeting power requirements for unequal sample sizes would likely be
desirable
Here's a relevant RPubs link using the pwr package, discussing unequal sample sizes. However, I find the epiR approach the most intuitive way to do this.
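A sample size from any of these tools can be sanity-checked by simulation, which handles unequal allocation directly. The following is a minimal sketch, not a replacement for epi.ccsize(): it assumes binomial outcomes, an uncorrected two-sided `prop.test()` as the analysis, and the group sizes and rates from the example above. The helper name `sim_power_2prop` is hypothetical.

```r
# Hypothetical helper: simulated power for comparing two proportions
# with unequal group sizes.
sim_power_2prop <- function(n_ctrl, n_exp, p_ctrl, p_exp,
                            nsim = 2000, alpha = 0.05) {
  # Simulated success counts, one pair per simulated study
  x_ctrl <- rbinom(nsim, size = n_ctrl, prob = p_ctrl)
  x_exp  <- rbinom(nsim, size = n_exp,  prob = p_exp)
  p_vals <- vapply(seq_len(nsim), function(i) {
    prop.test(x = c(x_ctrl[i], x_exp[i]),
              n = c(n_ctrl, n_exp),
              correct = FALSE)$p.value
  }, numeric(1))
  mean(p_vals < alpha)  # proportion of simulated studies that reject
}

set.seed(1)
power_21 <- sim_power_2prop(n_ctrl = 1052, n_exp = 526,
                            p_ctrl = 0.30, p_exp = 0.391304)
power_21
```

If the estimate lands near the target power, the 2:1 allocation is adequate under these assumptions; rerunning with other group sizes shows how power changes as the ratio shifts.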
Best Answer
From the fragmentary and undocumented R code you show, I suppose you want to do a two-sided, one-sample t test at level $\alpha = 0.05$ based on a sample from a normal population with standard deviation $\sigma=1.91$ and hope for power $0.80$ to detect a difference in population means of $1.$
Several methods are in common use, and they may give slightly different answers.
- Find the sample size necessary to get power 80% using a comparable z test. When the required $n$ is 30 or larger, the result will be approximately correct.
- Use an exact formula for the power of such a t test, based on a noncentral t distribution. Many intermediate-level applied statistics texts and mathematical statistics texts show the formula, and software such as R will do the necessary computation for the noncentral t distribution.
- Use a 'power and sample size' procedure; many statistical computer programs have one, and most use the noncentral t distribution.
- Simulate many t tests for normal data of a trial sample size $n$ from a population with appropriate $\mu$ and $\sigma$ to find the proportion that reject (approximate power).
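The first two approaches can be sketched in a few lines of R. This assumes the setup described above (two-sided one-sample test, $\alpha = 0.05$, $\sigma = 1.91$, difference of 1, target power 0.80); the function name `power_t_exact` is hypothetical.

```r
alpha <- 0.05; sigma <- 1.91; delta <- 1; target <- 0.80

# z-test approximation: n = ((z_{1 - alpha/2} + z_{power}) * sigma / delta)^2
n_z <- ((qnorm(1 - alpha / 2) + qnorm(target)) * sigma / delta)^2
ceiling(n_z)

# Exact power of the two-sided one-sample t test via the noncentral t
power_t_exact <- function(n) {
  df   <- n - 1
  ncp  <- delta * sqrt(n) / sigma   # noncentrality parameter
  crit <- qt(1 - alpha / 2, df)     # two-sided critical value
  # upper rejection region plus (tiny) lower rejection region
  pt(crit, df, ncp = ncp, lower.tail = FALSE) + pt(-crit, df, ncp = ncp)
}
power_t_exact(31)

# Built-in calculation for comparison
builtin <- power.t.test(n = 31, delta = delta, sd = sigma, sig.level = alpha,
                        type = "one.sample", strict = TRUE)$power
builtin
```

Because the z approximation ignores the extra uncertainty from estimating $\sigma$, its $n$ is typically slightly smaller than the t-based one.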
You have already seen computer output from R. Below is output from a recent release of Minitab statistical software. It gives $n = 31$ as the desired sample size--in agreement with your result from R.
Finally, here is a simulation in R, showing that (in appropriate circumstances) $n = 31$ gives power about 80%. [I use a 'for' loop because it seems to be more widely understood than more elegant structures in R. With $m = 10\,000$ iterations one can expect about two decimal places of accuracy.]
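A minimal sketch of such a simulation, not necessarily the code originally used, assuming a null mean of 0, a true mean of 1, and the values of $\sigma$, $n$, and $m$ given above (the variable names are illustrative):

```r
set.seed(2024)
m <- 10000; n <- 31               # iterations and trial sample size
mu0 <- 0; mu <- 1; sigma <- 1.91  # null mean, true mean, population SD

pv <- numeric(m)
for (i in 1:m) {
  x <- rnorm(n, mean = mu, sd = sigma)   # one simulated sample
  pv[i] <- t.test(x, mu = mu0)$p.value   # two-sided one-sample t test
}
power_sim <- mean(pv < 0.05)  # proportion of rejections = approximate power
power_sim
```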
Note: If discrepancies among the various formulas and computational methods you used are small, that may be due to rounding errors or approximations. If discrepancies are large, you need to verify you have correct formulas and are using correct syntax in programs.