If you have already done the experiment, there is little point in doing any power analysis. Where the P-values are small, the power for the observed effect size and variability was large enough; where the P-values are large, the power for the observed effect size and variability was small. Power analysis is useful for planning experiments, but not after the fact. See this paper by Hoenig & Heisey: http://www.tandfonline.com/doi/abs/10.1198/000313001300339897#preview
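To see why post hoc power adds nothing beyond the P-value, note that for a two-sided z-test the "observed power" is a direct function of p alone; in particular, a result exactly at p = 0.05 always has observed power of about 50%. An illustrative sketch in Python (not WebPower or pwr code):

```python
from statistics import NormalDist

def observed_power(p, alpha=0.05):
    """Post hoc power of a two-sided z-test, computed from the p-value alone."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    z_obs = z.inv_cdf(1 - p / 2)   # |z| implied by the two-sided p-value
    # power if the true effect equalled the observed effect
    return (1 - z.cdf(z_crit - z_obs)) + z.cdf(-z_crit - z_obs)

print(observed_power(0.05))  # ≈ 0.5
print(observed_power(0.50))  # ≈ 0.10
```

Since the observed power is just a transformation of p, reporting it after the analysis tells you nothing the P-value did not already say.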
Your desire for a power analysis appears to be based on this statement: "one must be sure that the results are 'real' and not just due to the small sample size", so it is worth considering it closely. First, statistical analysis cannot tell you about the reality of a result, something you probably know, given that you put 'real' in quotes. Second, you imply that a small sample is more likely to yield a false positive result, when in fact a small sample is exactly as likely to do so as a large sample; what a small sample is more likely to yield is a false negative.
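The false positive point is easy to check by simulation: when the null is true, a 5%-level t-test rejects about 5% of the time regardless of sample size. An illustrative sketch in Python (the two critical values are hard-coded from the t distribution for df = 8 and df = 98):

```python
import random
from statistics import mean, variance

def t_stat(x, y):
    """Two-sample pooled-variance t statistic."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (sp2 * (1 / nx + 1 / ny)) ** 0.5

def false_positive_rate(n, t_crit, sims=4000, seed=1):
    """Fraction of null-is-true experiments rejected at the 5% level."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        x = [rng.gauss(0, 1) for _ in range(n)]  # both groups share a mean
        y = [rng.gauss(0, 1) for _ in range(n)]
        if abs(t_stat(x, y)) > t_crit:
            hits += 1
    return hits / sims

print(false_positive_rate(5, 2.306))   # ≈ 0.05 with n = 5 per group
print(false_positive_rate(50, 1.984))  # ≈ 0.05 with n = 50 per group
```

Both rates hover around the nominal 0.05; sample size changes power (the false negative rate), not the Type I error rate.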
If you want to be confident that the results yield reliable conclusions then you have to consider their nature in light of what is known about the system and, ideally, replicate the parts of the study that are most interesting or surprising. (I acknowledge that a well-judged statistical analysis is more helpful here than a poorly judged one: see Julien Sturnemann's answer for some suggestions.)
I was not able to reproduce the results you got from WebPower using the pilot data you supplied. I was, however, able to reproduce the results from your R code.
You are correct that you can't use $\eta^2$ directly as Cohen's f, but the two are related: $f^2 = \frac{\eta^2}{1-\eta^2}$
"However, how should I compute the effect size from the pilot study" - use the $\eta^2$ from the pilot study.
"Why are there interaction effect sizes, i.e., the effect size for group x vs group y?" Those are the effect sizes for the pair-wise comparisons (the Cohen's d you would use for a t-test or a TukeyHSD)
require(dplyr)
require(reshape2)
pilot <- data.frame(option1 = c(6.3, 2.8, 7.8, 7.9, 4.9),
                    option2 = c(9.9, 4.1, 3.9, 6.3, 6.9),
                    option3 = c(5.1, 2.9, 3.6, 5.7, 4.5),
                    option4 = c(1.0, 2.8, 4.8, 3.9, 1.6))
pilot2 <- pilot %>%
  reshape2::melt(value.name = "y") %>%
  dplyr::rename("option" = "variable")
lm1 <- lm(y ~ option, data = pilot2)
aov1 <- aov(lm1)
means <- apply(pilot, 2, mean)
vs <- apply(pilot, 2, var)
# cohen's f for overall anova
# eta^2 = SS_between / SS_total; the effect (option) row is the first
# row of the anova table, the residuals are the second
eta.sq <- anova(lm1)$`Sum Sq`[1] / sum(anova(lm1)$`Sum Sq`)
f <- sqrt(eta.sq / (1 - eta.sq))
# cohen's d for the pairwise comparisons (pooled SD on n1 + n2 - 2 df)
d <- abs(means[c(1,1,1,2,2,3)] - means[c(2,3,4,3,4,4)]) /
  sqrt(((5-1)*vs[c(1,1,1,2,2,3)] + (5-1)*vs[c(2,3,4,3,4,4)]) / (5+5-2))
names(d) <- c("1-2", "1-3", "1-4", "2-3", "2-4", "3-4")
require(pwr)
# with n = 5 per group, the smallest effect detectable with 80% power
# is f = 0.835, i.e. with only 5 samples we need a very large effect
pwr::pwr.anova.test(k = 4, n = 5, sig.level = 0.05, power = 0.80)
#>
#>      Balanced one-way analysis of variance power calculation
#>
#>               k = 4
#>               n = 5
#>               f = 0.8352722
#>       sig.level = 0.05
#>           power = 0.8
#>
#> NOTE: n is number in each group
# the pilot gives f ≈ 0.805, just below that threshold, so n = 5 per
# group falls slightly short of 80% power; solving for n instead:
pwr::pwr.anova.test(k = 4, f = 0.805, sig.level = 0.05, power = 0.80)
# n comes out just above 5, so plan on 6 per group
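As a cross-check on the effect-size arithmetic, $\eta^2$ and Cohen's f for the overall one-way ANOVA can be recomputed from first principles. An illustrative sketch in Python, using the same pilot numbers:

```python
# pilot data from the question, one list per group
pilot = {
    "option1": [6.3, 2.8, 7.8, 7.9, 4.9],
    "option2": [9.9, 4.1, 3.9, 6.3, 6.9],
    "option3": [5.1, 2.9, 3.6, 5.7, 4.5],
    "option4": [1.0, 2.8, 4.8, 3.9, 1.6],
}

n_total = sum(len(g) for g in pilot.values())
grand = sum(x for g in pilot.values() for x in g) / n_total
# between-group and total sums of squares
ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in pilot.values())
ss_total = sum((x - grand) ** 2 for g in pilot.values() for x in g)

eta_sq = ss_between / ss_total
f = (eta_sq / (1 - eta_sq)) ** 0.5
print(round(eta_sq, 4), round(f, 4))  # 0.3935 0.8055
```

The key point is that $\eta^2$ uses the *between-group* sum of squares in the numerator; accidentally using the residual sum of squares instead inflates f (here it would give about 1.24 rather than 0.81).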
Best Answer
Given my comments under your post above:
It sounds to me like you are analyzing a 2 x 2 contingency table: Group A vs. Group B x Success vs. Failure. With these, you can easily calculate an odds ratio (OR); see metafor::escalc() for good documentation on getting an OR from a 2 x 2 contingency table.
I have used epiR::epi.ccsize() to do power analyses for odds ratios before when working with epidemiologists. It is geared toward epidemiologists, but the statistics are the same, and the code is very simple. Let's say we are expecting an odds ratio of 1.5, where there is a 30% success rate in the control group and a 2:1 ratio of participants in the control versus experimental group (i.e., what you describe in your post), and we want 95% power. epi.ccsize() returns a list; translating from its epidemiologist-centric language, you need 526 experimental and 1052 control participants to get 95% power in that situation.
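Under the hood this is the standard normal-approximation (Fleiss-type) sample-size formula for an unmatched two-group design with unequal allocation. An illustrative sketch in Python (the `cc_size` helper is hypothetical, not the epiR API) reproduces those numbers:

```python
from math import ceil, sqrt
from statistics import NormalDist

def cc_size(OR, p0, power=0.95, r=2, alpha=0.05):
    """Unmatched two-group sample size for a given odds ratio.

    p0: success rate in the control group; r: controls per experimental
    participant. Hypothetical helper implementing the Fleiss-type
    normal-approximation formula, not the epiR interface.
    """
    z = NormalDist()
    za = z.inv_cdf(1 - alpha / 2)
    zb = z.inv_cdf(power)
    # convert the odds ratio into the experimental-group success rate
    odds1 = OR * p0 / (1 - p0)
    p1 = odds1 / (1 + odds1)
    pbar = (p1 + r * p0) / (1 + r)  # allocation-weighted average proportion
    num = (za * sqrt((1 + 1 / r) * pbar * (1 - pbar))
           + zb * sqrt(p1 * (1 - p1) + p0 * (1 - p0) / r)) ** 2
    n_exp = ceil(num / (p1 - p0) ** 2)
    return n_exp, r * n_exp

print(cc_size(1.5, 0.30))  # (526, 1052)
```

With OR = 1.5 and a 30% control success rate, the experimental success rate implied by the OR is about 39.1%, and the formula lands on the same 526 experimental / 1052 control figure.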
It might also be tempting to try stats::power.prop.test(), but I'm not sure how to handle your 2:1 ratio using that function. For example, this response says that you just need to make sure your smallest group hits the threshold given by power.prop.test(), but I find that estimate unnecessarily high. This overestimate jibes with user Underminer's comment on the post I linked above.
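To see the size of the overestimate: the equal-allocation requirement works out to about 703 per group, so forcing the smaller (experimental) group up to 703 overshoots the 526 that the unequal-allocation calculation asks for. An illustrative sketch in Python of the normal-approximation formula (the `equal_n_two_prop` helper is hypothetical; it mirrors the formula stats::power.prop.test uses for a two-sided test):

```python
from math import ceil, sqrt
from statistics import NormalDist

def equal_n_two_prop(p1, p2, power=0.95, alpha=0.05):
    """Per-group n for comparing two proportions with equal allocation,
    via the usual two-sided normal approximation."""
    z = NormalDist()
    za = z.inv_cdf(1 - alpha / 2)
    zb = z.inv_cdf(power)
    pbar = (p1 + p2) / 2  # pooled proportion under equal allocation
    num = (za * sqrt(2 * pbar * (1 - pbar))
           + zb * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# 30% control success rate; an OR of 1.5 implies ~39.1% (9/23) experimental
print(equal_n_two_prop(0.30, 9 / 23))  # 703 per group
```

So 703 experimental participants versus the 526 that the 2:1 design actually needs.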
Here's a relevant RPubs link using the pwr package, discussing unequal sample sizes. However, I find the epiR approach the most intuitive way to do this.