Solved – Effect size calculation for One-way ANOVA and Tukey-HSD

anova · r · sample-size · statistical-power · tukey-hsd-test

I'm calculating the minimum sample size needed to conduct a one-way ANOVA that I will follow with a Tukey-HSD post hoc analysis. I'm using R for my calculations.

From the 'pwr' package, I need to give as input the number of groups (k), the effect size (f), the significance level, and the power to get the minimum sample size. However, to get the effect size, one needs a pilot study or an educated guess. From a small pilot study, let's say I got the following trial data:

[image: table of pilot data — four treatment groups with five observations each]

However, how should I compute the effect size from the pilot study? I've seen several "effect sizes": Cohen's d, eta-squared, partial eta-squared, omega-squared, epsilon-squared, etc.
I also read in the book *Practical Statistical Power Analysis Using WebPower and R* that there is an online calculator: https://webpower.psychstat.org/models/means03/effectsize.php, which gave me the following:

[image: WebPower output listing an overall effect size of 1.0096 plus pairwise effect sizes]

It is also said in the book that "Cohen defined the size of effect as: small 0.1, medium 0.25, and large 0.4", yet here I got a value greater than 1, which is really strange. I don't understand why there are so many different effect sizes in the output, or which one serves as the "overall effect size". Why are there interaction effect sizes, i.e., the effect size for group $x$ vs group $y$? I thought we only analyze the difference between each pair of groups in a Tukey-HSD test.

For example, if the effect size was f=1.0096, I would get the following:

> pwr.anova.test(k = 4, f = 1.0096, sig.level = 0.05, power = 0.80)

 Balanced one-way analysis of variance power calculation 

          k = 4
          n = 3.806971
          f = 1.0096
  sig.level = 0.05
      power = 0.8

  NOTE: n is number in each group

This would give around 4 per group, i.e., 4*4 = 16 in total. The sample is so small because the effect size of 1.0096 is really big. This leads me to another question: is the effect size of 1.0096 even correct?

And for a two-way ANOVA, is the effect size "formula" the same?

EDIT: I calculated the eta-squared effect size in R and got a value of 0.39, which is a lot smaller than the previous one (1.0096). This would give a minimum sample size of 19 per group, i.e., 76 in total. That sample size sounds more plausible to me. Should I use eta-squared, and why? Also, can I even use the 0.39 value for the "f" argument of pwr.anova.test? I think the "f" argument there expects Cohen's f, not eta-squared.

> outcome <- c(6.3,2.8,7.8,7.9,4.9,9.9,4.1,3.9,6.3,6.9,5.1,2.9,3.6,5.7,4.5,1.0,2.8,4.8,3.9,1.6)  # data
> treatment1 <- factor( c( 1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4 ))  # grouping variable
> anova1 <- aov( outcome ~ treatment1 )   # run the ANOVA
> summary( anova1 )                       # print the ANOVA table
            Df Sum Sq Mean Sq F value Pr(>F)  
treatment1   3  37.13  12.375    3.46 0.0414 *
Residuals   16  57.22   3.576                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> etaSquared( anova1 )   # from the 'lsr' package
            eta.sq eta.sq.part
treatment1 0.3935058   0.3935058

Best Answer

I was not able to reproduce the results you got from WebPower using the pilot data you supplied. I was, however, able to reproduce your R code.

You are correct that you can't use the $\eta^2$ directly as Cohen's f, but the two are related by $f^2 = \frac{\eta^2}{1-\eta^2}$, i.e. $f = \sqrt{\frac{\eta^2}{1-\eta^2}}$.
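As a quick sketch, plugging in the $\eta^2 \approx 0.3935$ from your pilot ANOVA gives the Cohen's f that pwr.anova.test expects:

```r
# convert the pilot eta-squared into Cohen's f for pwr.anova.test
eta.sq <- 0.3935                  # eta-squared from the pilot ANOVA
f <- sqrt(eta.sq / (1 - eta.sq))  # Cohen's f, about 0.805
f
```

Note this is well below the 1.0096 that WebPower reported, but still far above Cohen's "large" benchmark of 0.4.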

  1. "However, how should I compute the effect size from the pilot study?" - use the $\eta^2$ from the pilot study, converted to Cohen's f.

  2. "Why are there interaction effect sizes, i.e., the effect size for group x vs group y?" Those are the effect sizes for the pairwise comparisons (if you were following up with t-tests or a Tukey-HSD).
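To see those pairwise contrasts directly, here is a quick sketch that rebuilds the pilot data from your question and runs the Tukey-HSD on it:

```r
# pilot data from the question: four treatment groups, five observations each
outcome <- c(6.3, 2.8, 7.8, 7.9, 4.9,   # group 1
             9.9, 4.1, 3.9, 6.3, 6.9,   # group 2
             5.1, 2.9, 3.6, 5.7, 4.5,   # group 3
             1.0, 2.8, 4.8, 3.9, 1.6)   # group 4
treatment <- factor(rep(1:4, each = 5))

fit <- aov(outcome ~ treatment)
TukeyHSD(fit)   # one adjusted estimate and CI per pair of groups (6 rows)
```

Each row of the TukeyHSD output corresponds to one of the pairwise effect sizes the calculator reported.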

require(dplyr)
require(reshape2)

pilot <- data.frame(option1 = c(6.3, 2.8, 7.8, 7.9, 4.9),
                    option2 = c(9.9, 4.1, 3.9, 6.3, 6.9),
                    option3 = c(5.1, 2.9, 3.6, 5.7, 4.5),
                    option4 = c(1.0, 2.8, 4.8, 3.9, 1.6))
pilot2 <- pilot %>% 
  reshape2::melt(value.name = "y") %>%
  dplyr::rename("option" = "variable")

lm1 <- lm(y ~ option, data = pilot2)
aov1 <- aov(lm1)

means <- apply(pilot, 2, mean)
vs <- apply(pilot, 2, var)

# Cohen's f for the overall ANOVA
# eta^2 = SS_treatment / SS_total (treatment is the first row of the table)
eta.sq <- anova(lm1)$`Sum Sq`[1] / sum(anova(lm1)$`Sum Sq`)
f <- sqrt(eta.sq / (1 - eta.sq))   # about 0.805

# Cohen's d for the pairwise comparisons (pooled SD uses n1 + n2 - 2)
d <- abs(means[c(1,1,1,2,2,3)] - means[c(2,3,4,3,4,4)]) /
  sqrt(((5-1)*vs[c(1,1,1,2,2,3)] + (5-1)*vs[c(2,3,4,3,4,4)]) / (5+5-2))
names(d) <- c("1-2", "1-3", "1-4", "2-3", "2-4", "3-4")

require(pwr)

# with 5 samples per group, we only have 80% power for effects of
#   f = 0.835 or larger, i.e. with only 5 samples we can only detect
#   a very large effect

pwr::pwr.anova.test(k = 4, n = 5, sig.level = 0.05, power = 0.80)
#> 
#>      Balanced one-way analysis of variance power calculation 
#> 
#>               k = 4
#>               n = 5
#>               f = 0.8352722
#>       sig.level = 0.05
#>           power = 0.8
#> 
#> NOTE: n is number in each group

# the pilot effect of f = 0.805 is just below 0.835, so the required n
#   comes out a little above 5, i.e. 6 per group for 80% power

pwr::pwr.anova.test(k = 4, f = 0.805, sig.level = 0.05, power = 0.80)
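One caveat: pwr.anova.test sizes the omnibus F test, not the post hoc comparisons. If the Tukey-HSD pairwise tests are what you actually care about, you can instead size for the smallest pairwise difference of interest with pwr.t.test. In this sketch, d = 0.5 and the Bonferroni-style alpha of 0.05/6 (splitting alpha over the 6 pairwise comparisons as a conservative stand-in for the Tukey adjustment) are illustrative assumptions, not values from the pilot:

```r
require(pwr)

# size for the smallest pairwise difference you care about (assumed d = 0.5);
# alpha is split over the 6 pairwise comparisons as a conservative
# stand-in for the Tukey adjustment
pwr::pwr.t.test(d = 0.5, sig.level = 0.05 / 6, power = 0.80,
                type = "two.sample", alternative = "two.sided")
```

Sizing this way typically demands far more subjects per group than the omnibus calculation, because the smallest interesting pairwise difference is usually much smaller than the overall spread of the group means.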