Solved – Using and interpreting type I SS in three-way ANOVA

anova, regression, sums-of-squares

My research question is from education: to test whether groups (university, department, gender) differ in their scores on a test, and whether there are any interaction effects, I plan to do a 3-way ANOVA. The data are unbalanced (unequal cell sizes).

My professor says I have to use Type I SS (and report parameter estimates) because of the unbalanced design. His rationale is that this compares the effect of each main effect adjusted for the effects of the others. He also said he'd prefer gender to be entered first, that he wants parameter estimates, and earlier he had said he preferred Cohen's d-type effect sizes.

My questions:

1) I entered gender, university, department, then all 2-way interactions, then the 3-way interaction into the model. I found gender was not statistically significant, and neither was any interaction involving it, whatsoever: $F_{\text{gender}}(1, b) = 1.83$, $p = .18$, $\eta^2 = .006$. I reported R-squared. Should I have proceeded with all the other entry orders? There are a lot, right?

I dropped gender and redid the analysis as a 2×2 [university, department, uni*dep].

2a) I reported $F_{\text{university}}(1, b) = 46.45$, $p < .01$, $\eta^2 = .16$. I looked at the estimated means, reported them, and calculated Cohen's d = 0.8 (the difference between group means divided by the square root of the mean square error, i.e., $d = (\bar{x}_A - \bar{x}_B)/\sqrt{MSE}$). I also reported unstandardized $B_{\text{university}} = 0.48$, $t = 5.67$, $p < .01$, and concluded that students at university A tended to get higher scores than those at university B. I used an online calculator to convert from eta-squared to Cohen's d; it also converted $\eta^2 = .16$ to $d = 0.8$.

2b) I redid the analysis as [department, university, uni*dep]. I reported exactly the same statistics as above, but for department. Here, though, $\eta^2 = .13$, and when I calculated Cohen's d from the means and the MSE it came out to 0.38, whereas the online eta-squared-to-Cohen's-d calculator reported 0.70. What is wrong?

2c) I plotted the marginal means for uni*dep and included the plot to show that there is an interaction.

2d) I came to the interaction. I knew it would not change with the order of entry under Type I SS, so I reported $F_{\text{uni} \times \text{dep}}(1, b) = 11.80$, $p < .01$, $\eta^2 = .04$.

So I said the first two effects were practically significant, while the last was statistically significant.

A) My question, so that I can go to my prof unembarrassed: is this okay? What else can I do for the interaction? Can't I just say uniBdepB < all others, which is obvious in the graph?

B) If the parameter estimates tell me the slope, how come they are not equal to the estimated means?

C) How do I interpret the parameter estimates for the interaction effect? There is a value for uniAdepA, but then it says the rest are redundant.

Best Answer

There are a number of issues here. I will say several things, not in any particular order.

I agree with your professor: you should use type I SS. To say that type I SS shouldn't be used when groups are unbalanced doesn't make sense; if the factors are orthogonal, type I, II, and III SS are identical, so that argument amounts to saying that type I SS should never be used. I have discussed the meaning of, and the argument for, type I SS in answers to other questions on CV (mostly here, but also here). In brief, when cells are unbalanced, the factors are correlated, and there are sums of squares that could be attributed to more than one factor. Using type II or III SS ignores those overlapping sums of squares, thus they must be wrong.
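(If you want to check the orthogonality point empirically, here is a minimal sketch in Python with statsmodels, using made-up balanced data; all names and numbers are hypothetical. Sum-to-zero contrasts are used so that the type III SS are well defined, and all three ANOVA tables come out with the same SS for each factor.)

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Balanced 2x2 design: equal cell sizes make the factors orthogonal.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "uni": ["A"] * 20 + ["B"] * 20,
    "dep": (["X"] * 10 + ["Y"] * 10) * 2,
})
df["score"] = rng.normal(50, 10, size=40) + (df["uni"] == "A") * 5

# Sum-to-zero contrasts so that type III SS are well defined.
model = ols("score ~ C(uni, Sum) * C(dep, Sum)", data=df).fit()

print(sm.stats.anova_lm(model, typ=1))  # sequential (type I)
print(sm.stats.anova_lm(model, typ=2))  # type II
print(sm.stats.anova_lm(model, typ=3))  # type III: same factor SS as above
```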

Type I SS are used to conduct hypothesis tests. The analyst decides what order to enter the terms into the model, and deciding on an order amounts to deciding which factors get which overlapping sums of squares. Algorithmically, the sum of squares for each factor equals the reduction in the error SS when that factor is added to the model. For instance, if a 'reduced' model with $k$ factors is fit, and then an 'augmented' model with $k+1$ factors, then:
$$SS_{k+1} = SSE_{k} - SSE_{k+1}$$ To form the proper F ratio to test these 'extra sums of squares', first divide them by their corresponding df to compute the mean square for that factor, then divide that by the MSE from the full model (i.e., the model with all the factors entered). Done in this manner, the sums of squares across all the factors add up to the total SS. This same procedure also affords 'simultaneous' tests, in which several factors are entered together and the F test checks whether all of them are null. (For example, entering all the 2-way interactions together would allow you to simultaneously test whether all the corresponding $\beta$'s are 0.)
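To make the algorithm concrete, here is a small sketch with statsmodels (the data and the variable names 'gender' and 'uni' are fabricated, not the OP's): it reproduces the type I SS for one factor by hand as the drop in residual SS, and forms the F ratio against the full model's MSE.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical unbalanced data: unequal group proportions.
rng = np.random.default_rng(1)
n = 120
df = pd.DataFrame({
    "gender": rng.choice(["F", "M"], size=n, p=[0.6, 0.4]),
    "uni": rng.choice(["A", "B"], size=n, p=[0.7, 0.3]),
})
df["score"] = rng.normal(50, 10, size=n) + (df["uni"] == "A") * 5

# Sequential (type I) table: terms are tested in the order entered.
full = ols("score ~ C(gender) * C(uni)", data=df).fit()
print(sm.stats.anova_lm(full, typ=1))

# Type I SS for 'uni' by hand: the reduction in residual SS when 'uni'
# is added to a model that already contains 'gender'.
reduced = ols("score ~ C(gender)", data=df).fit()
augmented = ols("score ~ C(gender) + C(uni)", data=df).fit()
ss_uni = reduced.ssr - augmented.ssr    # matches the C(uni) row above
ms_uni = ss_uni / 1                     # 1 df for a two-level factor
f_uni = ms_uni / full.mse_resid         # F uses the MSE of the *full* model
print(ss_uni, f_uni)
```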

The parameter estimates (i.e., $\beta$'s) to report are those from the full model. Parameter estimates from reduced models are likely to be biased since your factors are correlated.

If all of this seems hard to follow, don't feel bad. It is confusing. Some time ago, the US Food and Drug Administration required that all research it funded use type III SS. The reasoning was that it was better for people to use a method that was wrong but that they could understand, than one that was right that they couldn't. While one could take issue with this line of reasoning, the point is that doing this correctly is not terribly simple.


The meaning of your parameter estimates depends on how the factors were coded. Perhaps the most typical scheme is 'reference cell coding' (commonly called 'dummy coding'). In this case, one cell (usually a control group) is designated as the reference cell, and the other levels of that factor are encoded in $l-1$ new variables (i.e., columns), where $l$ is the number of levels. For instance, if you have 3 groups (a control and 2 treatments), you add 2 new columns to your data set. In the first column (treatment1), each observation gets a 1 if it is in the first treatment group, and a 0 otherwise. In the second column (treatment2), you put a 1 for those in the second treatment group, or else a 0. If you use this approach (of course, you don't actually do this yourself; the software does it behind the scenes), then the intercept is equal to the mean of your control group, and the parameter estimate for each level is the difference between the mean of that level and the mean of the control group. There are many, many different coding schemes, and the meanings of the parameter estimates differ for each; check out the link above for more information.
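A minimal sketch of reference cell coding, with fabricated data (patsy's default treatment coding, used by statsmodels formulas, plays the role of the dummy coding described above):

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Three groups: a control and two treatments (hypothetical data).
rng = np.random.default_rng(2)
df = pd.DataFrame({"group": np.repeat(["control", "treat1", "treat2"], 30)})
df["y"] = rng.normal(10, 2, size=90) + np.repeat([0.0, 1.5, 3.0], 30)

# Treatment (reference-cell) coding is the default; 'control' is the
# reference level here because it sorts first.
fit = ols("y ~ C(group)", data=df).fit()
print(fit.params)
# Intercept               ~= mean of the control group
# C(group)[T.treat1]      ~= mean(treat1) - mean(control)
# C(group)[T.treat2]      ~= mean(treat2) - mean(control)
print(df.groupby("group")["y"].mean())  # compare with the estimates above
```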

You don't interpret interactions directly. If you believe that an interaction is real, you don't interpret the main effects of the factors that go into that interaction, either. Instead, you interpret 'simple effects': you look at the effect of one factor on the dependent variable several times, specifically, at each level of the other factor. Which factor you hold constant and which factor you examine directly is entirely up to you; pick whatever seems most intuitive. Often the best way to understand your data when you have interactions is to make graphs. For instance, if you had a 2x2 design with a 'significant' interaction, you could make a barplot of groups A1 and A2 at the first level of factor B, and another barplot of A1 and A2 at the second level of factor B. In this manner, you are looking at the effect of factor A at each level of the other factor; a minimal plotting sketch follows.
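Here is one way to draw those simple-effect barplots (the cell means below are invented purely for illustration):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Cell means for a 2x2 design (illustrative numbers, not the OP's data).
means = pd.DataFrame(
    {"A1": [52.0, 48.0], "A2": [55.0, 41.0]},
    index=["B1", "B2"],
)

# One panel per level of factor B; each shows the simple effect of A.
fig, axes = plt.subplots(1, 2, sharey=True, figsize=(7, 3))
for ax, b_level in zip(axes, means.index):
    ax.bar(means.columns, means.loc[b_level])
    ax.set_title(f"Effect of A at {b_level}")
    ax.set_ylabel("mean score")
plt.tight_layout()
plt.show()
```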