Solved – Why are results different when using aov_ez{afex} and Anova{car}, Type III SS in R

anovacategorical-encodingr

Hopefully, the answer to this question is simple.

Why do I get different results when I am using Anova from the car package and the aov_ez from the afex package?

library(car)
library(afex)

set.seed(100)

dv <-  runif(30,90,110)
factor_a <-  factor(rep(c("a","b"),15))
factor_b <-  factor(rep(c("f","m"),each=15))
id <- factor(1:30)
data <- data.frame(dv,factor_a,factor_b,id)

data$dv[data$factor_b=="f" & data$factor_a=="a"] <- data$dv[data$factor_b=="f" & data$factor_a=="a"] *1.05

aov_ez(dv="dv",between = c("factor_a","factor_b"), id="id",data=data )
Anova(lm(dv~factor_a*factor_b, data=data),type = "III")

Results are different in case of the main effect factor B:
Using aov_ez, the main effect is significant, using Anova, the main effect is not significant. How is that possible?
In both cases, there is a significant interaction and the F-values are very similar. But the F-values of the main effects differ.

>     aov_ez(dv="dv",between = c("factor_a","factor_b"), id="id",data=data )
Contrasts set to contr.sum for the following variables: factor_a, factor_b
Anova Table (Type 3 tests)

Response: dv
             Effect    df   MSE       F ges p.value
1          factor_a 1, 26 20.96  4.16 + .14     .05
2          factor_b 1, 26 20.96    0.55 .02     .46
3 factor_a:factor_b 1, 26 20.96 9.03 ** .26    .006
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1

>     Anova(lm(dv~factor_a*factor_b, data=data),type = "III")
Anova Table (Type III tests)

Response: dv
                  Sum Sq Df   F value    Pr(>F)    
(Intercept)        89775  1 4282.2927 < 2.2e-16 ***
factor_a             267  1   12.7159  0.001434 ** 
factor_b             147  1    7.0288  0.013478 *  
factor_a:factor_b    189  1    9.0269  0.005822 ** 
Residuals            545 26                        
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Because I am quite new using R, which of these two packages I should use. Does anyone know why the results differ?

Best Answer

First note that you have a balanced between-subjects design with only dichotomous IVs, so the sum-of-squares type doesn't matter and Anova(lm(dv~factor_a*factor_b, data=data)) produces the same $p$-values as $t$-tests of regression coefficients:

> print(summary(lm(dv~factor_a*factor_b, data=data)))
[…]
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)          105.933      1.619  65.439  < 2e-16 ***
factor_ab             -8.450      2.370  -3.566  0.00143 ** 
factor_bm             -6.282      2.370  -2.651  0.01348 *  
factor_ab:factor_bm   10.069      3.351   3.004  0.00582 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[…]

So why does aov_ez give different $p$s? It seems that aov_ez codes the IVs with effects coding by default, rather than dummy coding as in base R's lm; the message Contrasts set to contr.sum for the following variables: factor_a, factor_b is a warning of this. When we set check.contrasts to FALSE, aov_ez uses dummy coding and we get concordant $F$s and $p$s:

> print(aov_ez(dv="dv",between = c("factor_a","factor_b"), id="id",data=data, check_contrasts = F ))
Anova Table (Type 3 tests)

Response: dv
             Effect    df   MSE        F ges p.value
1          factor_a 1, 26 20.96 12.72 ** .33    .001
2          factor_b 1, 26 20.96   7.03 * .21     .01
3 factor_a:factor_b 1, 26 20.96  9.03 ** .26    .006
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1
Warning message:
Calculating Type 3 sums with contrasts != 'contr.sum' for: factor_a, factor_b, factor_a, factor_b
  Results likely bogus or not interpretable!
  You probably want check_contrasts = TRUE or options(contrasts=c('contr.sum','contr.poly'))

Why exactly the package author thinks that dummy coding in this situation makes the results "likely bogus or not interpretable" goes further into the weeds of ANOVA than I'm familiar with. But at least now we know what aov_ez is doing.

Related Solutions

Solved – How to specify specific contrasts for repeated measures ANOVA using car

This method is generally considered "old-fashioned" so while it may be possible, the syntax is difficult and I suspect fewer people know how to manipulate the anova commands to get what you want. The more common method is using glht with a likelihood-based model from nlme or lme4. (I'm certainly welcome to be proved wrong by other answers though.)

That said, if I needed to do this, I wouldn't bother with the anova commands; I'd just fit the equivalent model using lm, pick out the right error term for this contrast, and compute the F test myself (or equivalently, t test since there's only 1 df). This requires everything to be balanced and have sphericity, but if you don't have that, you should probably be using a likelihood-based model anyway. You might be able to somewhat correct for non-sphericity using the Greenhouse-Geiser or Huynh-Feldt corrections which (I believe) use the same F statistic but modify the df of the error term.

If you really want to use car, you might find the heplot vignettes helpful; they describe how the matrices in the car package are defined.

Using caracal's method (for the contrasts 1&2 - 3 and 1&2 - 4&5), I get

      psiHat      tStat          F         pVal
1 -3.0208333 -7.2204644 52.1351067 2.202677e-09
2 -0.2083333 -0.6098777  0.3719508 5.445988e-01

This is how I'd get those same p-values:

Reshape the data into long format and run lm to get all the SS terms.

library(reshape2)
d <- OBrienKaiser
d$id <- factor(1:nrow(d))
dd <- melt(d, id.vars=c(18,1:2), measure.vars=3:17)
dd$hour <- factor(as.numeric(gsub("[a-z.]*","",dd$variable)))
dd$phase <- factor(gsub("[0-9.]*","", dd$variable), 
                   levels=c("pre","post","fup"))
m <- lm(value ~ treatment*hour*phase + treatment*hour*phase*id, data=dd)
anova(m)

Make an alternate contrast matrix for the hour term.

foo <- matrix(0, nrow=nrow(dd), ncol=4)
foo[dd$hour %in% c(1,2) ,1] <- 0.5
foo[dd$hour %in% c(3) ,1] <- -1
foo[dd$hour %in% c(1,2) ,2] <- 0.5
foo[dd$hour %in% c(4,5) ,2] <- -0.5
foo[dd$hour %in% 1 ,3] <- 1
foo[dd$hour %in% 2 ,3] <- 0
foo[dd$hour %in% 4 ,4] <- 1
foo[dd$hour %in% 5 ,4] <- 0

Check that my contrasts give the same SS as the default contrasts (and the same as from the full model).

anova(lm(value ~ hour, data=dd))
anova(lm(value ~ foo, data=dd))

Get the SS and df for just the two contrasts I want.

anova(lm(value ~ foo[,1], data=dd))
anova(lm(value ~ foo[,2], data=dd))

Get the p-values.

> F <- 73.003/(72.81/52)
> pf(F, 1, 52, lower=FALSE)
[1] 2.201148e-09
> F <- .5208/(72.81/52)
> pf(F, 1, 52, lower=FALSE)
[1] 0.5445999

Optionally adjust for sphericity.

pf(F, 1*.48867, 52*.48867, lower=FALSE)
pf(F, 1*.57413, 52*.57413, lower=FALSE)

ANOVA and MANOVA – How to Interpret Type I, Type II, and Type III ANOVA and MANOVA?

What you are calling type II SS, I would call type III SS. Lets imagine that there are just two factors A and B (and we'll throw in the A*B interaction later to distinguish type II SS). Further, lets imagine that there are different $n$s in the four cells (e.g., $n_{11}$=11, $n_{12}$=9, $n_{21}$=9, and $n_{22}$=11). Now your two factors are correlated with each other. (Try this yourself, make 2 columns of 1's and 0's and correlate them, $r=.1$; n.b. it doesn't matter if $r$ is 'significant', this is the whole population that you care about). The problem with your factors being correlated is that there are sums of squares that are associated with both A and B. When computing an ANOVA (or any other linear regression), we want to partition the sums of squares. A partition puts all sums of squares into one and only one of several subsets. (For example, we might want to divide the SS up into A, B and error.) However, since your factors (still only A and B here) are not orthogonal there is no unique partition of these SS. In fact, there can be very many partitions, and if you are willing to slice your SS up into fractions (e.g., "I'll put .5 into this bin and .5 into that one"), there are infinite partitions. A way to visualize this is to imagine the MasterCard symbol: The rectangle represents the total SS, and each of the circles represents the SS that are attributable to that factor, but notice the overlap between the circles in the center, those SS could be given to either circle.

enter image description here

The question is: How are we to choose the 'right' partition out of all of these possibilities? Let's bring the interaction back in and discuss some possibilities:

Type I SS:

SS(A)
SS(B|A)
SS(A*B|A,B)

Type II SS:

SS(A|B)
SS(B|A)
SS(A*B|A,B)

Type III SS:

SS(A|B,A*B)
SS(B|A,A*B)
SS(A*B|A,B)

Notice how these different possibilities work. Only type I SS actually uses those SS in the overlapping portion between the circles in the MasterCard symbol. That is, the SS that could be attributed to either A or B, are actually attributed to one of them when you use type I SS (specifically, the one you entered into the model first). In both of the other approaches, the overlapping SS are not used at all. Thus, type I SS gives to A all the SS attributable to A (including those that could also have been attributed elsewhere), then gives to B all of the remaining SS that are attributable to B, then gives to the A*B interaction all of the remaining SS that are attributable to A*B, and leaves the left-overs that couldn't be attributed to anything to the error term.

Type III SS only gives A those SS that are uniquely attributable to A, likewise it only gives to B and the interaction those SS that are uniquely attributable to them. The error term only gets those SS that couldn't be attributed to any of the factors. Thus, those 'ambiguous' SS that could be attributed to 2 or more possibilities are not used. If you sum the type III SS in an ANOVA table, you will notice that they do not equal the total SS. In other words, this analysis must be wrong, but errs in a kind of epistemically conservative way. Many statisticians find this approach egregious, however government funding agencies (I believe the FDA) requires their use.

The type II approach is intended to capture what might be worthwhile about the idea behind type III, but mitigate against its excesses. Specifically, it only adjusts the SS for A and B for each other, not the interaction. However, in practice type II SS is essentially never used. You would need to know about all of this and be savvy enough with your software to get these estimates, and the analysts who are typically think this is bunk.

There are more types of SS (I believe IV and V). They were suggested in the late 60's to deal with certain situations, but it was later shown that they do not do what was thought. Thus, at this point they are just a historical footnote.

As for what questions these are answering, you basically have that right already in your question:

Estimates using type I SS tell you how much of the variability in Y can be explained by A, how much of the residual variability can be explained by B, how much of the remaining residual variability can be explained by the interaction, and so on, in order.
Estimates based on type III SS tell you how much of the residual variability in Y can be accounted for by A after having accounted for everything else, and how much of the residual variability in Y can be accounted for by B after having accounted for everything else as well, and so on. (Note that both go both first and last simultaneously; if this makes sense to you, and accurately reflects your research question, then use type III SS.)