ANOVA and MANOVA – How to Interpret Type I, Type II, and Type III ANOVA and MANOVA?

anovahypothesis testingmanovarsums-of-squares

My primary question is how to interpret the output (coefficients, F, P) when conducting a Type I (sequential) ANOVA?

My specific research problem is a bit more complex, so I will break my example into parts. First, if I am interested in the effect of spider density (X1) on say plant growth (Y1) and I planted seedlings in enclosures and manipulated spider density, then I can analyze the data with a simple ANOVA or linear regression. Then it wouldn't matter if I used Type I, II, or III Sum of Squares (SS) for my ANOVA. In my case, I have 4 replicates of 5 density levels, so I can use density as a factor or as a continuous variable. In this case, I prefer to interpret it as a continuous independent (predictor) variable. In R I might run the following:

lm1 <- lm(y1 ~ density, data = Ena)
summary(lm1)
anova(lm1)

Running the anova function will make sense for comparison later hopefully, so please ignore the oddness of it here. The output is:

Response: y1
          Df  Sum Sq Mean Sq F value  Pr(>F)  
density    1 0.48357 0.48357  3.4279 0.08058 .
Residuals 18 2.53920 0.14107

Now, let's say I suspect that the starting level of inorganic nitrogen in the soil, which I couldn't control, may have also significantly affected the plant growth. I'm not particularly interested in this effect but would like to potentially account for the variation it causes. Really, my primary interest is in the effects of spider density (hypothesis: increased spider density causes increased plant growth – presumably through reduction of herbivorous insects but I'm only testing the effect not the mechanism). I could add the effect of inorganic N to my analysis.

For the sake of my question, let's pretend that I test the interaction density*inorganicN and it's non-significant so I remove it from the analysis and run the following main effects:

> lm2 <- lm(y1 ~ density + inorganicN, data = Ena)
> anova(lm2)
Analysis of Variance Table

Response: y1
           Df  Sum Sq Mean Sq F value  Pr(>F)  
density     1 0.48357 0.48357  3.4113 0.08223 .
inorganicN  1 0.12936 0.12936  0.9126 0.35282  
Residuals  17 2.40983 0.14175

Now, it makes a difference whether I use Type I or Type II SS (I know some people object to the terms Type I & II etc. but given the popularity of SAS it's easy short-hand). R anova{stats} uses Type I by default. I can calculate the type II SS, F, and P for density by reversing the order of my main effects or I can use Dr. John Fox's "car" package (companion to applied regression). I prefer the latter method since it is easier for more complex problems.

library(car)
Anova(lm2)
            Sum Sq Df F value  Pr(>F)  
density    0.58425  1  4.1216 0.05829 .
inorganicN 0.12936  1  0.9126 0.35282  
Residuals  2.40983 17

My understanding is that type II hypotheses would be, "There is no linear effect of x1 on y1 given the effect of (holding constant?) x2" and the same for x2 given x1. I guess this is where I get confused. What is the hypothesis being tested by the ANOVA using the type I (sequential) method above compared to the hypothesis using the type II method?

In reality, my data is a bit more complex because I measured numerous metrics of plant growth as well as nutrient dynamics and litter decomposition. My actual analysis is something like:

Y <- cbind(y1 + y2 + y3 + y4 + y5)
# Type II
mlm1 <- lm(Y ~ density + nitrate + Npred, data = Ena)
Manova(mlm1)

Type II MANOVA Tests: Pillai test statistic
        Df test stat approx F num Df den Df  Pr(>F)    
density  1   0.34397        1      5     12 0.34269    
nitrate  1   0.99994    40337      5     12 < 2e-16 ***
Npred    1   0.65582        5      5     12 0.01445 * 


# Type I
maov1 <- manova(Y ~ density + nitrate + Npred, data = Ena)
summary(maov1)

          Df  Pillai approx F num Df den Df  Pr(>F)    
density    1 0.99950     4762      5     12 < 2e-16 ***
nitrate    1 0.99995    46248      5     12 < 2e-16 ***
Npred      1 0.65582        5      5     12 0.01445 *  
Residuals 16

Best Answer

What you are calling type II SS, I would call type III SS. Lets imagine that there are just two factors A and B (and we'll throw in the A*B interaction later to distinguish type II SS). Further, lets imagine that there are different $n$s in the four cells (e.g., $n_{11}$=11, $n_{12}$=9, $n_{21}$=9, and $n_{22}$=11). Now your two factors are correlated with each other. (Try this yourself, make 2 columns of 1's and 0's and correlate them, $r=.1$; n.b. it doesn't matter if $r$ is 'significant', this is the whole population that you care about). The problem with your factors being correlated is that there are sums of squares that are associated with both A and B. When computing an ANOVA (or any other linear regression), we want to partition the sums of squares. A partition puts all sums of squares into one and only one of several subsets. (For example, we might want to divide the SS up into A, B and error.) However, since your factors (still only A and B here) are not orthogonal there is no unique partition of these SS. In fact, there can be very many partitions, and if you are willing to slice your SS up into fractions (e.g., "I'll put .5 into this bin and .5 into that one"), there are infinite partitions. A way to visualize this is to imagine the MasterCard symbol: The rectangle represents the total SS, and each of the circles represents the SS that are attributable to that factor, but notice the overlap between the circles in the center, those SS could be given to either circle.

enter image description here

The question is: How are we to choose the 'right' partition out of all of these possibilities? Let's bring the interaction back in and discuss some possibilities:

Type I SS:

SS(A)
SS(B|A)
SS(A*B|A,B)

Type II SS:

SS(A|B)
SS(B|A)
SS(A*B|A,B)

Type III SS:

SS(A|B,A*B)
SS(B|A,A*B)
SS(A*B|A,B)

Notice how these different possibilities work. Only type I SS actually uses those SS in the overlapping portion between the circles in the MasterCard symbol. That is, the SS that could be attributed to either A or B, are actually attributed to one of them when you use type I SS (specifically, the one you entered into the model first). In both of the other approaches, the overlapping SS are not used at all. Thus, type I SS gives to A all the SS attributable to A (including those that could also have been attributed elsewhere), then gives to B all of the remaining SS that are attributable to B, then gives to the A*B interaction all of the remaining SS that are attributable to A*B, and leaves the left-overs that couldn't be attributed to anything to the error term.

Type III SS only gives A those SS that are uniquely attributable to A, likewise it only gives to B and the interaction those SS that are uniquely attributable to them. The error term only gets those SS that couldn't be attributed to any of the factors. Thus, those 'ambiguous' SS that could be attributed to 2 or more possibilities are not used. If you sum the type III SS in an ANOVA table, you will notice that they do not equal the total SS. In other words, this analysis must be wrong, but errs in a kind of epistemically conservative way. Many statisticians find this approach egregious, however government funding agencies (I believe the FDA) requires their use.

The type II approach is intended to capture what might be worthwhile about the idea behind type III, but mitigate against its excesses. Specifically, it only adjusts the SS for A and B for each other, not the interaction. However, in practice type II SS is essentially never used. You would need to know about all of this and be savvy enough with your software to get these estimates, and the analysts who are typically think this is bunk.

There are more types of SS (I believe IV and V). They were suggested in the late 60's to deal with certain situations, but it was later shown that they do not do what was thought. Thus, at this point they are just a historical footnote.

As for what questions these are answering, you basically have that right already in your question:

Estimates using type I SS tell you how much of the variability in Y can be explained by A, how much of the residual variability can be explained by B, how much of the remaining residual variability can be explained by the interaction, and so on, in order.
Estimates based on type III SS tell you how much of the residual variability in Y can be accounted for by A after having accounted for everything else, and how much of the residual variability in Y can be accounted for by B after having accounted for everything else as well, and so on. (Note that both go both first and last simultaneously; if this makes sense to you, and accurately reflects your research question, then use type III SS.)

Related Solutions

Solved – Collinearity between categorical variables

Collinearity between factors is quite complicated. The classical example is the one you get when you group and dummy-encode the three continuous variables 'age', 'period' and 'year'. It is analysed in:

Kupper, L.L., Janis, J.M., Salama, I.A., Yoshizawa, C.N. Greenberg, B.G., & Winsborough, H.H. (1983). Age-period-cohort analysis: an illustration in the problems assessing interaction in one observation per cell data, Communicatios in Statistics - Theory and Methods, 12, 23, pp. 201-217.

The coefficients you get, after removing four (not three) references are only identified up to an unknown linear trend. This can be analysed because the collinearity arises from a known collinearity in the source variables (age+year=period).

Some work has also been done on spurious collinearity between two factors. It has been analysed in:

Eccleston, J.A. & Hedayat, A. (1974). On the theory of connected designs: Characterization and optimality, The Annals of Statistics, 2, 6, pp. 1238-1255.

The upshot is that collinearity among categorical variables means that the dataset must be split into disconnected parts, with a reference level in each component. Estimated coefficients from different components can not be compared directly.

For more complicated collinearities between three or more factors, the situation is complicated. There do exist procedures for finding estimable functions, i.e. linear combinations of the coefficients which are interpretable, e.g. in:

"On the connectivity of row-column designs" by Godolphin and Godolphin in Utilitas Mathematica (60) pp 51-65

But to my knowledge no general silver-bullet for handling such collinearities in an intuitive way exists.

Solved – calculating regression sum of square in R

SS(Regression) = SS(Total) - S(Residual)

You can get SS(Total) by:

SSTotal <- var( brainIQ$PIQ ) * (nrow(brainIQ)-1)
    SSE     <- sum( mylm$resid^2 )
SSreg   <- SSTotal - SSE

The degrees of freedom for the "Regression" row are the sum of the degrees of freedom for the corresponding components of the Regression (in this case: Brain, Height, and Weight).

Then to get the rest:

dfE   <- mylm$df.residual
dfReg <- nrow(brainIQ) - 1 - dfE
MSreg <- SSreg / dfReg
MSE   <- SSE / dfE
Fstat <- MSreg / MSE
pval  <- pf( Fstat , dfReg, dfE , lower.tail=FALSE )