Solved – Help with unbalanced 2-factor ANOVA in R

anova, r

I have an unbalanced dataset with one response variable (age) and two factors: factor 1 is gender (2 levels: m and f) and factor 2 is kill type (3 levels: a, b, c). Sample sizes vary greatly depending on the kill type. I intend to examine how age varies among gender and regions, and after a lot of reading it looks like an unbalanced two-factor ANOVA is a solution. However, I am not sure how to decide (or whether I should decide) which factors are fixed and which are random, and then which unbalanced approach I should use in R. This is pretty much the first time I have encountered an unbalanced two-factor ANOVA, and I am seeking some advice on how best to test my hypothesis.

Best Answer

Fixed factors are those for which the only levels you want the model to apply to are the ones you observe. Random factors are those for which you may want to generalize to levels other than the ones included in your model.

  • You will want gender to be a fixed effect, as there are only two genders (or at least two main ones).
  • If age were an IV (independent variable), it would almost certainly be a random effect, since you would want to generalize beyond just the ages in your data. I mention this only as an example of a random-effect IV, though; in your model, age is the DV (dependent variable).
  • I do not know enough about kill type. Is there a wider variety of kill types than those you are analyzing, or are you looking at the full set (or at least the full set you would ever want to consider)? If it is the full set, it should be fixed; if not, it should be random. Edit: I am leaving the reasoning here for future readers, but based on your comment "The kill type is all I will observe, the full range of all kill types.", it sounds like a fixed effect. Of course, I am assuming that by "the full range of all kill types" you mean the full range that exists, or the full range you would ever want to generalize to, and not just the range you happen to see in your data.
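A minimal sketch of the fixed-effects specification described above, using simulated data. The cell sizes, the column names `gender`, `kill_type` and `age`, and the simulated ages are all hypothetical stand-ins, not the asker's data. The commented-out `lme4::lmer` line only illustrates what a random kill-type term could look like if kill type were a sample from a wider population; it is not needed for the fixed-effects model.

```r
# Hypothetical stand-in data: cell sizes are invented to mimic an
# unbalanced design; they are not the asker's data.
set.seed(1)
cell_gender <- c("f", "m", "f", "m", "f", "m")
cell_kill   <- c("a", "a", "b", "b", "c", "c")
cell_n      <- c(5, 12, 30, 8, 3, 20)          # deliberately unequal
dat <- data.frame(
  gender    = factor(rep(cell_gender, times = cell_n)),
  kill_type = factor(rep(cell_kill, times = cell_n))
)
dat$age <- rnorm(nrow(dat), mean = 6, sd = 2)  # simulated response

# Unequal counts in the gender x kill_type table = unbalanced design
print(table(dat$gender, dat$kill_type))

# Both factors treated as fixed effects, plus their interaction
fit_fixed <- lm(age ~ gender * kill_type, data = dat)
print(coef(fit_fixed))

# Only if kill type were a sample from a wider population of kill types,
# a random intercept could be used instead (requires the lme4 package):
# lme4::lmer(age ~ gender + (1 | kill_type), data = dat)
```

With two gender levels and three kill types, the fixed-effects model has six coefficients (intercept, one gender term, two kill-type terms, two interaction terms).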

As for the unbalanced analysis of variance: according to another SE question (), you can use the Anova function from the car package with the argument type = "III", which uses Type III sums of squares; these seem to be more appropriate for the unbalanced case. Note that Type III tests are only meaningful with sum-to-zero contrasts (e.g. contr.sum), not R's default treatment contrasts.

By contrast, anova (applied to an lm fit) and summary.aov use Type I (sequential) sums of squares, which seem to be less appropriate for the unbalanced case because the result depends on the order in which the factors enter the model. The aov documentation itself states that "aov is designed for balanced designs, and the results can be hard to interpret without balance".
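A minimal sketch, on simulated data with invented cell sizes, of how Type I and Type III sums of squares differ in an unbalanced design. It assumes the column names `gender`, `kill_type` and `age` (hypothetical), sets sum-to-zero contrasts for the Type III fit, and only calls `car::Anova` if the car package is installed. As a base-R cross-check it uses `drop1()` with `scope = . ~ .`, which, under sum-to-zero contrasts, should give the same Type III sums of squares for each term.

```r
# Hypothetical unbalanced data: cell sizes and age values are invented.
set.seed(42)
cell_gender <- c("f", "m", "f", "m", "f", "m")
cell_kill   <- c("a", "a", "b", "b", "c", "c")
cell_n      <- c(5, 12, 30, 8, 3, 20)
dat <- data.frame(
  gender    = factor(rep(cell_gender, times = cell_n)),
  kill_type = factor(rep(cell_kill, times = cell_n))
)
# Simulated ages, with an arbitrary +1 shift for kill type "b"
dat$age <- rnorm(nrow(dat), mean = 6, sd = 2) + (dat$kill_type == "b")

# Type I (sequential) sums of squares from anova() on an lm fit.
# With unbalanced cells, the main-effect rows change with term order.
t1_gk <- anova(lm(age ~ gender * kill_type, data = dat))
t1_kg <- anova(lm(age ~ kill_type * gender, data = dat))
print(t1_gk)
print(t1_kg)

# Type III tests need sum-to-zero contrasts to test sensible hypotheses;
# here they are set per fit (options(contrasts = c("contr.sum", "contr.poly"))
# would set them globally instead).
sum_contr <- list(gender = "contr.sum", kill_type = "contr.sum")
fit3 <- lm(age ~ gender * kill_type, data = dat, contrasts = sum_contr)

# Type III via car::Anova, run only if the car package is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(fit3, type = "III"))
}

# Base-R cross-check: drop each term from the full model in turn
# (an explicit scope of . ~ . ignores marginality), giving Type III SS
t3 <- drop1(fit3, scope = . ~ ., test = "F")
print(t3)
```

With unbalanced cells, the two Type I tables should disagree on the main-effect sums of squares (only the interaction row, which is entered last, matches), whereas the Type III results do not depend on the order of the terms in the formula.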
