Solved – Unbalanced two-way ANOVA in R Studio

I have an unbalanced dataset. I am testing how temperature and carcass size affect the development rate of maggots. Duration is the time a maggot spends in a particular development stage. I found that the higher the temperature and the larger the carcass, the faster the development (shorter duration).

My response variable is Duration of Eggs (Eggs for short in the code) and my two factors are Temperature (4 levels: 15, 20, 25, 30) and Size (2 levels: small and large). Most groups have 4 observations; one group has 7.

I intend to examine how Duration of Eggs varies with Temperature and Size.
I want to use ANOVA and after much reading I think two-way unbalanced ANOVA should be used.

I imported my data set (anova.data).
One function I have tried is:

anova(lm(Eggs ~ Temperature * Size, anova.data))

This gave me:

Analysis of Variance Table
Response: Eggs
                 Df  Sum Sq Mean Sq F value Pr(>F) 
Temperature       1 1828.37 1828.37 71.3971 1.521e-09 ***
Size              1    1.71    1.71  0.0669 0.7977 
Temperature:Size  1    1.02    1.02  0.0399 0.8429 
Residuals        31  793.86    25.61 
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

However, I am uncertain if this takes into account that it is unbalanced.
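Two things are visible in the table above. Temperature has 1 Df, which suggests it was stored as a number rather than as a 4-level factor, and anova() on an lm fit reports sequential (Type I) sums of squares, which depend on term order when the design is unbalanced. A minimal sketch of both points, assuming the column names from the formula above; the data are rebuilt from the values listed in the question, and the helper names `cells` and `n` are made up for illustration:

```r
# Rebuild the data from the cell values listed in the question.
# `cells` and `n` are hypothetical helper names, not part of the original post.
cells <- data.frame(
  Temperature = c(15, 15, 20, 20, 20, 25, 25, 30, 30),
  Size        = c("Small", "Large", "Small", "Small", "Large",
                  "Small", "Large", "Small", "Large"),
  Eggs        = c(43, 40.5, 24, 23.5, 24, 20, 20, 20, 20),
  n           = c(4, 4, 3, 4, 4, 4, 4, 4, 4)
)
anova.data <- cells[rep(seq_len(nrow(cells)), cells$n),
                    c("Temperature", "Size", "Eggs")]
rownames(anova.data) <- NULL

# Treat both predictors as categorical so Temperature gets 3 Df, not 1.
anova.data$Temperature <- factor(anova.data$Temperature)
anova.data$Size        <- factor(anova.data$Size)

# Sequential (Type I) tables with the two main effects entered in either order.
tab_ts <- anova(lm(Eggs ~ Temperature * Size, data = anova.data))
tab_st <- anova(lm(Eggs ~ Size * Temperature, data = anova.data))
print(tab_ts)
print(tab_st)

# Sum of squares for Size when it enters after Temperature vs. first.
ss_size_after_temp <- tab_ts["Size", "Sum Sq"]
ss_size_first      <- tab_st["Size", "Sum Sq"]
```

If the two Size sums of squares differ, the sequential table is answering an order-dependent question, which is the usual reason Type II or Type III tables are considered for unbalanced designs.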
After further reading I found that the function Anova() (in the car package) can compute a two-way ANOVA for unbalanced designs. Of the three fundamentally different ways to run an ANOVA on an unbalanced design, I read that Type III sums of squares are the recommended method, though I am not sure why.

So (after install.packages("car")), I tried a second function:

library(car)
my_anova <- aov(Eggs ~ Temperature * Size, data = anova.data)
Anova(my_anova, type = "III")

Anova Table (Type III tests)
Response: Eggs
                Sum Sq  Df F value Pr(>F) 
(Intercept)     2875.68  1 112.2941 7.883e-12 ***
Temperature      858.05  1  33.5065 2.243e-06 ***
Size               0.45  1 0.0178 0.8948 
Temperature:Size   1.02  1 0.0399 0.8429 
Residuals        793.86 31 
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

However, this function gives different values and adds an (Intercept) row, which I am not familiar with. Which function is correct to use? Or is there an alternative function?

Also, how do I know whether to use Type I, II, or III sums of squares? I have done some reading but I am still unsure. I do not know if there is an interaction between Temperature and Size.
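For orientation: Type I tests each term after the terms listed before it, Type II tests each main effect after the other main effect while ignoring the interaction, and Type III tests each term after all others, including the interaction. Type III results are sensitive to how the factors are coded, and are normally computed with sum-to-zero contrasts rather than R's default treatment contrasts. The (Intercept) row is just a test of whether the intercept is zero and is rarely of interest. The sketch below uses only base R: with sum-to-zero contrasts, dropping each term in turn from the full model with drop1() should correspond to a Type III table (minus the intercept row). The data are rebuilt from the question, and the helper names `cells` and `n` are made up for illustration.

```r
# Rebuild the data from the cell values listed in the question.
# `cells` and `n` are hypothetical helper names, not part of the original post.
cells <- data.frame(
  Temperature = c(15, 15, 20, 20, 20, 25, 25, 30, 30),
  Size        = c("Small", "Large", "Small", "Small", "Large",
                  "Small", "Large", "Small", "Large"),
  Eggs        = c(43, 40.5, 24, 23.5, 24, 20, 20, 20, 20),
  n           = c(4, 4, 3, 4, 4, 4, 4, 4, 4)
)
anova.data <- cells[rep(seq_len(nrow(cells)), cells$n),
                    c("Temperature", "Size", "Eggs")]
rownames(anova.data) <- NULL
anova.data$Temperature <- factor(anova.data$Temperature)
anova.data$Size        <- factor(anova.data$Size)

# Fit the full model with sum-to-zero contrasts for both factors.
fit <- lm(Eggs ~ Temperature * Size, data = anova.data,
          contrasts = list(Temperature = "contr.sum", Size = "contr.sum"))

# Drop every term in turn, including main effects, from the full model.
type3 <- drop1(fit, scope = . ~ ., test = "F")
print(type3)

# The sequential table for comparison; only the last term must agree.
type1 <- anova(fit)
print(type1)
```

The interaction row is the same in every type of table, because the interaction is always tested last. If it is negligible, Type II tests of the main effects are usually the easier ones to interpret.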

This is my dataset:

15°C    Small:  43.0, 43.0, 43.0, 43.0
15°C    Large:  40.5, 40.5, 40.5, 40.5
20°C    Small:  24.0, 24.0, 24.0, 23.5, 23.5, 23.5, 23.5 
20°C    Large:  24.0, 24.0, 24.0, 24.0
25°C    Small:  20.0, 20.0, 20.0, 20.0
25°C    Large:  20.0, 20.0, 20.0, 20.0
30°C    Small:  20.0, 20.0, 20.0, 20.0
30°C    Large:  20.0, 20.0, 20.0, 20.0

Temp  Size   Duration
15    Small  43
15    Small  43
15    Small  43
15    Small  43
15    Large  40.5
15    Large  40.5
15    Large  40.5
15    Large  40.5
20    Small  24
20    Small  24
20    Small  24
20    Small  23.5
20    Small  23.5
20    Small  23.5
20    Small  23.5
20    Large  24
20    Large  24
20    Large  24
20    Large  24
25    Small  20
25    Small  20
25    Small  20
25    Small  20
25    Large  20
25    Large  20
25    Large  20
25    Large  20
30    Small  20
30    Small  20
30    Small  20
30    Small  20
30    Large  20
30    Large  20
30    Large  20
30    Large  20
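For anyone who wants to reproduce the analysis, the data above can be entered in R roughly as follows. This is a sketch: the column names Eggs, Temperature and Size follow the formula used in the question, while `cells` and `n` are helper names made up here.

```r
# One row per distinct (Temperature, Size, Eggs) value, with its count `n`.
# `cells` and `n` are hypothetical helper names, not part of the original post.
cells <- data.frame(
  Temperature = c(15, 15, 20, 20, 20, 25, 25, 30, 30),
  Size        = c("Small", "Large", "Small", "Small", "Large",
                  "Small", "Large", "Small", "Large"),
  Eggs        = c(43, 40.5, 24, 23.5, 24, 20, 20, 20, 20),
  n           = c(4, 4, 3, 4, 4, 4, 4, 4, 4)
)

# Expand to one row per observation.
anova.data <- cells[rep(seq_len(nrow(cells)), cells$n),
                    c("Temperature", "Size", "Eggs")]
rownames(anova.data) <- NULL

# Cell counts: the unbalanced cell shows up as the only count that is not 4.
counts <- table(anova.data$Temperature, anova.data$Size)
print(counts)
```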

Best Answer

Thanks for posting the data. I suspect that this version will be easier for many people to work with.

I have two comments by way of an answer. I have to say that these data seem very strange. I doubt the utility of any kind of analysis of variance here. Whether your approach is appropriate statistically and scientifically needs to be clear before you worry about how to do it.

First off, there is almost no variability within groups defined by the same factors. That is a real surprise for any kind of data, and certainly for biological data.
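The lack of within-group variability can be checked directly by computing the standard deviation in each Temperature by Size cell. A minimal sketch, with the data rebuilt from the question; `cells`, `n` and `cell_sd` are names made up here:

```r
# Rebuild the data from the cell values listed in the question.
# `cells` and `n` are hypothetical helper names, not part of the original post.
cells <- data.frame(
  Temperature = c(15, 15, 20, 20, 20, 25, 25, 30, 30),
  Size        = c("Small", "Large", "Small", "Small", "Large",
                  "Small", "Large", "Small", "Large"),
  Eggs        = c(43, 40.5, 24, 23.5, 24, 20, 20, 20, 20),
  n           = c(4, 4, 3, 4, 4, 4, 4, 4, 4)
)
anova.data <- cells[rep(seq_len(nrow(cells)), cells$n),
                    c("Temperature", "Size", "Eggs")]
rownames(anova.data) <- NULL

# Standard deviation of Eggs within each Temperature x Size cell.
cell_sd <- aggregate(Eggs ~ Temperature + Size, data = anova.data, FUN = sd)
names(cell_sd)[3] <- "SD"
print(cell_sd)
```

Cells where every value is identical have an SD of exactly zero, so the residual variance in any ANOVA on these data comes only from the cells that contain more than one distinct value.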

Second, the big deal is being at 15$^\circ$C and a medium deal is being at 20$^\circ$C. Size seems to have a minor or even negligible effect.
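A table of cell means makes the same pattern visible without a graph. A minimal sketch, with the data rebuilt from the question; `cells`, `n` and `cell_means` are names made up here:

```r
# Rebuild the data from the cell values listed in the question.
# `cells` and `n` are hypothetical helper names, not part of the original post.
cells <- data.frame(
  Temperature = c(15, 15, 20, 20, 20, 25, 25, 30, 30),
  Size        = c("Small", "Large", "Small", "Small", "Large",
                  "Small", "Large", "Small", "Large"),
  Eggs        = c(43, 40.5, 24, 23.5, 24, 20, 20, 20, 20),
  n           = c(4, 4, 3, 4, 4, 4, 4, 4, 4)
)
anova.data <- cells[rep(seq_len(nrow(cells)), cells$n),
                    c("Temperature", "Size", "Eggs")]
rownames(anova.data) <- NULL

# Mean duration in each Temperature (rows) x Size (columns) cell.
cell_means <- with(anova.data,
                   tapply(Eggs, list(Temperature = Temperature, Size = Size), mean))
print(round(cell_means, 2))
```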

Perhaps you need a model with non-linear forcing of temperature. I can't say what functional forms best suit the underlying science here.
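One rough way to see how far the data are from a straight line in temperature is to compare nested fits. The quadratic and the 4-level factor below are illustrative choices only, not functional forms suggested by the science, and the F-tests should be read with caution given how little within-group variability there is. The data are rebuilt from the question; `cells` and `n` are helper names made up here.

```r
# Rebuild the data from the cell values listed in the question.
# `cells` and `n` are hypothetical helper names, not part of the original post.
cells <- data.frame(
  Temperature = c(15, 15, 20, 20, 20, 25, 25, 30, 30),
  Size        = c("Small", "Large", "Small", "Small", "Large",
                  "Small", "Large", "Small", "Large"),
  Eggs        = c(43, 40.5, 24, 23.5, 24, 20, 20, 20, 20),
  n           = c(4, 4, 3, 4, 4, 4, 4, 4, 4)
)
anova.data <- cells[rep(seq_len(nrow(cells)), cells$n),
                    c("Temperature", "Size", "Eggs")]
rownames(anova.data) <- NULL

# Three nested descriptions of the temperature effect (Temperature is numeric here).
fit_linear <- lm(Eggs ~ Temperature, data = anova.data)           # straight line
fit_quad   <- lm(Eggs ~ poly(Temperature, 2), data = anova.data)  # quadratic curve
fit_factor <- lm(Eggs ~ factor(Temperature), data = anova.data)   # one mean per level

# Residual sums of squares and F-tests for each added step of flexibility.
cmp <- anova(fit_linear, fit_quad, fit_factor)
print(cmp)
```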

P.S. If I were trying to publish a similar graph, I would be more careful about giving measurement units. But this is just an exploratory graph.
