Solved – Unbalanced two-way ANOVA in R Studio

anova, r, two-way

I have an unbalanced dataset. I am testing how temperature and the size of a carcass affect the development rate of maggots. The duration is the time a maggot spends in a particular development stage. I found that the higher the temperature and the larger the carcass, the faster the development (the shorter the duration).

My response variable is Duration of Eggs (Eggs for short in the code) and my two factors are Temperature (4 levels: 15, 20, 25, 30) and Size (2 levels: small and large). Most groups have 4 observations; one group has 7.

I intend to examine how Duration of Eggs varies with Temperature and Size.
I want to use ANOVA and after much reading I think two-way unbalanced ANOVA should be used.

I imported my data set (anova.data).
One function I have tried is:

anova(lm(Eggs ~ Temperature * Size, anova.data))

This gave me:

Analysis of Variance Table

Response: Eggs
                 Df  Sum Sq Mean Sq F value    Pr(>F)
Temperature       1 1828.37 1828.37 71.3971 1.521e-09 ***
Size              1    1.71    1.71  0.0669    0.7977
Temperature:Size  1    1.02    1.02  0.0399    0.8429
Residuals        31  793.86   25.61
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

However, I am uncertain whether this takes into account that the design is unbalanced.
After further reading I found that the function Anova() [in the car package] can be used to compute a two-way ANOVA for unbalanced designs. Of the three fundamentally different ways to run an ANOVA in an unbalanced design, I read that the recommended method is Type III sums of squares (although I am not sure why).

So (after install.packages("car")), I tried a second function:

library(car)
my_anova <- aov(Eggs ~ Temperature * Size, data = anova.data)
Anova(my_anova, type = "III")

Anova Table (Type III tests)

Response: Eggs
                  Sum Sq Df  F value    Pr(>F)
(Intercept)      2875.68  1 112.2941 7.883e-12 ***
Temperature       858.05  1  33.5065 2.243e-06 ***
Size                0.45  1   0.0178    0.8948
Temperature:Size    1.02  1   0.0399    0.8429
Residuals         793.86 31
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

However, this second function gives different values and adds an (Intercept) row, which I am not familiar with. Which function is the correct one to use? Or is there an alternative function?

Also, how do I know whether to use Type I, II or III sums of squares? I have done some reading but I am still unsure. I do not know if there is an interaction between Temperature and Size.

This is my dataset:

15°C    Small:  43.0, 43.0, 43.0, 43.0
15°C    Large:  40.5, 40.5, 40.5, 40.5
20°C    Small:  24.0, 24.0, 24.0, 23.5, 23.5, 23.5, 23.5 
20°C    Large:  24.0, 24.0, 24.0, 24.0
25°C    Small:  20.0, 20.0, 20.0, 20.0
25°C    Large:  20.0, 20.0, 20.0, 20.0
30°C    Small:  20.0, 20.0, 20.0, 20.0
30°C    Large:  20.0, 20.0, 20.0, 20.0

Best Answer

Thanks for posting the data. I suspect that this version will be easier for many people to work with.

Temp Size Duration 
15 Small 43 
15 Small 43 
15 Small 43 
15 Small 43 
15 Large 40.5 
15 Large 40.5 
15 Large 40.5 
15 Large 40.5 
20 Small 24 
20 Small 24 
20 Small 24 
20 Small 23.5 
20 Small 23.5 
20 Small 23.5
20 Small 23.5
20 Large 24 
20 Large 24 
20 Large 24 
20 Large 24 
25 Small 20 
25 Small 20 
25 Small 20 
25 Small 20 
25 Large 20 
25 Large 20 
25 Large 20 
25 Large 20 
30 Small 20 
30 Small 20 
30 Small 20 
30 Small 20 
30 Large 20 
30 Large 20 
30 Large 20 
30 Large 20 
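
For anyone who wants to reproduce this, the table above can be rebuilt in R along the following lines. This is only a sketch: the data frame name `maggots` and the helper vector `counts` are my own choices, not something from the question; the column names follow the table.

```r
# Number of observations per cell, in the order the table lists them
# (15 Small, 15 Large, 20 Small, 20 Large, ...). `counts` is a hypothetical helper.
counts <- c(4, 4, 7, 4, 4, 4, 4, 4)

# Rebuild the long-format table; `maggots` is a hypothetical name.
maggots <- data.frame(
  Temp = rep(c(15, 15, 20, 20, 25, 25, 30, 30), times = counts),
  Size = rep(rep(c("Small", "Large"), 4), times = counts),
  Duration = c(rep(43, 4), rep(40.5, 4),
               rep(24, 3), rep(23.5, 4), rep(24, 4),
               rep(20, 16))
)

# Cross-tabulate to see the imbalance: one cell has 7 rows, the rest have 4.
cell_counts <- table(maggots$Temp, maggots$Size)
print(cell_counts)
```

The cross-tabulation makes the lack of balance explicit: the 20 / Small cell holds 7 observations and every other cell holds 4.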

I have two comments by way of an answer. I have to say that these data seem very strange. I doubt the utility of any kind of analysis of variance here. Whether your approach is appropriate statistically and scientifically needs to be clear before you worry about how to do it.

First off, there is almost no variability within groups defined by the same factors. That is a real surprise for any kind of data, and certainly for biological data.

Second, the big deal is being at 15$^\circ$C and a medium deal is being at 20$^\circ$C. Size seems to have a minor or even negligible effect.
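
Both comments can be checked numerically. A minimal sketch, assuming the data are rebuilt from the table into a data frame that I call `maggots` here (a hypothetical name, repeated so that the snippet runs on its own):

```r
# Rebuild the data from the table; `counts` and `maggots` are hypothetical names.
counts <- c(4, 4, 7, 4, 4, 4, 4, 4)
maggots <- data.frame(
  Temp = rep(c(15, 15, 20, 20, 25, 25, 30, 30), times = counts),
  Size = rep(rep(c("Small", "Large"), 4), times = counts),
  Duration = c(rep(43, 4), rep(40.5, 4), rep(24, 3), rep(23.5, 4),
               rep(24, 4), rep(20, 16))
)

# Standard deviation of Duration within each Temp x Size cell.
within_sd <- aggregate(Duration ~ Temp + Size, data = maggots, FUN = sd)
print(within_sd)

# Mean Duration in each cell, laid out as a Temp-by-Size table.
cell_means <- with(maggots,
                   tapply(Duration, list(Temp = Temp, Size = Size), mean))
print(round(cell_means, 2))
```

Seven of the eight cells have a standard deviation of exactly zero; only the 20 / Small cell varies at all (values of 24 and 23.5). The cell means differ noticeably by Size only at 15 (43 versus 40.5) and are identical at 25 and 30.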

Perhaps you need a model with non-linear forcing of temperature. I can't say what functional forms best suit the underlying science here.
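
One way to see that a straight-line temperature effect is inadequate, without committing to any functional form, is to compare it with a fit that treats temperature as a 4-level factor. The single degree of freedom for Temperature in the tables in the question indicates that it entered those models as a numeric term. The following is only a diagnostic sketch, not a recommended final model; the names `maggots`, `linear_fit` and `factor_fit` are hypothetical, and the data are rebuilt from the table so that the snippet runs on its own.

```r
# Rebuild the data from the table; `counts` and `maggots` are hypothetical names.
counts <- c(4, 4, 7, 4, 4, 4, 4, 4)
maggots <- data.frame(
  Temp = rep(c(15, 15, 20, 20, 25, 25, 30, 30), times = counts),
  Size = rep(rep(c("Small", "Large"), 4), times = counts),
  Duration = c(rep(43, 4), rep(40.5, 4), rep(24, 3), rep(23.5, 4),
               rep(24, 4), rep(20, 16))
)

# Temperature as a numeric covariate: a straight line in temperature,
# with a separate line for each carcass size.
linear_fit <- lm(Duration ~ Temp * Size, data = maggots)

# Temperature as a factor: one free mean per Temp x Size cell,
# so any non-linear shape in temperature can be captured.
factor_fit <- lm(Duration ~ factor(Temp) * Size, data = maggots)

# The linear model is nested in the factor model, so an F test compares them.
comparison <- anova(linear_fit, factor_fit)
print(comparison)

# Residual sums of squares of the two fits, for a direct comparison.
print(c(linear = deviance(linear_fit), factor = deviance(factor_fit)))
```

The residual sum of squares drops from about 794 for the straight-line model to well under 1 once temperature is allowed a free shape, so almost all of the "residual" variation in the straight-line fit is lack of fit rather than noise. With so little within-cell variability left, the factor model nearly reproduces the data, which is another sign that formal F tests add little here.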

P.S. If I were trying to publish a similar graph, I would be more careful about giving measurement units. But this is just an exploratory graph.
