I have a dataset that is unbalanced. I am testing how temperature and the size of a carcass affect the development rate of maggots. The duration is the time a maggot spends in a particular development stage. I found that the higher the temperature and the larger the carcass, the faster the development (shorter duration).
My response variable is Duration of Eggs (Eggs for short in the code), and my two factors are Temperature (4 levels: 15, 20, 25, 30) and Size (2 levels: small and large). Most groups have a sample size of 4; however, one group has 7.
I intend to examine how Duration of Eggs varies with Temperature and Size.
I want to use ANOVA and after much reading I think two-way unbalanced ANOVA should be used.
I imported my data set (anova.data).
One function I have tried is:
anova(lm(Eggs ~ Temperature * Size, anova.data))
This gave me:
Analysis of Variance Table
Response: Eggs
                 Df  Sum Sq Mean Sq F value    Pr(>F)
Temperature       1 1828.37 1828.37 71.3971 1.521e-09 ***
Size              1    1.71    1.71  0.0669    0.7977
Temperature:Size  1    1.02    1.02  0.0399    0.8429
Residuals        31  793.86   25.61
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
However, I am uncertain if this takes into account that it is unbalanced.
After further reading I found that the function Anova() [in the car package] can be used to compute two-way ANOVA tests for unbalanced designs. Of the three fundamentally different ways to run an ANOVA in an unbalanced design, I read that the recommended method is Type III sums of squares (I am not sure why, though).
So (after install.packages("car")), I tried a second function:
library(car)
my_anova <- aov(Eggs ~ Temperature * Size, data = anova.data)
Anova(my_anova, type = "III")
Anova Table (Type III tests)
Response: Eggs
                  Sum Sq Df  F value    Pr(>F)
(Intercept)      2875.68  1 112.2941 7.883e-12 ***
Temperature       858.05  1  33.5065 2.243e-06 ***
Size                0.45  1   0.0178    0.8948
Temperature:Size    1.02  1   0.0399    0.8429
Residuals         793.86 31
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
However, this second function gives different values and reports an additional (Intercept) row, which I am not familiar with. Which function is the correct one to use? Or is there an alternative function?
Also, how do I know whether to use Type I, II, or III sums of squares? I have done some reading but I am still unsure. I do not know if there is an interaction between Temperature and Size.
This is my dataset:
15°C Small: 43.0, 43.0, 43.0, 43.0
15°C Large: 40.5, 40.5, 40.5, 40.5
20°C Small: 24.0, 24.0, 24.0, 23.5, 23.5, 23.5, 23.5
20°C Large: 24.0, 24.0, 24.0, 24.0
25°C Small: 20.0, 20.0, 20.0, 20.0
25°C Large: 20.0, 20.0, 20.0, 20.0
30°C Small: 20.0, 20.0, 20.0, 20.0
30°C Large: 20.0, 20.0, 20.0, 20.0
Best Answer
Thanks for posting the data. I suspect that this version will be easier for many people to work with.
I have two comments by way of an answer. I have to say that these data seem very strange. I doubt the utility of any kind of analysis of variance here. Whether your approach is appropriate statistically and scientifically needs to be clear before you worry about how to do it.
First off, there is almost no variability within groups defined by the same factors. That is a real surprise for any kind of data, and certainly for biological data.
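The lack of within-group variability can be checked directly. The following is a minimal sketch, assuming the data are entered exactly as listed in the question and using the question's names (anova.data, Eggs, Temperature, Size); the helper names cell_n and cell_sd are mine and purely illustrative.

```r
# Data transcribed from the question; column names follow its model formula.
cell_n <- c(4, 4, 7, 4, 4, 4, 4, 4)  # observations per Temperature x Size cell
anova.data <- data.frame(
  Temperature = rep(c(15, 15, 20, 20, 25, 25, 30, 30), times = cell_n),
  Size = rep(rep(c("Small", "Large"), 4), times = cell_n),
  Eggs = c(rep(43.0, 4), rep(40.5, 4),
           rep(24.0, 3), rep(23.5, 4), rep(24.0, 4),
           rep(20.0, 16))
)

# Standard deviation of Eggs within each Temperature x Size cell.
cell_sd <- aggregate(Eggs ~ Temperature + Size, data = anova.data, FUN = sd)
names(cell_sd)[3] <- "sd"
print(cell_sd)
```

With the values as posted, every cell but one (20°C, small) consists of identical measurements, so its standard deviation is zero.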
Second, the big deal is being at 15$^\circ$C and a medium deal is being at 20$^\circ$C. Size seems to have a minor or even negligible effect.
Perhaps you need a model with non-linear forcing of temperature. I can't say what functional forms best suit the underlying science here.
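As one rough way to look at the non-linearity, and not a recommendation of a particular functional form, the sketch below compares a straight-line effect of temperature with a model that gives each temperature its own mean. It assumes the data as listed in the question; Size is left out for brevity, and the names fit_linear, fit_factor, and temp_means are illustrative. Note that the single degree of freedom reported for Temperature in the question's output suggests it was fitted as a numeric variable rather than as a four-level factor.

```r
# Data transcribed from the question (Size omitted in this sketch).
cell_n <- c(4, 4, 7, 4, 4, 4, 4, 4)  # observations per Temperature x Size cell
anova.data <- data.frame(
  Temperature = rep(c(15, 15, 20, 20, 25, 25, 30, 30), times = cell_n),
  Eggs = c(rep(43.0, 4), rep(40.5, 4),
           rep(24.0, 3), rep(23.5, 4), rep(24.0, 4),
           rep(20.0, 16))
)

# Mean duration at each temperature, pooling both carcass sizes.
temp_means <- tapply(anova.data$Eggs, anova.data$Temperature, mean)
print(round(temp_means, 2))

# Straight-line effect of temperature versus one free mean per temperature.
fit_linear <- lm(Eggs ~ Temperature, data = anova.data)
fit_factor <- lm(Eggs ~ factor(Temperature), data = anova.data)

# F test for departure from linearity (the two models are nested).
print(anova(fit_linear, fit_factor))
```

The per-temperature means show the steep drop between 15 and 20 and the plateau at 25 and 30 that a straight line cannot follow.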
P.S. If I were trying to publish a similar graph, I would be more careful about giving measurement units. But this is just an exploratory graph.