I have run into some problems analyzing a data set with an unbalanced design (unequal sample sizes).
- The number of observations was determined by the survival rate of seedlings in a tree nursery, where different concentrations of fertilizer were applied.
- I ran an ANOVA on my stacked data (the height of the seedlings and the five concentrations of applied fertilizer).

But can I simply use ANOVA (`aov`) if I have unequal sample sizes?

The real fun started when I tried to find out whether TukeyHSD is a suitable post-hoc test in this case. Some sources say that Tukey's test is designed for balanced data, while others claim that TukeyHSD is considered the best available method when confidence intervals are needed or sample sizes are not equal.
I tried to calculate an ANOVA (in Excel) for one of the pairs and got a different p-value than from the TukeyHSD test in R. So I assume that the unequal sample sizes influenced the result in one case or the other.
Does anyone have any suggestions?
Best Answer
You need balanced data for the usual tables and hand calculations to be correct. However, if you use the `glht` function in the R package multcomp, its calculations are based on the multivariate $t$ distribution with the funny covariance structure you get with unequal sample sizes, so the adjusted P values are correct as long as the normality, homoscedasticity, and independence assumptions hold. The needed call applies `glht` to the fitted model. You can also get these adjustments via the lsmeans package.
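The original calls did not survive in this post, so the following is only a minimal sketch of what they might look like, not the answerer's exact code. The data frame `seedlings`, its columns `fert` and `height`, and the group sizes are all hypothetical and simulated here so that the example is self-contained. The functions `glht()`, `mcp()`, `summary()`, and `confint()` come from multcomp; `lsmeans()` and `pairs()` come from lsmeans.

```r
library(multcomp)  # glht(), mcp()
library(lsmeans)   # lsmeans(), pairs()

# Hypothetical unbalanced data: 5 fertilizer concentrations with
# unequal numbers of surviving seedlings per group (simulated values).
set.seed(1)
n <- c(12, 9, 15, 7, 11)
seedlings <- data.frame(
  fert   = factor(rep(paste0("c", 1:5), times = n)),
  height = rnorm(sum(n), mean = rep(c(20, 22, 25, 23, 21), times = n), sd = 3)
)

# One-way ANOVA; aov() accepts unequal group sizes in a one-way layout.
fit <- aov(height ~ fert, data = seedlings)

# All pairwise (Tukey-type) contrasts; the adjustment is based on the
# multivariate t distribution, using the covariance of the contrasts.
tuk <- glht(fit, linfct = mcp(fert = "Tukey"))
summary(tuk)   # adjusted p-values
confint(tuk)   # simultaneous confidence intervals

# Alternative route: pairwise comparisons of the least-squares means,
# which use a Tukey adjustment by default.
pairs(lsmeans(fit, "fert"))
```

With five concentrations this produces ten pairwise comparisons. Note that the lsmeans package has since been superseded by emmeans, where `emmeans()` plays the role of `lsmeans()`.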