Trying to choose the right post-hoc analysis for an unbalanced dataset

I'm having trouble deciding how to analyze my unbalanced data set (unequal sample sizes per group).

  • The number of observations per group depended on the survival rate of
    seedlings in a tree nursery where different concentrations of
    fertilizer were applied.
  • I ran an ANOVA on my stacked data: seedling height as the response and
    fertilizer concentration (5 levels) as the factor.

But can I simply use ANOVA (aov) when the sample sizes are unequal? The real fun started when I tried to find out whether TukeyHSD is a suitable post-hoc test in this case. Some sources say that Tukey's test is designed for balanced data, while others claim that TukeyHSD is the best available method when confidence intervals are needed or sample sizes are unequal.
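A minimal sketch of that workflow, assuming hypothetical simulated data in which five fertilizer concentrations ended up with different numbers of surviving seedlings (the group sizes, means, and the names `seedlings`, `height`, and `conc` are all invented for illustration). A one-way `aov()` fit accepts unequal group sizes directly, and R's `TukeyHSD()` documentation notes that it incorporates a sample-size adjustment intended to give sensible intervals for mildly unbalanced designs.

```r
# Hypothetical unbalanced one-way layout: 5 fertilizer concentrations,
# with unequal numbers of surviving seedlings per concentration.
set.seed(1)
n_per_group <- c(12, 9, 15, 7, 11)                            # invented sizes
conc <- factor(rep(c("c1", "c2", "c3", "c4", "c5"), times = n_per_group))
true_means <- c(c1 = 20, c2 = 22, c3 = 25, c4 = 24, c5 = 21)  # invented means
height <- rnorm(length(conc), mean = true_means[as.character(conc)], sd = 3)
seedlings <- data.frame(height = height, conc = conc)

# One-way ANOVA; aov() handles the unequal group sizes without special setup.
fit <- aov(height ~ conc, data = seedlings)
summary(fit)

# Tukey HSD intervals and adjusted p-values for all 10 pairs of concentrations.
tk <- TukeyHSD(fit, "conc", conf.level = 0.95)
print(tk)
```

With a single factor there is no ambiguity about the order of sums of squares, so the unequal sizes mainly matter for the post-hoc step rather than for the overall F test.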

I also ran an ANOVA (in Excel) on just one of the pairs and got a different p-value from the one TukeyHSD gave me in R. So I assume the unequal sample sizes influenced the result in one case or the other.
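Some of that gap is expected even with equal group sizes: an ANOVA run on only two groups (as in the Excel calculation) estimates the error variance from those two groups alone and makes no multiple-comparison correction, whereas `TukeyHSD()` uses the pooled residual variance from all five groups and a studentized-range critical value that accounts for all ten comparisons. The sketch below, on hypothetical simulated data (all names and numbers invented), computes both p-values for the same pair so they can be compared side by side.

```r
set.seed(2)
n_per_group <- c(12, 9, 15, 7, 11)                     # invented, unequal
conc <- factor(rep(paste0("c", 1:5), times = n_per_group))
height <- rnorm(length(conc), mean = 20 + 1.5 * as.integer(conc), sd = 3)
seedlings <- data.frame(height = height, conc = conc)

# (a) Separate ANOVA on only the c1 and c2 rows, as a spreadsheet would do:
# error variance from these two groups only, no multiplicity adjustment.
pair <- droplevels(subset(seedlings, conc %in% c("c1", "c2")))
p_pair <- summary(aov(height ~ conc, data = pair))[[1]][["Pr(>F)"]][1]

# (b) Tukey HSD on the full five-group model: pooled error variance from
# all groups and a family-wise adjustment over all 10 pairwise comparisons.
tk <- TukeyHSD(aov(height ~ conc, data = seedlings), "conc")
p_tukey <- tk$conc["c2-c1", "p adj"]

c(two_group_anova = p_pair, tukey_adjusted = p_tukey)
```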

Does anyone have any suggestions?

Best Answer

You need balanced data for the usual tables and hand calculations to be correct. However, if you use the glht function in the R package multcomp, its calculations are based on the multivariate $t$ distribution with the irregular correlation structure among the pairwise comparisons that unequal sample sizes produce, so the adjusted P values are correct as long as the normality, homoscedasticity, and independence assumptions hold. The needed call is something like

library(multcomp)
summary(glht(model, linfct = mcp(trt = "Tukey")))
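A self-contained sketch of the `glht()` route, again on hypothetical simulated data (the data frame `seedlings`, the factor `trt`, and all numbers are invented). Here `mcp(trt = "Tukey")` requests all pairwise comparisons among the levels of `trt`; `summary()` then reports single-step adjusted p-values computed from the multivariate t distribution, and `confint()` gives the matching simultaneous intervals.

```r
library(multcomp)   # provides glht() and mcp()

set.seed(3)
n_per_group <- c(12, 9, 15, 7, 11)                      # invented, unequal
trt <- factor(rep(paste0("c", 1:5), times = n_per_group))
height <- rnorm(length(trt), mean = 20 + 1.5 * as.integer(trt), sd = 3)
seedlings <- data.frame(height = height, trt = trt)

model <- aov(height ~ trt, data = seedlings)

# All pairwise (Tukey-type) contrasts among the levels of trt.
tukey_glht <- glht(model, linfct = mcp(trt = "Tukey"))
tukey_sum <- summary(tukey_glht)   # default: single-step (multivariate t) adjustment
tukey_sum
confint(tukey_glht)                # simultaneous 95% confidence intervals
```

The p-values come from numerical integration of the multivariate t distribution, so they can vary very slightly between runs.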

You can also get these adjustments via the lsmeans package and a call like

library(lsmeans)
pairs(lsmeans(model, "trt"), adjust = "mvt")
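The lsmeans package has since been superseded by emmeans, which keeps the same calling pattern. A sketch using emmeans on hypothetical simulated data (names and numbers invented): `adjust = "mvt"` requests multivariate-t adjusted p-values, which should agree with the single-step `glht()` results up to small numerical error, while the default for `pairs()` is `adjust = "tukey"`, based on the studentized range distribution.

```r
library(emmeans)   # successor to lsmeans; same emmeans()/pairs() pattern

set.seed(4)
n_per_group <- c(12, 9, 15, 7, 11)                      # invented, unequal
trt <- factor(rep(paste0("c", 1:5), times = n_per_group))
height <- rnorm(length(trt), mean = 20 + 1.5 * as.integer(trt), sd = 3)
seedlings <- data.frame(height = height, trt = trt)

model <- aov(height ~ trt, data = seedlings)

# Estimated marginal means per concentration, then all pairwise
# differences with multivariate-t adjusted p-values.
emm <- emmeans(model, "trt")
mvt_pairs <- pairs(emm, adjust = "mvt")
mvt_pairs

# Same comparisons with the default studentized-range (Tukey) adjustment.
pairs(emm)
```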