You might want to look at the lsmeans package or the multcomp package. Here is some demonstration R code to show you some of the options you have. Load both packages. The car package also provides some functionality for carrying out analysis of variance that is sometimes helpful.
# Load handy packages.
library(car)
library(lsmeans)
library(multcomp)
Then read the data and check the form in which it arrived.
# Read the data.
Example <- read.csv("Example.csv")
# What is the structure of the data?
str(Example)
Looking at the structure of the data, the "Control" value of Treatment is the first level of the factor. That is handy because that is usually the default reference value.
'data.frame': 27 obs. of 3 variables:
$ Treatment : Factor w/ 3 levels "Control","Nitrogen",..: 3 3 3 3 3 3 3 3 3 2 ...
$ Stage : Factor w/ 3 levels "Green","Pink",..: 1 1 1 2 2 2 3 3 3 1 ...
$ Chlorophyll: num 0.2 0.3 0.4 0.5 0.3 0.2 0.5 0.6 0.7 0.4 ...
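Because factor levels are ordered alphabetically, "Control" happens to come first here. If it didn't, you could set the reference level explicitly; a one-line sketch:

```r
# Make "Control" the reference level explicitly (not needed here,
# since it is already the first level alphabetically).
Example$Treatment <- relevel(Example$Treatment, ref = "Control")
```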
Perform the analysis and a little bit of model-checking.
# Fit the two-way factorial model.
fit <- lm(Chlorophyll ~ Treatment + Stage + Treatment:Stage, Example)
# Look at the model goodness of fit.
plot(fit)
shapiro.test(residuals(fit))
The model fits pretty well. Good job on the fake data set! Now look at the results.
# Perform an analysis of variance.
Anova(fit)
No matter how you look at it, there is a two-way interaction effect in these data.
Anova Table (Type II tests)
Response: Chlorophyll
Sum Sq Df F value Pr(>F)
Treatment 0.00519 2 0.1273 0.881279
Stage 0.12741 2 3.1273 0.068283 .
Treatment:Stage 0.53259 4 6.5364 0.001972 **
Residuals 0.36667 18
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can use the lsmip() function to plot the interaction effects.
# Look at interaction effects via least squares means.
lsmip(fit, Stage ~ Treatment)
If we ignore the interaction with Stage, we can look at all pairwise comparisons of Treatment using the Tukey adjustment for multiple testing. The [[2]] just picks off the second part of the list of results.
# Look at Treatment averaged over levels of Stage.
lsmeans(fit, pairwise ~ Treatment)[[2]]
However, for these data, it would be best to take into account the interaction. We can "slice" the interaction by Stage and compare all levels of Treatment to each other for each Stage.
# Compare sliced least squares means using the Tukey method.
fit.tukey <- lsmeans(fit, pairwise ~ Treatment | Stage)[[2]]
fit.tukey
cld(fit.tukey)
The cld() function provides the usual letter codes. We can see the interaction effect in the differential pattern of significant differences among the groups.
Stage = Green:
contrast estimate SE df t.ratio p.value .group
Control - Nitrogen -1.188445e-17 0.1165343 18 0.000 1.0000 1
Control - Salt 3.333333e-01 0.1165343 18 2.860 0.0268 2
Nitrogen - Salt 3.333333e-01 0.1165343 18 2.860 0.0268 12
Stage = Pink:
contrast estimate SE df t.ratio p.value .group
Nitrogen - Salt 3.962759e-17 0.1165343 18 0.000 1.0000 1
Control - Nitrogen 1.666667e-01 0.1165343 18 1.430 0.3470 1
Control - Salt 1.666667e-01 0.1165343 18 1.430 0.3470 1
Stage = Red:
contrast estimate SE df t.ratio p.value .group
Control - Salt -4.000000e-01 0.1165343 18 -3.432 0.0079 1
Nitrogen - Salt -3.000000e-01 0.1165343 18 -2.574 0.0478 12
Control - Nitrogen -1.000000e-01 0.1165343 18 -0.858 0.6727 2
P value adjustment: tukey method for a family of 3 means
significance level used: alpha = 0.05
Using Dunnett's test might be better if you don't care about all pairwise comparisons. This code compares against the reference level mentioned above, but you have much more flexibility if needed.
# Compare sliced least squares means via Dunnett's method.
fit.lsm <- lsmeans(fit, "Treatment", by=c("Stage"))
contrast(fit.lsm, "trt.vs.ctrl")
Results are pretty similar to those obtained using the Tukey method in this case.
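As one example of that flexibility, contrast() also accepts custom contrasts as a named list of coefficient vectors (levels here in the order Control, Nitrogen, Salt); the contrast name below is just illustrative:

```r
# Custom contrast: average of the two stress treatments vs. Control,
# computed within each Stage because fit.lsm was built with by="Stage".
contrast(fit.lsm, list(stress.vs.ctrl = c(-1, 0.5, 0.5)))
```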
Whether you insist on keeping "obesity" as an all-or-none marker or, as Frank Harrell rightly suggests, you treat it as one or more continuous predictors (e.g., including both height and weight in some way), you have several potential sources of confusion in the way you are approaching this problem.
First, as the manual page for car::Anova says:
"Be careful of type-III tests... type-II tests are invariant with respect to (full-rank) contrast coding. If you don't understand this issue, then you probably shouldn't use Anova for type-III tests."
It's not clear how useful type-III tests will be, particularly with an unbalanced design and a potentially significant interaction. Your choice of type-III tests also seems to be forcing you to use sum contrasts. (I personally find the coefficients from sum contrasts harder to think about than those from the default treatment contrasts, although that might be my limitation and you might find otherwise.)
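To make that concrete, here is a minimal sketch of the two approaches; `outcome`, `groupA`, `groupB`, and the data frame `dat` are placeholder names, not objects from your code:

```r
library(car)

# Type-III tests are only meaningful with sum-to-zero contrasts,
# which must be set before fitting the model.
options(contrasts = c("contr.sum", "contr.poly"))
fit3 <- lm(outcome ~ groupA * groupB, data = dat)
Anova(fit3, type = 3)

# Type-II tests are invariant to the contrast coding, so the
# default treatment contrasts are fine.
options(contrasts = c("contr.treatment", "contr.poly"))
fit2 <- lm(outcome ~ groupA * groupB, data = dat)
Anova(fit2, type = 2)
```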
Second, it's not clear that you have more than one observation per participant for any one response variable. If you don't, then there's no "random effect" to deal with in univariate analysis, as you seem to be trying to do in your code. (If you have multiple response variables for each participant you would benefit from some type of multivariate analysis.)
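If you do have several responses per participant, one route is a multivariate linear model; a sketch with placeholder names (`resp1`, `resp2`, `resp3`, `dat`):

```r
# Fit all responses jointly; car::Anova then reports multivariate
# tests (Pillai's trace by default) for each term.
fit.mv <- lm(cbind(resp1, resp2, resp3) ~ groupA * groupB, data = dat)
car::Anova(fit.mv)
```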
Third, you ask "why is significant the ANOVA but not the interaction?" This might be a problem with terminology, but that's not what you show: the interaction term is significant (if barely) by the traditional p < 0.05 criterion.
Perhaps you are having problems identifying a particular "significant" pairwise comparison among the combinations of groups despite that overall finding of a "significant" interaction. That can happen in some circumstances, particularly when the overall "significance" is borderline as in the case you display.
Doing a set of pairwise t-tests as you propose would be an error unless you took into account the multiple-comparisons problem in some way. The TukeyHSD is only one way to do that; there are others.
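For example, base R's pairwise.t.test() applies an adjustment directly; `outcome` and `group` below are placeholder names:

```r
# Pairwise t-tests with a multiplicity adjustment; "holm" is a
# reasonable general-purpose choice, and ?p.adjust lists others.
pairwise.t.test(dat$outcome, dat$group, p.adjust.method = "holm")

# TukeyHSD is specific to pairwise comparisons of factor-level
# means and works on an aov() fit.
TukeyHSD(aov(outcome ~ group, data = dat))
```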
I don't think this calls for robust statistics or post hoc tests at all. The elephant in the room is that the data are best analysed on the logarithmic scale.
Here is a plot of the raw data, with the replicates ordered by magnitude to make it clearer that variability is roughly constant on that scale. In practice I wouldn't transform; I would use a generalised linear model with a logarithmic link.
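A minimal sketch of that approach, using placeholder names (`y`, `group`, `dat`):

```r
# Gaussian GLM with a log link: models the log of the mean
# without transforming the response itself.
fit.glm <- glm(y ~ group, family = gaussian(link = "log"), data = dat)
summary(fit.glm)

# If variability grows with the mean, a Gamma family with a log
# link is another common choice.
fit.gamma <- glm(y ~ group, family = Gamma(link = "log"), data = dat)
```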