You might want to look at the lsmeans package or the multcomp package. Here is some demonstration R code to show you some of the options you have. Load both packages. The car package also provides some functionality for carrying out analysis of variance that is sometimes helpful.
# Load handy packages.
library(car)
library(lsmeans)
library(multcomp)
Then read the data and check the form in which it arrived.
# Read the data.
Example <- read.csv("Example.csv")
# What is the structure of the data?
str(Example)
Looking at the structure of the data, the "Control" value of Treatment is the first level of the factor. That is handy because that is usually the default reference value.
'data.frame': 27 obs. of 3 variables:
$ Treatment : Factor w/ 3 levels "Control","Nitrogen",..: 3 3 3 3 3 3 3 3 3 2 ...
$ Stage : Factor w/ 3 levels "Green","Pink",..: 1 1 1 2 2 2 3 3 3 1 ...
$ Chlorophyll: num 0.2 0.3 0.4 0.5 0.3 0.2 0.5 0.6 0.7 0.4 ...
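Because factor levels are ordered alphabetically, "Control" happens to come first here. If it didn't, you could set the reference level explicitly; a one-line sketch:

```r
# Make "Control" the reference level explicitly (not needed here,
# since it is already the first level alphabetically).
Example$Treatment <- relevel(Example$Treatment, ref = "Control")
```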
Perform the analysis and a little bit of model-checking.
# Fit the two-way factorial model.
fit <- lm(Chlorophyll ~ Treatment + Stage + Treatment:Stage, Example)
# Look at the model goodness of fit.
plot(fit)
shapiro.test(residuals(fit))
The model fits pretty well. Good job on the fake data set! Now look at the results.
# Perform an analysis of variance.
Anova(fit)
No matter how you look at it, there is a two-way interaction effect in these data.
Anova Table (Type II tests)
Response: Chlorophyll
Sum Sq Df F value Pr(>F)
Treatment 0.00519 2 0.1273 0.881279
Stage 0.12741 2 3.1273 0.068283 .
Treatment:Stage 0.53259 4 6.5364 0.001972 **
Residuals 0.36667 18
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can use the lsmip() function to plot the interaction effects.
# Look at interaction effects via least squares means.
lsmip(fit, Stage ~ Treatment)
If we ignore the interaction with Stage, we can look at all pairwise comparisons of Treatment using the Tukey adjustment for multiple testing. The [[2]] just picks off the second part of the list of results.
# Look at Treatment averaged over levels of Stage.
lsmeans(fit, pairwise ~ Treatment)[[2]]
However, for these data, it would be best to take into account the interaction. We can "slice" the interaction by Stage and compare all levels of Treatment to each other for each Stage.
# Compare sliced least squares means using the Tukey method.
fit.tukey <- lsmeans(fit, pairwise ~ Treatment | Stage)[[2]]
fit.tukey
cld(fit.tukey)
The cld() function provides the usual letter codes. We can see the interaction effect in the differential pattern of significant differences among the groups.
Stage = Green:
contrast estimate SE df t.ratio p.value .group
Control - Nitrogen -1.188445e-17 0.1165343 18 0.000 1.0000 1
Control - Salt 3.333333e-01 0.1165343 18 2.860 0.0268 2
Nitrogen - Salt 3.333333e-01 0.1165343 18 2.860 0.0268 12
Stage = Pink:
contrast estimate SE df t.ratio p.value .group
Nitrogen - Salt 3.962759e-17 0.1165343 18 0.000 1.0000 1
Control - Nitrogen 1.666667e-01 0.1165343 18 1.430 0.3470 1
Control - Salt 1.666667e-01 0.1165343 18 1.430 0.3470 1
Stage = Red:
contrast estimate SE df t.ratio p.value .group
Control - Salt -4.000000e-01 0.1165343 18 -3.432 0.0079 1
Nitrogen - Salt -3.000000e-01 0.1165343 18 -2.574 0.0478 12
Control - Nitrogen -1.000000e-01 0.1165343 18 -0.858 0.6727 2
P value adjustment: tukey method for a family of 3 means
significance level used: alpha = 0.05
Using Dunnett's test might be better if you don't care about all pairwise comparisons. This code compares against the reference level mentioned above, but you have much more flexibility if needed.
# Compare sliced least squares means via Dunnett's method.
fit.lsm <- lsmeans(fit, "Treatment", by=c("Stage"))
contrast(fit.lsm, "trt.vs.ctrl")
Results are pretty similar to those obtained using the Tukey method in this case.
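As one example of that flexibility, contrast() also accepts custom contrasts as a named list of coefficient vectors (levels here in the order Control, Nitrogen, Salt); the contrast name below is just illustrative:

```r
# Custom contrast: average of the two stress treatments vs. Control,
# computed within each Stage because fit.lsm was built with by="Stage".
contrast(fit.lsm, list(stress.vs.ctrl = c(-1, 0.5, 0.5)))
```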
Whether you insist on keeping "obesity" as an all-or-none marker or, as Frank Harrell rightly suggests, you treat it as one or more continuous predictors (e.g., including both height and weight in some way), you have several potential sources of confusion in the way you are approaching this problem.
First, as the manual page for car::Anova says:
"Be careful of type-III tests... type-II tests are invariant with respect to (full-rank) contrast coding. If you don't understand this issue, then you probably shouldn't use Anova for type-III tests."
It's not clear how useful type-III tests will be, particularly with an unbalanced design and a potentially significant interaction. Your choice of type-III tests also seems to be forcing you to use sum contrasts. (I personally find the coefficients from sum contrasts harder to think about than those from the default treatment contrasts, although that might be my limitation and you might find otherwise.)
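To make that concrete, here is a minimal sketch of the two approaches; `outcome`, `groupA`, `groupB`, and the data frame `dat` are placeholder names, not objects from your code:

```r
library(car)

# Type-III tests are only meaningful with sum-to-zero contrasts,
# which must be set before fitting the model.
options(contrasts = c("contr.sum", "contr.poly"))
fit3 <- lm(outcome ~ groupA * groupB, data = dat)
Anova(fit3, type = 3)

# Type-II tests are invariant to the contrast coding, so the
# default treatment contrasts are fine.
options(contrasts = c("contr.treatment", "contr.poly"))
fit2 <- lm(outcome ~ groupA * groupB, data = dat)
Anova(fit2, type = 2)
```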
Second, it's not clear that you have more than one observation per participant for any one response variable. If you don't, then there's no "random effect" to deal with in univariate analysis, as you seem to be trying to do in your code. (If you have multiple response variables for each participant you would benefit from some type of multivariate analysis.)
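If you do have several responses per participant, one route is a multivariate linear model; a sketch with placeholder names (`resp1`, `resp2`, `resp3`, `dat`):

```r
# Fit all responses jointly; car::Anova then reports multivariate
# tests (Pillai's trace by default) for each term.
fit.mv <- lm(cbind(resp1, resp2, resp3) ~ groupA * groupB, data = dat)
car::Anova(fit.mv)
```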
Third, you ask "why is significant the ANOVA but not the interaction?" This might be a problem with terminology, but that's not what you show: the interaction term is significant (if barely) by the traditional p < 0.05 criterion.
Perhaps you are having problems identifying a particular "significant" pairwise comparison among the combinations of groups despite that overall finding of a "significant" interaction. That can happen in some circumstances, particularly when the overall "significance" is borderline as in the case you display.
Doing a set of pairwise t-tests as you propose would be an error unless you took into account the multiple-comparisons problem in some way. The TukeyHSD is only one way to do that; there are others.
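For example, base R's pairwise.t.test() applies an adjustment directly; `outcome` and `group` below are placeholder names:

```r
# Pairwise t-tests with a multiplicity adjustment; "holm" is a
# reasonable general-purpose choice, and ?p.adjust lists others.
pairwise.t.test(dat$outcome, dat$group, p.adjust.method = "holm")

# TukeyHSD is specific to pairwise comparisons of factor-level
# means and works on an aov() fit.
TukeyHSD(aov(outcome ~ group, data = dat))
```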
I don't think this calls for robust statistics or post hoc tests at all. The elephant in the room is that the data are best analysed on the logarithmic scale.
Here is a plot of the raw data, with the replicates ordered by magnitude to make it clearer that variability is roughly constant on that scale. In practice I wouldn't transform; I would use a generalised linear model with a logarithmic link.
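A minimal sketch of that approach, using placeholder names (`y`, `group`, `dat`):

```r
# Gaussian GLM with a log link: models the log of the mean
# without transforming the response itself.
fit.glm <- glm(y ~ group, family = gaussian(link = "log"), data = dat)
summary(fit.glm)

# If variability grows with the mean, a Gamma family with a log
# link is another common choice.
fit.gamma <- glm(y ~ group, family = Gamma(link = "log"), data = dat)
```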