I've run a fully within-subjects repeated-measures ANOVA using the aov() function. My dependent variable is not normally distributed, so I'm very interested in running assumption tests on my analysis. It seems that just calling plot() on the output doesn't work for repeated-measures models, so I've manually extracted the residuals and fitted values for a model of interest and plotted them against each other. I'm assuming that this is how I would plot to test the assumption of homoskedasticity.
The plot comes out with 2 vertical bands (please see the image below). It turns out the fitted values are all centred around 2 values (although, according to ==, they are not exactly equal), where one is the negative of the other.
I have 2 questions:
1) Is this the correct way to manually test the assumption of homoskedasticity? If not, how would I go about it for repeated-measures designs (since just calling plot() doesn't work)?
2) If it is correct, what is this plot telling me? Why are the fitted values so clustered? What can I conclude from it?
Thanks heaps for any input here. Also, if you know of better ways to check (preferably plot) for assumptions in rm-ANOVAs, that would be useful information as well.
I've included some mock data here to replicate the scenario:
#Create mock data (there's probably a more efficient way to do this.. would also be nice to know! :) )
p <- sort(rep(1:20,8))
y <- rep(rep(1:2,4),20)
z <- rep(rep(c(1,1,2,2),2),20)
w <- rep(c(1,1,1,1,2,2,2,2),20)
x <- rnorm(160,10,2)
d <- data.frame(x,p=factor(p),y=factor(y),z=factor(z),w=factor(w))
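Regarding the aside in the comment above about a more efficient way to build the mock data: one option (a sketch, not the only way) is expand.grid(), which constructs every combination of the supplied factors in a single call:

```r
# expand.grid() builds all combinations of the supplied factors;
# the first variable varies fastest, matching the rep() pattern above.
d2 <- expand.grid(y = factor(1:2), z = factor(1:2),
                  w = factor(1:2), p = factor(1:20))
d2$x <- rnorm(nrow(d2), 10, 2)
str(d2)  # 160 rows: 20 subjects x 2 x 2 x 2 within-subject cells
```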
#Run repeated-measures ANOVA
ex.aov <- aov(x ~ y*z*w + Error(p/(y*z*w)), d)
#Try to plot full object (doesn't work)
plot(ex.aov)
#Try to plot section of object (doesn't work)
plot(ex.aov[["p:y:z"]])
#Plot residuals against fitted (custom "skedasticity" plot - works)
plot(residuals(ex.aov[["p:y:z"]])~fitted(ex.aov[["p:y:z"]]))
Begin Edit
In light of the information provided by @Stefan, I've added some additional details below, using the improved data structure he proposed:
# Set seed to make it reproducible
set.seed(12)
#New variable names and generation
subj <- factor(rep(1:20, each = 8))
x1 <- rep(c('A','B'), 80)
x2 <- rep(c('A','B'), 40, each = 2)
x3 <- rep(c('A','B'), 20, each = 4)
# All vectors have length 160 (20 subjects x 8 within-subject cells),
# so nothing is silently recycled by data.frame()
outcome <- rnorm(160, 10, 2)
d3 <- data.frame(outcome, subj, x1, x2, x3)
#Repeated measures ANOVA
ex.aov <- aov(outcome ~ x1*x2*x3 + Error(subj/(x1*x2*x3)), d3)
# Get the projections (one matrix per error stratum)
ex.aov.proj <- proj(ex.aov)
# Check for normality by using last error stratum
qqnorm(ex.aov.proj[[9]][, "Residuals"])
# Check for heteroscedasticity by using last error stratum
plot(ex.aov.proj[[9]][, "Residuals"])
The resulting plots are below:
Can anyone interpret the images above (especially the last one)? There appears to be clustering and pattern structure. Can this be used to infer the presence of heteroscedasticity?
Best Answer
I'm assuming that a model fitted using the Error() function within aov() won't work with plot() because you get more than one error stratum from which to choose. Now, according to this information here, one should use the proj() function, which gives you the residuals for each error stratum; these can then be used for diagnostic plots.

Edit 1 start
More information regarding multistratum models and the proj() function is given in Venables and Ripley, page 284 (but start from page 281): Residuals in multistratum analyses: Projections. In the second sentence they write (I highlighted in bold):

For your example, that means:
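The code block that belonged here isn't preserved; below is a self-contained sketch under the assumption that it mirrors the proj() usage from the question's edit (variable names follow the question's first mock-data example):

```r
set.seed(12)
# Mock within-subjects data as in the question
p <- factor(rep(1:20, each = 8))
y <- factor(rep(1:2, 80))
z <- factor(rep(1:2, 40, each = 2))
w <- factor(rep(1:2, 20, each = 4))
d <- data.frame(x = rnorm(160, 10, 2), p, y, z, w)
ex.aov <- aov(x ~ y * z * w + Error(p / (y * z * w)), data = d)

# proj() returns one projection matrix per error stratum
ex.aov.proj <- proj(ex.aov)
names(ex.aov.proj)

# Residuals from the last stratum, for diagnostic plots
res <- ex.aov.proj[[length(ex.aov.proj)]][, "Residuals"]
qqnorm(res)  # normality
plot(res)    # structure / spread
```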
However, this also leads to plots which I cannot fully interpret (especially the second one).
In their case, the last stratum was the Within stratum. Since your model cannot estimate this (presumably because of your error term), I am not sure whether simply using your last stratum is valid. Hopefully someone else can clarify.
Edit 1 end
Edit 2 start
According to this source, checking residuals to assess normality and heteroscedasticity should be performed without the Error() function. This seems reasonable to me, but I hope someone else can clarify.
Edit 2 end
My alternative suggestion:
First, I changed your dataset slightly and set a seed to make it reproducible (might be handy for some problems you have in the future):
Second, I used a linear mixed-effects model instead since you have repeated measures and hence a random term you can use:
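The two code blocks referred to above aren't preserved; here is a hedged reconstruction, assuming the revised d3 data from the question's edit and an lme4 random-intercept fit:

```r
# Revised dataset with a seed (assumed to mirror the question's edit)
set.seed(12)
subj <- factor(rep(1:20, each = 8))
x1 <- rep(c('A','B'), 80)
x2 <- rep(c('A','B'), 40, each = 2)
x3 <- rep(c('A','B'), 20, each = 4)
d3 <- data.frame(outcome = rnorm(160, 10, 2), subj, x1, x2, x3)

# Mixed-effects model: subject as a random intercept (lme4 assumed)
if (requireNamespace("lme4", quietly = TRUE)) {
  m <- lme4::lmer(outcome ~ x1 * x2 * x3 + (1 | subj), data = d3)
  plot(m)               # residuals vs fitted values
  qqnorm(residuals(m))  # normality of residuals
}
```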
Using the afex package you can also get the fixed effects in ANOVA table format (you can also use the Anova() function from the car package as another option):

Check ?mixed for the various options you can choose. Also regarding mixed models, there is a lot of information here on Cross Validated.
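The afex call referenced above could look like this (a sketch, not the answer's original code; mixed() fits the model through lme4 and returns an ANOVA-style table for the fixed effects):

```r
set.seed(12)
subj <- factor(rep(1:20, each = 8))
d3 <- data.frame(outcome = rnorm(160, 10, 2), subj,
                 x1 = rep(c('A','B'), 80),
                 x2 = rep(c('A','B'), 40, each = 2),
                 x3 = rep(c('A','B'), 20, each = 4))

# method = "KR" requests Kenward-Roger degrees of freedom; see ?mixed
# for the other methods (afex assumed to be installed)
if (requireNamespace("afex", quietly = TRUE)) {
  m.afex <- afex::mixed(outcome ~ x1 * x2 * x3 + (1 | subj),
                        data = d3, method = "KR")
  print(m.afex)  # ANOVA table for the fixed effects
}
```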