I have a system / operator repeated measures data set for which I would like to answer the questions:
- is system 1 quicker than system 2
- is operator 1 quicker than operator 2
- does the order of using the systems matter
So I've hopefully understood the problem correctly to be a "system vs. operator with interactions" question.
The measurements with regard to both system and operator are randomised but unbalanced, i.e., there are different numbers of measurements for each operator and for each order in which the systems were used. (Unfortunate, but that's the way it goes, and it is somewhat related to the randomisation.)
The data is in the format:
   subject time1 time2 order operator
1        1  1269   422     1        2
2        2   795   327     2        2
3        3  1866   551     1        2
4        4  1263   382     2        2
5        5  1602   438     1        2
6        6  1359   423     2        2
7        7   850   415     2        1
8        8  1080   370     1        2
9        9   568   278     2        1
10      10   582   308     2        1
11      11  1094   654     2        2
...
I reformatted the data with the make.rm function (http://cran.r-project.org/doc/contrib/Lemon-kickstart/makerm.R) so that there is one condition per line. This allows a "univariate" repeated measures analysis of the data (per the comments in the make.rm function).
Therefore the data is now:
  subject order operator repdat contrasts
1       1     1        2   1269        T1
2       2     2        2    795        T1
...
x       1     1        2    422        T2
y       2     2        2    327        T2
...
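For anyone without the make.rm helper, roughly the same wide-to-long step can be sketched with base R's reshape(). This is only an illustration: the data frame name wide is an assumption, and only the 11 rows printed above are typed in, so the result is not the full data set.

```r
# The 11 rows printed above; `wide` is a name chosen for this sketch
wide <- data.frame(
  subject  = 1:11,
  time1    = c(1269, 795, 1866, 1263, 1602, 1359, 850, 1080, 568, 582, 1094),
  time2    = c(422, 327, 551, 382, 438, 423, 415, 370, 278, 308, 654),
  order    = c(1, 2, 1, 2, 1, 2, 2, 1, 2, 2, 2),
  operator = c(2, 2, 2, 2, 2, 2, 1, 2, 1, 1, 2)
)

# One row per subject and system: time1/time2 are stacked into `repdat`,
# and `contrasts` records which of the two columns each value came from
long <- reshape(
  wide,
  direction = "long",
  varying   = c("time1", "time2"),
  v.names   = "repdat",
  timevar   = "contrasts",
  times     = c("T1", "T2"),
  idvar     = "subject"
)

head(long)
```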
And then I perform the analysis of variance with this data:
analysis <- aov( repdat ~ order * operator + Error( subject / ( order * operator ) ), data = data.formatted )
with a result, but also an error message: "Error() model is singular"
So my questions are:
- Have I understood the problem correctly? Am I even doing it right?
- Are the results valid?
- Why do I get an "Error() model is singular"?
- What impact does the unbalanced nature of the data play?
Best Answer
You are using the syntax for within-subject 2-way repeated measures ANOVA. That would assume that all combinations of order and operator are repeated within each subject. That's not what you have. I think some degree of missingness might be OK, but in your data each subject has only one combination of order and operator, so the within-subject variance of those effects cannot be assessed.
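A quick way to see this is to cross-tabulate order against operator. The sketch below uses only the 11 rows shown in the question (the name wide is an assumption for illustration), so the counts in the full data set will differ:

```r
# The 11 rows shown in the question; `wide` is a name chosen for this sketch
wide <- data.frame(
  subject  = 1:11,
  time1    = c(1269, 795, 1866, 1263, 1602, 1359, 850, 1080, 568, 582, 1094),
  time2    = c(422, 327, 551, 382, 438, 423, 415, 370, 278, 308, 654),
  order    = factor(c(1, 2, 1, 2, 1, 2, 2, 1, 2, 2, 2)),
  operator = factor(c(2, 2, 2, 2, 2, 2, 1, 2, 1, 1, 2))
)

# Each subject is a single row, so it sits in exactly one order x operator cell:
# order and operator vary between subjects, never within a subject
design <- xtabs(~ order + operator, data = wide)
print(design)

# Every subject contributes to one cell only
stopifnot(all(table(wide$subject) == 1))
```

Because order and operator are constant within a subject, nesting them under subject in the Error() term adds no new information, which is consistent with the "singular" message. The unequal cell counts are the unbalancedness the question mentions.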
Another issue is that you lost the "system" variable, which is called contrasts in your second output. There are replicate measurements of contrasts within subject, though. So perhaps a model with contrasts as the within-subject term might work (I have not thought through how many of the interaction terms are estimable).
As a simpler, alternative solution, you might want to consider modeling time1 - time2 in the original data, which measures the system effect, as a function of order and operator. This would not work for more than two systems, but here there is an additional benefit: you could try other measures of the effect size, such as the ratio or log-ratio, which might better fit the assumptions of ANOVA. Differences in timing outcomes are often non-normal and heteroscedastic.
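Both suggestions can be sketched in R roughly as follows. This is an illustration under stated assumptions, not a worked analysis: the names wide, long, fit_within, fit_diff and fit_logratio are made up here, only the 11 rows shown in the question are used, and the exact model formulas are one plausible reading of the text above.

```r
# The 11 rows shown in the question (assumed name: wide)
wide <- data.frame(
  subject  = factor(1:11),
  time1    = c(1269, 795, 1866, 1263, 1602, 1359, 850, 1080, 568, 582, 1094),
  time2    = c(422, 327, 551, 382, 438, 423, 415, 370, 278, 308, 654),
  order    = factor(c(1, 2, 1, 2, 1, 2, 2, 1, 2, 2, 2)),
  operator = factor(c(2, 2, 2, 2, 2, 2, 1, 2, 1, 1, 2))
)

# Long format: one row per subject and system (contrasts = T1 or T2)
long <- reshape(
  wide,
  direction = "long",
  varying   = c("time1", "time2"),
  v.names   = "repdat",
  timevar   = "contrasts",
  times     = c("T1", "T2"),
  idvar     = "subject"
)
long$contrasts <- factor(long$contrasts)

# Suggestion 1: system (contrasts) is the within-subject factor;
# order and operator only vary between subjects
fit_within <- aov(
  repdat ~ contrasts * order * operator + Error(subject / contrasts),
  data = long
)

# Suggestion 2: model the per-subject system effect directly
wide$diff <- wide$time1 - wide$time2
fit_diff  <- aov(diff ~ order * operator, data = wide)

# Variant: the log-ratio as the effect measure instead of the difference
wide$logratio <- log(wide$time1 / wide$time2)
fit_logratio  <- aov(logratio ~ order * operator, data = wide)

# Note: in this 11-row excerpt the order 1 / operator 1 cell is empty,
# so the order:operator interaction cannot be estimated from these rows alone
```

Calling summary() on each fit prints the corresponding ANOVA tables. With unbalanced data, the sequential sums of squares from aov depend on the order of terms in the formula, so it is worth refitting with the terms swapped to see how much the conclusions move.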