Solved – Is ANOVA valid for within-subject accuracy rates?

anova, chi-squared-test, hypothesis-testing, logistic, repeated-measures

I would like to test whether subjects are significantly more or less accurate under some experimental conditions. At first I thought this was a job for repeated-measures ANOVA, but I am not sure anymore.

The experiment goes like this:
There are 4 experimental conditions. Every subject performs multiple trials of the task under each condition, and each trial is scored as either correct or incorrect. I would like to see whether some conditions make the task harder, i.e. lead to a different proportion of correct (C) vs. incorrect (N) trials.

subject    condition_1  condition_2  condition_3  condition_4   
   1       CCCNNNCCCNC  CNCNNNCCCNN  CCCCNNNCNCN  CCCCCNNNCCC
   2       CCCNNCNCNCN  CNCNCNNCNNN  CNNNCNCCNNN  CCCNCNCNCNC
   3       CCNCNCNCNCN  CNNCNCNCNNC  CNCNNNCCNNC  CNNCNCNCNNC
  ...

I don't think that ANOVA is a valid test, because the mean accuracy per subject and condition comes from binomial/count data. I read about the chi-squared test and Cochran's Q test for binomial data, but I don't have a single binary True/False value per subject and condition; I have an accuracy rate. I also read suggestions to use logistic regression, but I don't know whether that would be appropriate either, or how to use it with repeated measures.

Can I use ANOVA for repeated measures?
If not, why not, and what would happen if I did?
What test would be more appropriate?

Best Answer

No, ANOVA is not valid, for the reason you stated: the assumption of normality is violated. If you are not too close to the ceiling or floor (so the distribution isn't truncated) and you have enough trials, you might be able to argue that the binomial distribution converges to the normal as the number of trials grows, but this is somewhat unsatisfying.
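
For instance, a quick simulation (hypothetical numbers, not your data) shows how per-subject accuracy rates become skewed and truncated once performance approaches the ceiling:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical numbers: 20 subjects, 11 trials per condition, true accuracy 0.95.
    n_subjects, n_trials, true_p = 20, 11, 0.95
    accuracy = rng.binomial(n_trials, true_p, size=n_subjects) / n_trials

    # Near the ceiling the per-subject accuracies pile up at 1.0 and are
    # left-skewed and truncated, so treating them as normally distributed is dubious.
    print(np.round(np.sort(accuracy), 2))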

The better option is to use logit mixed-effects models. See for example:

Jaeger, T. Florian (2008). Categorical Data Analysis: Away from ANOVAs (transformation or not) and towards Logit Mixed Models. Journal of Memory and Language 59, 434-446.
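
For concreteness, here is a minimal sketch of such a model in Python using statsmodels' Bayesian binomial mixed GLM (the standard tool for logit mixed models is lme4::glmer in R; the simulated data frame and column names below are purely hypothetical stand-ins for your trial-level data):

    import numpy as np
    import pandas as pd
    from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

    rng = np.random.default_rng(1)

    # Simulated long-format data: one row per trial, with hypothetical columns
    # subject, condition and correct (0/1).
    n_subjects, n_trials = 20, 11
    rows = []
    for subj in range(n_subjects):
        subj_effect = rng.normal(0, 0.5)                          # by-subject intercept
        for cond, shift in zip("1234", (0.0, -0.4, -0.8, 0.3)):   # condition effects
            p = 1 / (1 + np.exp(-(1.0 + shift + subj_effect)))
            for _ in range(n_trials):
                rows.append({"subject": subj, "condition": cond,
                             "correct": int(rng.random() < p)})
    df = pd.DataFrame(rows)

    # Logit mixed model: fixed effect of condition, by-subject random intercept.
    model = BinomialBayesMixedGLM.from_formula(
        "correct ~ C(condition)",
        {"subject": "0 + C(subject)"},   # by-subject variance component
        df,
    )
    result = model.fit_vb()              # variational Bayes; fit_map() also works
    print(result.summary())

The coefficients for condition are on the log-odds scale, so they directly address whether a given condition makes a correct response more or less likely, while the variance component absorbs the repeated measures within each subject.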

Historically, I think ANOVA was used for two reasons:

1. When all you have is a hammer, every problem looks like a nail (i.e. ANOVA was the go-to technique for categorical data analysis and often the only one taught).
2. ANOVA is possible with little to no computational hardware, which mattered a great deal before modern computers and statistical software made mixed models computationally feasible.

In any case, mixed-effects models have the additional advantage of modelling by-subject and by-item variances simultaneously, which can have quite important ripple effects; see:

Judd, Charles M., Jacob Westfall and David A. Kenny (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology 103(1), 54-69.

This problem was recognized as early as the 1970s; see, for example, Clark (1973), "The Language-as-Fixed-Effect Fallacy".
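
If each trial also involves a distinct stimulus item, the sketch above extends to crossed by-subject and by-item variance components (again, the item column here is a hypothetical addition to the simulated data); in lme4 syntax this corresponds to correct ~ condition + (1 | subject) + (1 | item):

    # Continuing the sketch above: add a hypothetical item identifier per trial,
    # then include crossed by-subject and by-item variance components.
    df["item"] = np.tile(np.arange(n_trials), len(df) // n_trials)

    model = BinomialBayesMixedGLM.from_formula(
        "correct ~ C(condition)",
        {"subject": "0 + C(subject)", "item": "0 + C(item)"},
        df,
    )
    result = model.fit_vb()
    print(result.summary())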