Repeated Measures ANOVA – What Should Be Normally Distributed in Two-Way Repeated Measures ANOVA?

When running a two-way repeated measures ANOVA, should:

A) the independent variables be normally distributed?
B) the scores between the variables be normally distributed?

Overall: what should be normally distributed: the variables, the differences between scores (as in a paired test), or something else?

I'm struggling with the dataset below. I know that this ANOVA can be interpreted as a kind of lmer model, so I started to wonder what should be normally distributed. For regression, I know that the distribution of the variables doesn't matter (only that of the residuals). What about a two-way repeated measures ANOVA: what should be normally distributed?
data:

CONT_Y ~ CATEGORIES * MY_GROUP

structure(list(PARTICIPANTS = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 
3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 
8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 
13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 
17, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 
21, 21, 21, 21), CONT_Y = c(19.44, 20.07, 19.21, 16.35, 11.37, 
12.82, 19.42, 18.94, 19.59, 20.01, 19.7, 17.92, 18.78, 19.21, 
19.27, 18.46, 19.52, 20.02, 16.19, 19.97, 13.83, 15.93, 14.79, 
21.55, 18.8, 19.42, 19.27, 19.37, 17.14, 14.45, 17.63, 20.01, 
20.28, 17.93, 19.36, 20.15, 16.06, 17.04, 19.16, 20.1, 16.44, 
18.39, 18.01, 19.05, 18.04, 19.69, 19.61, 16.88, 19.02, 20.42, 
18.27, 18.43, 18.08, 17.1, 19.98, 19.43, 19.71, 19.93, 20.11, 
18.41, 20.31, 20.1, 20.38, 20.29, 13.6, 18.92, 19.05, 19.13, 
17.75, 19.15, 20.19, 18.3, 19.43, 19.8, 19.83, 19.53, 16.14, 
21.14, 17.37, 18.73, 16.51, 17.51, 17.06, 19.42), CATEGORIES = c("A", 
"A", "B", "B", "A", "A", "B", "B", "A", "A", "B", "B", "A", "A", 
"B", "B", "A", "A", "B", "B", "A", "A", "B", "B", "A", "A", "B", 
"B", "A", "A", "B", "B", "A", "A", "B", "B", "A", "A", "B", "B", 
"A", "A", "B", "B", "A", "A", "B", "B", "A", "A", "B", "B", "A", 
"A", "B", "B", "A", "A", "B", "B", "A", "A", "B", "B", "A", "A", 
"B", "B", "A", "A", "B", "B", "A", "A", "B", "B", "A", "A", "B", 
"B", "A", "A", "B", "B"), MY_GROUP = c("G1", "G2", "G1", "G2", 
"G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2", "G1", 
"G2", "G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2", 
"G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2", "G1", 
"G2", "G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2", 
"G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2", "G1", 
"G2", "G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2", 
"G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2", "G1", 
"G2", "G1", "G2")), row.names = c(NA, -84L), spec = structure(list(
    cols = list(PARTICIPANTS = structure(list(), class = c("collector_double", 
    "collector")), CONT_Y = structure(list(), class = c("collector_double", 
    "collector")), CATEGORIES = structure(list(), class = c("collector_character", 
    "collector")), MY_GROUP = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

Thanks in advance! 🙂

Bonus: I've researched non-parametric alternatives a lot and have seen many interesting posts here. As far as I can tell, there's no non-parametric alternative to the two-way repeated measures ANOVA, right?

Maybe the WRS2 package could be an alternative?

Or maybe a robust lmer could be a solution?

Best Answer

A linear regression model with a continuous regressor gives a prediction that can, in theory, take any value in a continuous range as the regressor changes. Your data have two categorical regressors, each with only two levels, so the two-way ANOVA (repeated measures or not) gives only four predictions, one for each combination of levels. These four predictions won't exactly match the observed values in your sample. The differences between the observed values and those four predictions are basically the residuals.
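To make the "four predictions" concrete, here is a minimal sketch in base R. To keep it short it uses only the first three participants from the question's data (12 rows), and it ignores the participant term, just as the description above does. The column names `cell_mean` and `resid` are my own additions.

```r
# First three participants (12 rows) copied from the dataset in the question
d <- data.frame(
  PARTICIPANTS = rep(1:3, each = 4),
  CATEGORIES   = rep(c("A", "A", "B", "B"), times = 3),
  MY_GROUP     = rep(c("G1", "G2"), times = 6),
  CONT_Y       = c(19.44, 20.07, 19.21, 16.35,
                   11.37, 12.82, 19.42, 18.94,
                   19.59, 20.01, 19.70, 17.92)
)

# Prediction for each row = mean of its CATEGORIES x MY_GROUP cell
d$cell_mean <- ave(d$CONT_Y, d$CATEGORIES, d$MY_GROUP)

# Residual = observed value minus the prediction for its cell
d$resid <- d$CONT_Y - d$cell_mean

# Only four distinct predictions exist, one per cell
print(unique(d[, c("CATEGORIES", "MY_GROUP", "cell_mean")]))

# The same predictions come out of the two-way model with interaction
fit <- lm(CONT_Y ~ CATEGORIES * MY_GROUP, data = d)
print(all.equal(unname(fitted(fit)), d$cell_mean))
```

Every row in the same cell gets the same fitted value, so whatever spread is left within a cell ends up in `resid`.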

Yes, for the two-way repeated measures ANOVA (as for the regular two-way ANOVA), it is those residuals that should be (approximately) normally distributed, not the independent variables. In other words, the dependent variable should be normally distributed within each of the categorical combination "bins" (there are four of them in your dataset: A∩G1, B∩G1, A∩G2, B∩G2).
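One way to look at those residuals, offered as a sketch rather than the definitive procedure: rebuild the question's data, fit the 2x2 model with an additive participant effect using plain `lm`, and inspect the residuals with a Shapiro-Wilk test and a Q-Q plot. Treating participants as a fixed factor in `lm` is my assumption here, a base-R stand-in for a random-intercept `lmer` fit; residuals from an actual mixed model will differ somewhat.

```r
# Data rebuilt from the question: 21 participants x 4 rows each,
# ordered A/G1, A/G2, B/G1, B/G2 within every participant
d <- data.frame(
  PARTICIPANTS = factor(rep(1:21, each = 4)),
  CATEGORIES   = factor(rep(c("A", "A", "B", "B"), times = 21)),
  MY_GROUP     = factor(rep(c("G1", "G2"), times = 42)),
  CONT_Y = c(
    19.44, 20.07, 19.21, 16.35,   11.37, 12.82, 19.42, 18.94,
    19.59, 20.01, 19.70, 17.92,   18.78, 19.21, 19.27, 18.46,
    19.52, 20.02, 16.19, 19.97,   13.83, 15.93, 14.79, 21.55,
    18.80, 19.42, 19.27, 19.37,   17.14, 14.45, 17.63, 20.01,
    20.28, 17.93, 19.36, 20.15,   16.06, 17.04, 19.16, 20.10,
    16.44, 18.39, 18.01, 19.05,   18.04, 19.69, 19.61, 16.88,
    19.02, 20.42, 18.27, 18.43,   18.08, 17.10, 19.98, 19.43,
    19.71, 19.93, 20.11, 18.41,   20.31, 20.10, 20.38, 20.29,
    13.60, 18.92, 19.05, 19.13,   17.75, 19.15, 20.19, 18.30,
    19.43, 19.80, 19.83, 19.53,   16.14, 21.14, 17.37, 18.73,
    16.51, 17.51, 17.06, 19.42
  )
)

# Assumption: participant enters as an additive fixed factor,
# standing in for a random intercept per participant
fit <- lm(CONT_Y ~ CATEGORIES * MY_GROUP + PARTICIPANTS, data = d)
res <- residuals(fit)

# Formal test of normality on the residuals (not on raw CONT_Y)
sw <- shapiro.test(res)
print(sw)

# Visual check: points should fall roughly along the line
qqnorm(res)
qqline(res)
```

Note that the check is applied to the residuals, not to raw `CONT_Y` and not to the predictors. It is worth reading the Q-Q plot alongside the p-value rather than relying on the test alone.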
