# Solved – Significant effect with overlapping boxplots

anovaboxplotmixed modelrrepeated measures

I'm puzzled about some of the results that I got, after plotting the data.

I have a dataset where 38 participants are tested on three conditions "time" and are thus tested three times. I figured this is a classic repeated measures ANOVA problem.

Here's a reproducible example of my data:

data_ex <- data.frame( pnum = rep(1:38, times=3),
time = rep(c("t1", "t2", "t3"),  each=38),
score = c(0.6, 0.8, 1.0, 1.0, 0.3, 0.7, 0.9, 0.3, 0.8,
0.3, 0.7, 0.8, 0.8, 1.0, 0.9, 0.7, 0.8, 0.7,
1.0, 0.9, 0.4, 0.8, 0.7, 1.0, 1.0, 0.4, 0.9,
0.6, 0.3, 0.7, 1.0, 0.9, 0.4, 0.6, 0.3, 0.6,
0.8, 0.9, 0.7, 0.9, 0.8, 0.8, 0.6, 1.0, 0.9,
0.8, 1.0, 0.4, 0.9, 0.5, 0.6, 0.8, 0.7, 0.6,
0.8, 0.7, 0.8, 0.8, 0.4, 0.9, 0.9, 0.9, 1.0,
0.6, 0.6, 0.7, 0.3, 0.8, 1.0, 0.9, 0.9, 0.6,
0.3, 0.3, 0.9, 0.8, 0.6, 0.8, 1.0, 1.0, 0.6,
0.9, 0.9, 0.7, 1.0, 0.4, 0.7, 0.9, 0.4, 0.9,
0.9, 0.8, 0.8, 0.9, 0.8, 0.8, 0.7, 1.0, 0.7,
1.0, 1.0, 0.9, 1.0, 0.7, 0.5, 0.7, 1.0, 1.0,
0.4, 0.8, 0.4, 0.8, 1.0, 0.8))


To perform a repeated measures ANOVA in the tidyverse syntax, I used the anova_test() function from the rstatix package, which apparently is a wrapper around car::Anova().

library(tidyverse)
library(rstatix)
data_ex_anova  <- anova_test(data = data_ex, dv = score, wid = pnum, within = time)
get_anova_table(data_ex_anova)

ANOVA Table (type III tests)

Effect DFn DFd     F     p p<.05   ges
1   time   2  74 3.593 0.032     * 0.025


Here I find a significant effect. I further check where the difference comes from with the pairwise comparisons

data_ex_anova_pw <- data_ex %>% pairwise_t_test(score ~ time, paired = TRUE, p.adjust.method = "bonferroni")
data_ex_anova_pw

# A tibble: 3 x 10
* <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <dbl> <chr>
1 score t1      t2         38    38    -0.492    37 0.626 1     ns
2 score t1      t3         38    38    -2.88     37 0.007 0.02  *
3 score t2      t3         38    38    -1.94     37 0.06  0.181 ns


Seems like time 1 and 3 differ.

However, when plotting the results, I get suspicious:

ggplot(data=data_ex,aes(x = as.factor(time), y = score )) +
geom_boxplot(size = 1.25, color = "black", outlier.size = NA) + ylim(0,1.1) +
geom_jitter(size = 3, alpha = 0.5, stroke = 1.5, width = 0.01, height = 0.02) +
stat_summary(fun.y=mean, geom="point", shape =24, fill = "white", size = 5)


The boxplots are completely overlapping! The medians are basically the same and all the boxes are within each others ranges. Now why this is surprising to me, is when I read about boxplots, this should be an indication of non-significance. Only with large samples, this is not clearly the case. Now since I only have 38 observations per time point, I would not consider this a large sample.
Could anyone enlighten me what's going on?
Are the results valid or am I indeed performing the wrong test?

PS: I also dabbled with the lmer package, to hopefully account for the repeated participant in all time points, which led to a similar finding:

library(lme4)
library(car)
Anova(lmer(score~time + (1|pnum), data = data_ex))

Analysis of Deviance Table (Type II Wald chisquare tests)

Response: score
Chisq Df Pr(>Chisq)
time 6.4703  1    0.01097 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


EDIT: I changed the variable "time" to a factor, results are still the same.