Solved – Smallest possible sample size per group in Levene’s test

anova, hypothesis-testing, levenes-test, sample-size

I recently learned that Levene's test is a one-way, equal-variance ANOVA performed on the absolute values of the residuals, where the residuals are computed from the mean within each group.
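That description can be reproduced by hand in base R. The following is only a sketch, not the code of car::leveneTest(): the helper name levene_by_hand and the simulated data are made up for illustration, and it centers on the group mean as described above (car::leveneTest() centers on the median by default).

```r
# Hypothetical helper: Levene's test as a one-way ANOVA on absolute deviations.
levene_by_hand <- function(y, g, center = mean) {
  g <- as.factor(g)
  # Absolute deviation of each observation from its own group's center
  z <- abs(y - ave(y, g, FUN = center))
  # Ordinary one-way ANOVA on those deviations
  anova(lm(z ~ g))
}

set.seed(1)
y <- c(rnorm(20, sd = 1), rnorm(20, sd = 3))  # two groups with different spread
g <- rep(c("A", "B"), each = 20)
res <- levene_by_hand(y, g)
res
```

Passing center = median instead should correspond to the median-centered variant of the test.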

For one-way ANOVA, the minimal requirement seems to be that at least one group has more than 1 observation. For Levene's test, though, it gets a bit trickier.

For example, if all groups have 2 observations each, both observations in a group lie equally far from the group mean, so the within-group variance of the absolute residuals will be 0. So the requirement would seem to be at least 1 group with at least 3 observations.
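A quick numeric check of this, as a sketch with made-up data and illustrative variable names:

```r
set.seed(42)
g <- factor(rep(c("a", "b", "c"), each = 2))  # three groups, 2 observations each
y <- rnorm(6)

# Absolute deviations from each group's mean
z <- abs(y - ave(y, g))

# Variance of the deviations within each group: zero up to rounding error
within_var <- tapply(z, g, var)
within_var
```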

However, what about situations where one group has only one observation? I ran a few simulations in R using car::leveneTest(), and it seems that the p-values are not uniformly distributed in the case of 2 groups where one group has only one observation. Here is a demonstration:

library(car)
groups <- factor(c(rep("A", 999), "B"))
ps <- replicate(100, leveneTest(rnorm(1000), groups)[1,3])

> range(ps)
[1] 0.1681269 0.2107370

Basically, after simulating 100 scenarios where group A has 999 observations and group B has 1 observation, the p-values range from about 0.17 to 0.21. The leveneTest() function didn't complain, but that might be an oversight in the implementation.

Question: what are the minimum sample size requirements for Levene's test to be valid?

My current take: at least 2 observations per group, with at least one group having 3, but I might have missed something.

Best Answer

As with most hypothesis tests, small sample sizes inflate the occurrence of Type II errors: the test is "underpowered". That does not mean that the test is moot, but rather that it has a higher probability of being misleading. Ultimately, Levene's test is an $F$-test and should be treated as one.

I think it would be relevant to give warnings for:

  1. single-observation groups, which leave $0$ degrees of freedom for the residuals (this more or less amounts to checking that we have more than one observation per group), and
  2. $0$ within-group variances (there is no point in treating a constant as an R.V.).
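The two warnings above could be sketched as a small pre-check. Everything here is hypothetical: check_levene_input is not part of car or base R, and reading condition 2 as "the absolute deviations from the group median are constant within every group" is an assumption made for this sketch.

```r
# Hypothetical pre-check; not part of car or base R.
check_levene_input <- function(y, g) {
  g <- droplevels(as.factor(g))
  n <- table(g)
  problems <- character(0)
  # Condition 1: groups with a single observation
  if (any(n < 2)) {
    problems <- c(problems, paste("single-observation group(s):",
                                  paste(names(n)[n < 2], collapse = ", ")))
  }
  # Condition 2 (assumed reading): deviations constant within every group
  z <- abs(y - ave(y, g, FUN = median))
  spread <- tapply(z, g, function(v) diff(range(v)))
  if (all(spread < sqrt(.Machine$double.eps))) {
    problems <- c(problems,
                  "zero within-group variance of the absolute deviations")
  }
  for (p in problems) warning(p, call. = FALSE)
  invisible(problems)
}

# Two observations per group triggers the second warning
found <- check_levene_input(c(0.3, 1.2, 2.5, 0.7), c("a", "a", "b", "b"))
```

The function only warns and returns the messages invisibly, so the caller can still decide whether to run the test.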

Provided neither of these two conditions occurs, the findings are "valid" in the sense that they are sensible at face value. Note that these warnings stem from the "user's question" rather than from the "R code's validity". In that sense, we do not need to check for a minimum sample size but rather for cases where the sample is inadequate to provide even an approximate answer. The statistical power of a test is a function not only of the sample size but also of the effect size, so focusing strictly on sample size misses part of the "power" problem.

Probing this a bit further, the R code within car::leveneTest actually does an ANOVA on an lm object (excerpt from leveneTest.default: table <- anova(lm(resp ~ group))[, c(1, 4, 5)]), which brings us back to the point that the standard ANOVA/lm warnings should probably be adequate. In that sense:

A <- data.frame(y = runif(4), g = c(rep("a",2), rep("b",2)) )
car::leveneTest(y ~ g, A)

is a "valid" call, and the problem/warning becomes that the lm has $R^2 = 1$, showing that something has gone badly wrong.
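The perfect fit can be seen by reproducing the inner lm by hand. This is a sketch in base R, assuming the deviations are taken from the group median (the default center in car::leveneTest); the column z and the object names are illustrative.

```r
set.seed(1)
A <- data.frame(y = runif(4), g = factor(rep(c("a", "b"), each = 2)))

# Absolute deviations from the group medians
A$z <- abs(A$y - ave(A$y, A$g, FUN = median))

fit <- lm(z ~ g, data = A)
# summary() warns about an "essentially perfect fit" for this model
r2 <- suppressWarnings(summary(fit)$r.squared)
r2                          # numerically 1
rss <- sum(resid(fit)^2)    # residual sum of squares, numerically 0
rss
```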