Comparing unbalanced groups with ANOVA/Kruskal-Wallis when one group has only 1 observation

Tags: anova, group-differences, hypothesis testing, kruskal-wallis test, small-sample

I would like to compare a continuous variable across 5 health status groups, but one of the groups has only 1 observation. Would an ANOVA/Kruskal-Wallis be valid? What can I do about the group with only 1 observation?

The boxplots for the continuous variable across the groups look as follows:
[Image: boxplots of the continuous variable for groups A to E]

The group sizes are:

A: 12
B: 8
C: 9
D: 7
E: 1

Background info: I work in biostatistics and my datasets are fairly small, and some outcomes are rare in biology. Collecting samples is also dependent on patients and is an expensive procedure, so getting a larger sample is not a feasible solution.

Edit: What I currently do is omit the group with only one observation from my analysis, but I am not sure whether that is valid.

Best Answer

You can perform inference in one-way-ANOVA type designs where there's a group with only one observation if you make the equal-variance assumption. If you don't assume equal variance (or some other informative variance structure), then you won't have information about variance in the singleton group.
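A sketch of why the equal-variance assumption is enough, in notation that is an assumption here and not taken from the answer: $k$ groups of sizes $n_i$, total sample size $N$, observations $x_{ij}$ and group means $\bar{x}_i$. The pooled error variance and a comparison of the singleton group (called $E$, with single value $x_E$) against another group $A$ would then look like this:

```latex
% Pooled (within-group) variance estimate under equal variances
s_p^2 = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}{N - k}

% A singleton group has x_{E1} = \bar{x}_E, so it adds 0 to the numerator;
% it adds 1 to both N and k, so the error degrees of freedom N - k are unchanged.

% Comparing the singleton with group A, borrowing s_p from the other groups
t = \frac{x_E - \bar{x}_A}{s_p \sqrt{1 + \frac{1}{n_A}}},
\qquad \text{df} = N - k
```

With the group sizes in the question (12, 8, 9, 7, 1), $N - k = 37 - 5 = 32$, the same error degrees of freedom as when the singleton is dropped ($36 - 4 = 32$). The singleton tells you nothing about spread, so everything rests on the other groups' variance being applicable to it.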

Not all packages handle this case in their implementation of ANOVA; it depends on how they are set up. That doesn't mean it can't be done. [I gave an example of performing a t-test with a singleton in another answer on this site.]

Here's an example in R with a singleton group both included and omitted for both one-way ANOVA and Kruskal-Wallis:

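# 100 standard-normal draws; no seed is set, so a rerun will give different numbers
x = rnorm(100)
# five groups of sizes 40, 30, 20, 9 and 1; observation 100 is the singleton
g = as.factor(rep(1:5, c(40, 30, 20, 9, 1)))
# one-way ANOVA with the singleton group included
anova(lm(x ~ g))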
Analysis of Variance Table

Response: x
          Df Sum Sq Mean Sq F value Pr(>F)
g          4  6.554 1.63839  1.6588  0.166
Residuals 95 93.830 0.98769               

# one-way ANOVA with the singleton (observation 100) omitted
anova(lm(x[-100] ~ g[-100]))
Analysis of Variance Table

Response: x[-100]
          Df Sum Sq Mean Sq F value Pr(>F)
g[-100]    3  3.498 1.16608  1.1806 0.3213
Residuals 95 93.830 0.98769               

# Kruskal-Wallis test with the singleton group included
kruskal.test(x ~ g)

        Kruskal-Wallis rank sum test

data:  x by g
Kruskal-Wallis chi-squared = 5.9232, df = 4, p-value = 0.205

# Kruskal-Wallis test with the singleton omitted
kruskal.test(x[-100] ~ g[-100])

        Kruskal-Wallis rank sum test

data:  x[-100] by g[-100]
Kruskal-Wallis chi-squared = 3.1894, df = 3, p-value = 0.3633