Testing differences in variance between groups

categorical data, hypothesis testing, post-hoc, r, variance

I have a hypothesis that a particular intervention/treatment will cause more variation in participant responses to a particular question.

The intervention variable is categorical, with five different treatment groups. The response variable (the participant responses to a question) is a continuous variable.

I don't necessarily expect the means to differ; I just expect greater variation in responses in certain groups as compared to others.

Can anyone advise me on a method to test the difference in the variance of different treatment groups? To be clear, this is not for the purpose of checking the assumption of homogeneity of variance for statistical tests like ANOVA. Rather, I am specifically interested in whether, and to what extent, the variation differs between the levels of my categorical intervention/treatment variable.

I thought to run Bartlett's test for homogeneity of variance but (in R at least) I only get an output that shows me whether the variances are homogeneous or not. It doesn't tell me where the differences lie between the levels of the categorical variable (i.e. whether it is between groups 1 and 5, or groups 2 and 3, etc.).

I thought also to try to calculate a coefficient of variation for each group and compare these, but I was not sure of a method of how to do this statistically …

I am probably missing something very obvious, but I cannot find information on how to proceed. Any advice would be much appreciated.

Best Answer

This is an interesting question! Post-hoc tests of variances after a test of unequal variances do not seem to be a much-studied topic; I was not able to find any published papers. One similar question with some ideas is "Post-hoc test to determine difference in variance". Another approach is the following.

But first, note that tests of variances are typically not very robust, so distributional assumptions do matter. Maybe you could include a plot of your data in the post. I will illustrate my method with some simulated normal data; if your data are far from normal, be careful.

One test of variances which is somewhat robust is Levene's test. It is an analysis of variance that uses the absolute residuals as the response, which makes it useful in this situation: we can construct Tukey HSD intervals based on those absolute residuals. Nowadays Levene's test is often used with the median as location estimator in place of the mean, but here I will use the mean for illustration. The code for simulating the example data is at the end.

# Levene's test centred at the group means, then the same F test done by hand
with(mydf, lawstat::levene.test(X, Group, location="mean"))
oneway.test(absres ~ Group, data=mydf, var.equal=TRUE)

    Classical Levene's test based on the absolute deviations from the mean
    ( none not applied because the location is not set to median )

data:  X
Test Statistic = 4.2954, p-value = 0.003079

    One-way analysis of means

data:  absres and Group
F = 4.2954, num df = 4, denom df = 95, p-value = 0.003079

Both outputs give the same F statistic and p-value, showing the equivalence. Then the post-hoc test:

TukeyHSD(aov(absres ~ Group, data=mydf))
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = absres ~ Group, data = mydf)

$Group
          diff         lwr      upr     p adj
B-A  0.9578979 -0.87239851 2.788194 0.5937356
C-A  1.9853491  0.15505269 3.815646 0.0265586
D-A  1.8196928 -0.01060365 3.649989 0.0521132
E-A  2.4268567  0.59656026 4.257153 0.0033957
C-B  1.0274512 -0.80284522 2.857748 0.5258877
D-B  0.8617949 -0.96850157 2.692091 0.6860887
E-B  1.4689588 -0.36133766 3.299255 0.1770429
D-C -0.1656563 -1.99595277 1.664640 0.9990998
E-C  0.4415076 -1.38878886 2.271804 0.9622041
E-D  0.6071639 -1.22313251 2.437460 0.8875472
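When reading the table above, it may help to keep in mind what is being compared: the `diff` column is a difference between group means of the absolute residuals, not a difference (or ratio) of variances. The following is only a minimal sketch to check that reading. It reuses the simulation recipe given at the end of the answer, and the names `mean_absdev`, `tk` and `b_minus_a` are introduced here purely for illustration.

```r
# Simulated data, same recipe as the simulation code in the answer
set.seed(7*11*13)
N <- 20; k <- 5
Group <- rep(LETTERS[1:k], rep(N, k))
X <- rnorm(k*N, rep(rnorm(k, 10, 1), rep(N, k)),
           rep(2*sqrt(1:k), rep(N, k)))
mydf <- data.frame(Group=factor(Group), X)
mydf$absres <- abs(resid(lm(X ~ Group, data=mydf)))

# Mean absolute deviation from the group mean, one value per group
mean_absdev <- tapply(mydf$absres, mydf$Group, mean)

# Tukey HSD table computed on the absolute residuals
tk <- TukeyHSD(aov(absres ~ Group, data=mydf))$Group

# The "diff" entry for B-A should equal the difference of the two group means
b_minus_a <- unname(mean_absdev["B"] - mean_absdev["A"])
all.equal(b_minus_a, unname(tk["B-A", "diff"]))
```

So a significant pair here says that the typical absolute deviation differs between the two groups, which is the sense in which this procedure locates differences in spread.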

The same approach is used here, but with medians in place of means. Code for simulating the data:

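set.seed(7*11*13)  # My public seed
N <- 20; k <- 5    # N observations in each of k groups
Group <- rep(LETTERS[1:k], rep(N, k))
# Group means drawn from N(10, 1); group SDs are 2*sqrt(1:k), so spread grows from A to E
X <- rnorm(k*N, rep(rnorm(k, 10, 1), rep(N, k)),
           rep(2*sqrt(1:k), rep(N, k)))
mydf <- data.frame(Group=factor(Group), X)
rm(X, Group)
# Absolute residuals from the group means, the response used in Levene's test
mydf$absres <- abs(resid(lm(X ~ Group, data=mydf)))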
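The median-centred variant mentioned above (often called the Brown-Forsythe version of Levene's test) might be sketched along the same lines using base R only. This is an illustration under the same simulated data, not a reproduction of the linked answer; the column name `absdev_med` and the objects `bf` and `tk_med` are hypothetical names chosen here.

```r
# Simulated data, same recipe as above
set.seed(7*11*13)
N <- 20; k <- 5
Group <- rep(LETTERS[1:k], rep(N, k))
X <- rnorm(k*N, rep(rnorm(k, 10, 1), rep(N, k)),
           rep(2*sqrt(1:k), rep(N, k)))
mydf <- data.frame(Group=factor(Group), X)

# Absolute deviations from each group's median instead of its mean
mydf$absdev_med <- abs(mydf$X - ave(mydf$X, mydf$Group, FUN=median))

# Global test: one-way ANOVA on the median-based absolute deviations
bf <- oneway.test(absdev_med ~ Group, data=mydf, var.equal=TRUE)
print(bf)

# Post-hoc: Tukey HSD intervals on the same response
tk_med <- TukeyHSD(aov(absdev_med ~ Group, data=mydf))
print(tk_med$Group)
```

Centring at the median tends to make the procedure less sensitive to skewness and outliers, which is the usual reason for preferring it when the data are not close to normal.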

This could at least serve as a starting point.

With unequal variances, it could also be interesting to ask about reasons for it, specifically if the treatment could have anything to do with it. If so, this post could be of interest.
