Variance Homogeneity – How to Test Homogeneity of Variance for Two Groups with Different Sample Sizes

heteroscedasticity, r, variance

I have two groups of data with different sample sizes, and in order to analyze both sets they must have the same variance. I was told I should use Bartlett's test to check the homogeneity of variance, but when I try to run the test in R, it says that the two groups must have the same sample size.

  1. Does Bartlett's test require the groups to have the same sample size?

  2. How was my labmate able to analyze a similar dataset (two groups, different sample sizes) using Bartlett's test?

  3. What other test could I use that would show the two groups have similar variances?

Best Answer

I don't know what code you used, but these tests do not require equal sample sizes. You can use Levene's test to check for heteroscedasticity. In R, it is available as leveneTest() in the car package (see ?leveneTest):

set.seed(9719)                       # this makes the example exactly reproducible
g1 = rnorm( 50, mean=2, sd=2)        # here I generate data w/ different variances
g2 = rnorm(100, mean=3, sd=3)        #   & different sample sizes
my.data = stack(list(g1=g1, g2=g2))  # getting the data into 'stacked' format

library(car)                         # this package houses the function
leveneTest(values~ind, my.data)      # here I test for heteroscedasticity:
# Levene's Test for Homogeneity of Variance (center = median)
#        Df F value   Pr(>F)   
# group   1  8.4889 0.004128 **
#       148                    
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
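
As a quick check that unequal group sizes are not a problem for Bartlett's test itself, here is a minimal sketch using bartlett.test() from base R's stats package on the same simulated data. The two call styles shown (formula and list) are just illustrations; since the code from the question is not shown, nothing here is a diagnosis of that particular error.

```r
set.seed(9719)                       # same seed and data as in the example above
g1 = rnorm( 50, mean=2, sd=2)        # 50 observations
g2 = rnorm(100, mean=3, sd=3)        # 100 observations
my.data = stack(list(g1=g1, g2=g2))  # 'stacked' format: columns values & ind

# formula interface on the stacked data
bt.formula = bartlett.test(values~ind, data=my.data)

# list interface: the vectors may have different lengths
bt.list = bartlett.test(list(g1, g2))

bt.formula                           # prints the test statistic, df & p-value
```

Both calls run with groups of 50 and 100 observations and give the same statistic. Keep in mind that Bartlett's test is sensitive to departures from normality, which is one reason Levene's test is often preferred.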

Levene's test is just a $t$-test (equivalently, with two groups, an $F$-test) on transformed data: the absolute deviations of each observation from its group's center (the median, in the output above). (I discuss tests for heteroscedasticity in "Why Levene test of equality of variances rather than F ratio?") Unequal sample sizes do not invalidate the test; what they typically do is leave you with less power to detect a difference than equal group sizes with the same total number of observations would. To understand this more fully, it may help to read my answer to "How should one interpret the comparison of means from different sample sizes?"

Note, however, that running a test of your assumptions and then choosing a primary test based on the result is not generally recommended (see, e.g., "A principled method for choosing between t-test or non-parametric e.g. Wilcoxon in small samples"). If you are worried that there may be heteroscedasticity, you might do best to simply use a test that won't be susceptible to it, such as the Welch $t$-test, or even the Mann-Whitney $U$-test (which doesn't even require normality). Some information about alternative strategies can be gathered from my answer to "Alternatives to one-way ANOVA for heteroskedastic data".
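
The two ideas in this discussion can be sketched in base R. This is a minimal illustration, not a recipe: the column name abs.dev is made up for the example, and the data are the same simulated groups as above. The first part builds the median-centered Levene test by hand (absolute deviations from each group's median, then an ordinary $F$-test or pooled $t$-test); the second part runs the Welch $t$-test and the Mann-Whitney $U$-test directly on the raw values.

```r
set.seed(9719)
g1 = rnorm( 50, mean=2, sd=2)
g2 = rnorm(100, mean=3, sd=3)
my.data = stack(list(g1=g1, g2=g2))

# --- Levene's test by hand (center = median) ---
# absolute deviation of each value from its own group's median
grp.median      = ave(my.data$values, my.data$ind, FUN=median)
my.data$abs.dev = abs(my.data$values - grp.median)

lev.F = anova(lm(abs.dev~ind, data=my.data))               # F-test on transformed data
lev.t = t.test(abs.dev~ind, data=my.data, var.equal=TRUE)  # pooled t-test; t^2 equals F
lev.F

# --- tests of location that do not assume equal variances ---
welch = t.test(values~ind, data=my.data)       # var.equal=FALSE is the default (Welch)
mw    = wilcox.test(values~ind, data=my.data)  # Mann-Whitney U (Wilcoxon rank sum)
welch
mw
```

With two groups, the squared $t$ statistic from the pooled $t$-test on abs.dev equals the $F$ statistic from the ANOVA table, which is the sense in which Levene's test is "just" a $t$-test on transformed data.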
