Solved – General Linear Model Univariate with unequal variances – what are the options

heteroscedasticitylevenes-testmultiple regression

I'm using SPSS to run a GLM (general linear model) univariate with 1 fixed factor (Treatment) and one random factor (experimental replicate). There are 4 treatment groups. The measurement is number of cells per embryo. The Levene's Test for Equality of Error Variances is significant (P=0.000) and I can see from the Spread vs Level plot that there may be a pattern.

What are my options from here?

I have tried log transforming my data, and that increased the P value of the Levene's Test to P=0.02, but there still appears to be pattern in the Spread vs Level plot.

I know that I could use a Post hoc test that does not assume equal variances (Tamhane's T2 or Dunnett's T3), or I could use a Kruskal-Wallis H, but both of these are only possible with 1 factor, not two.

I would really appreciate any help with this!

spread vs level
boxplot

Best Answer

For count data you certainly expect heteroskedasticity (and likely some skewness), and there are analyses that are specifically designed for several kinds of count response (specifically, the other kind of GLM).

With count data, a log-transform will "overcompensate" for the relationship between mean and variance, leaving you with the opposite pattern to the one you started with (the larger means will now be the ones with smaller spread).

A fairly typical analysis for an open-ended count would tend to involve a Poisson or negative binomial generalized linear model for the count, which should explain much of the observed heteroskedasticity.

If you must use a general linear model with a transformation the usual one for a Poisson count would be a square root, but it's not really as good as a model actually designed for counts. Given that some counts are as low as 10 you might even consider $\sqrt{y+\frac{3}{8}}$ or a Freeman-Tukey.

(See here for some discussion of the use of transformations with count data. There's a bit of information here that may also be helpful.)

There are lots of posts on site about the use of Poisson regression models, and other count-models, including negative binomial models.

I just realized I didn't talk about the random effect term. If you have a random effect in your model, that would suggest you might use generalized linear mixed models (GLMM). Again there are a number of posts on site about those. [I don't know whether SPSS does those but transformation may still give an adequate description of the data.]