Solved – What to do with non-normality and heterogeneous variances in two-way ANOVA when transformations do not work

Tags: anova, data-transformation, heteroscedasticity, ordinal-data, regression-strategies

I'm conducting a Two-Way ANOVA with my two factors being Sex and Cohort. I have data from two cohorts of subjects, with each cohort consisting of males and females that were measured on one response variable. (Because of some exclusions, there are unequal sample sizes between groups.)

Prior to running the ANOVA, my understanding is that I must test the data for normality and homogeneity of variance (HOV).

Do I test for normality and HOV in each of the four groups separately? (i.e. test for normality in data from cohort 1 males only, then test for normality in data from cohort 1 females only, then cohort 2 males, then cohort 2 females?)
Does the assumption of HOV apply across all four groups, i.e. is the null hypothesis "Cohort 1 male variance = Cohort 1 female variance = Cohort 2 male variance = Cohort 2 female variance"?
I used the Shapiro-Wilk test for normality in each group, and Levene's test of equality of error variances. Unfortunately, in all groups, the data are very non-normal and give highly significant values for Levene's test. I have tried several transformations (square root, log, natural log, square) but nothing has worked to normalize the data so far.

I'm wondering how to proceed? I've read that unlike Welch's test for a one-way ANOVA, there is no good two-way ANOVA equivalent for non-normal data with heterogeneous variances.

Are there any other transformations that could work? If not, would the best option be to simply run the ANOVA, but mention that assumptions were violated that may impact the test results?

EDIT (to add more information):

To clarify, the main issue is lack of homogeneity of variance for the Two-Way ANOVA. I had previously written that the transformations did not work to normalize the data — I was mistaken (my apologies!). The data were very positively skewed (kurtosis was not really an issue), and the square root transformation successfully normalized the distribution. However, I still have heterogeneous variances. I'm wondering if there's anything I can do to correct this, or if it's acceptable to go ahead with the ANOVA, and explicitly mention the heterogeneous variances in the description of my methods?

EDIT 2 (images added):

Boxplots of untransformed data:

EDIT 3 (raw data added):

**Cohort 1 males (n=12)**: 
0.476
0.84
1.419
0.4295
0.083
2.9595
4.20125
1.6605
3.493
5.57225
0.076
3.4585

**Cohort 1 females (n=12)**: 
4.548333
4.591
3.138
2.699
6.622
6.8795
5.5925
1.6715
4.92775
6.68525
4.25775
8.677

**Cohort 2 males (n=11)**: 
7.9645
16.252
15.30175
8.66325
15.6935
16.214
4.056
8.316
17.95725
13.644
15.76475

**Cohort 2 females (n=11)**:
11.2865
22.22775
18.00466667
12.80925
16.15425
14.88133333
12.0895
16.5335
17.68925
15.00425
12.149
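For anyone who wants to work with these numbers, here is a minimal sketch that enters the raw data above in R and summarises the four Sex-by-Cohort cells. The object and column names (`dat`, `y`, `cohort`, `sex`) are illustrative choices, not names from the original analysis.

```r
# Raw data as posted above; all object and column names are illustrative.
c1m <- c(0.476, 0.84, 1.419, 0.4295, 0.083, 2.9595, 4.20125, 1.6605,
         3.493, 5.57225, 0.076, 3.4585)
c1f <- c(4.548333, 4.591, 3.138, 2.699, 6.622, 6.8795, 5.5925, 1.6715,
         4.92775, 6.68525, 4.25775, 8.677)
c2m <- c(7.9645, 16.252, 15.30175, 8.66325, 15.6935, 16.214, 4.056, 8.316,
         17.95725, 13.644, 15.76475)
c2f <- c(11.2865, 22.22775, 18.00466667, 12.80925, 16.15425, 14.88133333,
         12.0895, 16.5335, 17.68925, 15.00425, 12.149)

dat <- data.frame(
  y      = c(c1m, c1f, c2m, c2f),
  cohort = factor(rep(c("1", "1", "2", "2"), times = c(12, 12, 11, 11))),
  sex    = factor(rep(c("male", "female", "male", "female"),
                      times = c(12, 12, 11, 11)))
)

# Sample size, mean and variance in each of the four cells (raw scale)
cell_summary <- aggregate(
  y ~ cohort + sex, data = dat,
  FUN = function(x) c(n = length(x), mean = mean(x), var = var(x))
)
print(cell_summary)

# Ratio of largest to smallest cell variance, on raw and square-root scales
cell   <- interaction(dat$cohort, dat$sex)
v_raw  <- tapply(dat$y, cell, var)
v_sqrt <- tapply(sqrt(dat$y), cell, var)
var_ratio <- c(raw = max(v_raw) / min(v_raw), sqrt = max(v_sqrt) / min(v_sqrt))
print(var_ratio)
```

Comparing the two variance ratios is a quick, informal way to see how much a square root transformation changes the spread across cells.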

Best Answer

Thanks for posting the data. Having the raw values shows that the box plots concealed, although not intentionally, both the sample sizes and some important detail. Whenever I see skewness on a positive response, my first instinct is to reach for logarithms, as they so often work well. Here, however, logarithms drastically over-transform, and plotting everything shows up a small surprise, namely that the two lowest values need care and attention.

The graph here is a quantile-box plot in which the original data points are plotted in order on scales consistent with the box idea (i.e. about half the points are inside the box and about half outside, the "about" being a side-effect of sample sizes like 11).

A more cautious square root transformation seems about right.

Personally I regard preliminary tests for normality and so forth as over-rated stuff left over from the 1960s. I feel far too queasy about forking paths of the form: pass the test OK, fail the test do something quite different, particularly with small sample sizes. Once you have a scale on which you have approximate symmetry and approximate equality of variances, linear models will work well.
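As a rough sketch of what "working on the square root scale" could look like in R, the code below fits an ordinary two-way linear model to the square-rooted response and draws two residual plots in place of formal tests. The names `dat`, `y`, `cohort`, `sex` and `fit_sqrt` are illustrative assumptions.

```r
# Data as posted in the question; names are illustrative.
y <- c(0.476, 0.84, 1.419, 0.4295, 0.083, 2.9595, 4.20125, 1.6605,
       3.493, 5.57225, 0.076, 3.4585,                                # cohort 1 males
       4.548333, 4.591, 3.138, 2.699, 6.622, 6.8795, 5.5925, 1.6715,
       4.92775, 6.68525, 4.25775, 8.677,                             # cohort 1 females
       7.9645, 16.252, 15.30175, 8.66325, 15.6935, 16.214, 4.056,
       8.316, 17.95725, 13.644, 15.76475,                            # cohort 2 males
       11.2865, 22.22775, 18.00466667, 12.80925, 16.15425,
       14.88133333, 12.0895, 16.5335, 17.68925, 15.00425, 12.149)    # cohort 2 females
dat <- data.frame(
  y      = y,
  cohort = factor(rep(c("1", "1", "2", "2"), times = c(12, 12, 11, 11))),
  sex    = factor(rep(c("male", "female", "male", "female"),
                      times = c(12, 12, 11, 11)))
)

# Two-way model with interaction on the square-root scale
fit_sqrt <- lm(sqrt(y) ~ cohort * sex, data = dat)

# anova() gives sequential (Type I) sums of squares; with unequal cell
# sizes the order of the terms can matter.
print(anova(fit_sqrt))

# Graphical checks: spread against fitted values, and a normal Q-Q plot
op <- par(mfrow = c(1, 2))
plot(fitted(fit_sqrt), resid(fit_sqrt),
     xlab = "Fitted value (sqrt scale)", ylab = "Residual")
abline(h = 0, lty = 2)
qqnorm(resid(fit_sqrt))
qqline(resid(fit_sqrt))
par(op)
```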

Similarly, skewness and kurtosis from small samples can hardly be trusted. (Actually, skewness and kurtosis from large samples can hardly be trusted.) For some of the reasons see e.g. this paper

Indeed, some fits with generalised linear models with cohort and gender as indicator predictor variables show that results seem consistent over identity, root and log links, even despite the evidence of the first graph. If this were my problem I would push forward with a square root link function. In other words, although transformations are informative about the best scale to work on, you let the link function of a generalised linear model do the work.

Campaign slogan: Conventional box plots with a few groups leave out detail that could easily be interesting or useful and don't make full use of the space available. Use graphs that show more!

EDIT:

Here is some token output: predicted values from a generalised linear model with root link, normal family and the interaction between cohort and females:

  +--------------------------------------+
  | cohort   females   predicted   Freq. |
  |--------------------------------------|
  |      1     males       2.056      12 |
  |      1   females       5.024      12 |
  |      2     males      12.712      11 |
  |      2   females      15.348      11 |
  +--------------------------------------+
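The table above can be reproduced approximately in R. This is only a sketch: the software behind the original output is not stated, and `quasi(link = "sqrt", variance = "constant")` is my assumption for one way to express a constant-variance (normal-type) model with a root link in R. The names `dat`, `fit_root` and `cells` are illustrative. Because the model includes the interaction, it has one free parameter per cell, so the predicted values should coincide with the cell means whichever link is used.

```r
# Data as posted in the question; names are illustrative.
y <- c(0.476, 0.84, 1.419, 0.4295, 0.083, 2.9595, 4.20125, 1.6605,
       3.493, 5.57225, 0.076, 3.4585,                                # cohort 1 males
       4.548333, 4.591, 3.138, 2.699, 6.622, 6.8795, 5.5925, 1.6715,
       4.92775, 6.68525, 4.25775, 8.677,                             # cohort 1 females
       7.9645, 16.252, 15.30175, 8.66325, 15.6935, 16.214, 4.056,
       8.316, 17.95725, 13.644, 15.76475,                            # cohort 2 males
       11.2865, 22.22775, 18.00466667, 12.80925, 16.15425,
       14.88133333, 12.0895, 16.5335, 17.68925, 15.00425, 12.149)    # cohort 2 females
dat <- data.frame(
  y      = y,
  cohort = factor(rep(c("1", "1", "2", "2"), times = c(12, 12, 11, 11))),
  sex    = factor(rep(c("male", "female", "male", "female"),
                      times = c(12, 12, 11, 11)))
)

# Constant-variance model with square-root link and cohort-by-sex interaction
fit_root <- glm(y ~ cohort * sex, data = dat,
                family = quasi(link = "sqrt", variance = "constant"))

# One row per cell: observed mean, model prediction and cell size
cells <- aggregate(y ~ cohort + sex, data = dat, FUN = mean)
names(cells)[names(cells) == "y"] <- "observed_mean"
cells$predicted <- predict(fit_root, newdata = cells, type = "response")
cells$n <- aggregate(y ~ cohort + sex, data = dat, FUN = length)$y
print(cells, digits = 5)
```

Swapping the link for `"identity"` or `"log"` in the same call is a simple way to check how far the inferences, as opposed to the fitted cell means, depend on the choice of scale.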

Related Solutions

Solved – How to run two-way ANOVA on data with neither normality nor equality of variance in R

This may be more of a comment than an answer, but it won't fit as a comment. We may be able to help you here, but this may take a few iterations; we need more information.

First, what is your response variable?

Second, note that the marginal distribution of your response does not have to be normal, rather the distribution conditional on the model (i.e., the residuals) should be--it is not clear that you have examined your residuals. Furthermore, normality is the least important assumption of a linear model (e.g., an ANOVA); the residuals may not need to be perfectly normal. Tests of normality are not generally worthwhile (see here for a discussion on CV), plots are much better. I would try a qq-plot of your residuals. In R this is done with qqnorm(), or try qqPlot() in the car package. It's also worth considering the manner in which the residuals are non-normal: skewness is more damaging than excess kurtosis, in particular if the skews alternate directions amongst the groups.
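A minimal sketch of the residual check described above. Since the asker's data are not available here, it uses R's built-in `warpbreaks` data set as a stand-in two-factor example; the model and the names `fit` and `res` are purely illustrative.

```r
# Stand-in example: built-in warpbreaks data (not the asker's data)
fit <- lm(breaks ~ wool * tension, data = warpbreaks)

# The normality assumption concerns the residuals, not the raw response
res <- resid(fit)

qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res)

# If the car package is installed, car::qqPlot(fit) draws a similar plot
# with a confidence envelope.
```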

If there really is a problem worth worrying about, a transformation is a good strategy. Taking the log of your raw data is one option, but not the only one. Note that centering and standardizing aren't really transformations in this sense. You want to look into the Box & Cox family of power transformations. And remember, the result doesn't have to be perfectly normal, just good enough.
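One way to explore the Box-Cox family in R is `boxcox()` from the MASS package, which profiles the log-likelihood over a grid of power parameters. The sketch below again uses the built-in `warpbreaks` data as a stand-in (an assumption, since the asker's data are not shown); `bc` and `lambda_hat` are illustrative names.

```r
library(MASS)  # recommended package shipped with standard R installations

# Stand-in example: built-in warpbreaks data (response must be positive)
fit <- lm(breaks ~ wool * tension, data = warpbreaks)

# Profile log-likelihood over a grid of lambda values, without plotting
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

# Grid value with the highest profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)
```

In practice the estimate is usually rounded to a convenient nearby value (0 for log, 0.5 for square root, 1 for no transformation) rather than used literally.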

Next, I don't follow your use of the chi-squared test for homogeneity of variance, although it may be perfectly fine. I would suggest you use Levene's test (use leveneTest() in car). Heterogeneity is more damaging than non-normality, but the ANOVA is pretty robust if the heterogeneity is minor. A standard rule of thumb is that the largest group variance can be up to four times the smallest without posing strong problems. A good transformation should also address heterogeneity.
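As a sketch of both checks, the code below computes the largest-to-smallest variance ratio and a median-centred Levene-type test in base R, using the built-in `warpbreaks` data as a stand-in. The hand-rolled test (a one-way ANOVA on absolute deviations from group medians) is my base-R approximation of what `leveneTest()` in car does by default, not the package function itself; all object names are illustrative.

```r
# Stand-in example: built-in warpbreaks data (not the asker's data)
g <- with(warpbreaks, interaction(wool, tension))  # one level per cell

# Rule of thumb: compare the largest and smallest cell variances
vars  <- tapply(warpbreaks$breaks, g, var)
ratio <- max(vars) / min(vars)
print(ratio)

# Levene-type test: ANOVA on absolute deviations from the cell medians
abs_dev    <- abs(warpbreaks$breaks - ave(warpbreaks$breaks, g, FUN = median))
levene_tab <- anova(lm(abs_dev ~ g))
print(levene_tab)

# Packaged equivalent, if car is installed:
# car::leveneTest(breaks ~ wool * tension, data = warpbreaks)
```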

If these strategies are insufficient, I would probably explore robust regression before trying a non-parametric approach.
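For a first look at robust regression, `rlm()` in the MASS package fits the same model formula with an M-estimator (Huber weights by default). This is only a sketch on the built-in `warpbreaks` stand-in data, with illustrative object names.

```r
library(MASS)  # provides rlm()

# Stand-in example: built-in warpbreaks data (not the asker's data)
fit_ols <- lm(breaks ~ wool * tension, data = warpbreaks)
fit_rob <- rlm(breaks ~ wool * tension, data = warpbreaks)

# Side-by-side coefficients; large differences suggest influential points
coef_compare <- cbind(OLS = coef(fit_ols), robust = coef(fit_rob))
print(round(coef_compare, 2))
```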

If you can edit your question and say more about your data, I may be able to update this to provide more specific information.

Solved – How to do an ANOVA when your data are non-normal with possibly differing variances

I notice that your response is called "Number of Organisms", and that all the values are non-negative integers. I suspect these are count data. They should not be treated as normally distributed and analyzed with a traditional ANOVA. Instead, a count GLM is appropriate. We can try Poisson regression:

anova(glm(N.Organisms~as.factor(Group), data=d, family=poisson), test="LRT")
# Analysis of Deviance Table
# Model: poisson, link: log
# Response: N.Organisms
# Terms added sequentially (first to last)
# 
# Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
# NULL                               723     619.31              
# as.factor(Group) 12   78.907       711     540.41 6.669e-12 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

aggregate(N.Organisms~Group, data=d, FUN=function(x){ c(mean=mean(x),variance=var(x)) })
# Group N.Organisms.mean N.Organisms.variance
#     1         4.476190             2.161905
#     2         4.902439             1.790244
#     3         4.523077             2.690865
#     4         4.585714             2.883851
#     5         4.260000             2.767755
#     6         4.387097             3.060814
#     7         3.610169             2.690240
#     8         3.717391             3.540580
#     9         3.692308             2.334842
#    10         3.290909             3.284175
#    11         3.058824             2.256471
#    12         3.064516             2.520360
#    13         3.022222             2.471411

Oddly, your data seem to be underdispersed: the group variances are mostly smaller than the group means, whereas a Poisson distribution has variance equal to its mean. That is rather unusual, and I'm not sure what to make of it. For robustness, we can try a simple chi-squared test. It should be underpowered (because it treats the counts as categories), but it is nonetheless highly significant as well:

with(d, table(Group, N.Organisms))
#      N.Organisms
# Group  1  2  3  4  5  6  7  8
#     1   0  2  3  7  3  4  2  0
#     2   0  3  2  8 17  5  6  0
#     3   2  4 14 12 14 10  8  1
#     4   2  5 13 15 14  9 10  2
#     5   2  3 14 13  4  7  7  0
#     6   6  1 13 12 10 13  7  0
#     7   7  5 21  9  9  4  4  0
#     8   8  5 10  5  7  9  2  0
#     9   5  5 16  9 10  6  1  0
#     10 13  5 16  6  6  7  2  0
#     11 10  6 20  5  7  2  1  0
#     12 12 10 21  8  5  4  2  0
#     13 22  8 33  7 14  5  1  0

set.seed(1)  # makes the simulated p-value reproducible
chisq.test(with(d, table(Group, N.Organisms)), simulate.p.value=TRUE)
#  Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)
# 
# data:  with(d, table(Group, N.Organisms))
# X-squared = 172.02, df = NA, p-value = 0.0004998
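The underdispersion can also be put on a single scale. From the deviance table earlier in this answer, the residual deviance divided by its degrees of freedom is 540.41 / 711, or about 0.76, where a value near 1 would be expected under a Poisson model. The sketch below rebuilds a data frame from the contingency table printed above and computes the Pearson-based dispersion estimate. Rebuilding `d` from the table is my assumption about the structure of the original data, and `counts`, `fit` and `dispersion` are illustrative names.

```r
# Cell counts copied from the table above (rows: Group 1-13, cols: N.Organisms 1-8)
counts <- matrix(c(
   0,  2,  3,  7,  3,  4,  2, 0,
   0,  3,  2,  8, 17,  5,  6, 0,
   2,  4, 14, 12, 14, 10,  8, 1,
   2,  5, 13, 15, 14,  9, 10, 2,
   2,  3, 14, 13,  4,  7,  7, 0,
   6,  1, 13, 12, 10, 13,  7, 0,
   7,  5, 21,  9,  9,  4,  4, 0,
   8,  5, 10,  5,  7,  9,  2, 0,
   5,  5, 16,  9, 10,  6,  1, 0,
  13,  5, 16,  6,  6,  7,  2, 0,
  10,  6, 20,  5,  7,  2,  1, 0,
  12, 10, 21,  8,  5,  4,  2, 0,
  22,  8, 33,  7, 14,  5,  1, 0
), nrow = 13, byrow = TRUE)

# Expand to one row per observation (column index = number of organisms)
d <- data.frame(
  Group       = rep(as.vector(row(counts)), times = as.vector(counts)),
  N.Organisms = rep(as.vector(col(counts)), times = as.vector(counts))
)

fit <- glm(N.Organisms ~ factor(Group), data = d, family = poisson)

# Pearson chi-squared statistic over residual df; values below 1
# indicate underdispersion relative to the Poisson model
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)
```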

Probably the best test would be the Kruskal-Wallis test, which is analogous to a one-way ANOVA for non-normal data. It won't be affected by the underdispersion, but will take into account the fact that $8$ organisms is $> 7$, and affords pairwise post-hoc comparisons easily via Dunn's test. (An implementation of Dunn's test for R was developed by our own @Alexis; for more on Dunn's test see here and here.) Note that the Kruskal-Wallis test does not assume that the variances are equal, although you won't be able to interpret a significant result as a simple shift in the medians if the shapes of the distributions differ.

kruskal.test(N.Organisms~as.factor(Group), data=d)
#  Kruskal-Wallis rank sum test
# 
# data:  N.Organisms by as.factor(Group)
# Kruskal-Wallis chi-squared = 99.948, df = 12,
# p-value = 5.699e-16

library(dunn.test)
with(d, dunn.test(N.Organisms, as.factor(Group), method="Holm", kw=FALSE))
#                       Comparison of N.Organisms by group                       
#                                     (Holm)                                     
# Col Mean-|
# Row Mean |          1          2          3          4          5          6
# ---------+------------------------------------------------------------------
#        2 |  -0.955248
#          |     1.0000
#          |
#        3 |  -0.012069   1.270114
#          |     0.4952     1.0000
#          |
#        4 |  -0.112368   1.161274  -0.144722
#          |     1.0000     1.0000     1.0000
#          |
#        5 |   0.586382   1.940375   0.826708   0.974481
#          |     1.0000     0.9420     1.0000     1.0000
#          |
#        6 |   0.216494   1.544994   0.324981   0.473740  -0.514632
#          |     1.0000     1.0000     1.0000     1.0000     1.0000
#          |
#        7 |   2.045019   3.816537   2.906720   3.098461   1.910109   2.556624
#          |     0.8171    0.0043*     0.0950     0.0515     0.9260     0.2537
#          |
#        8 |   1.678281   3.251401   2.309693   2.475995   1.417073   1.990415
#          |     1.0000     0.0316     0.4704     0.3122     1.0000     0.8844
#          |
#        9 |   1.755800   3.400923   2.456288   2.632395   1.522142   2.123496
#          |     1.0000    0.0191*     0.3229     0.2077     1.0000     0.7080
#          |
#       10 |   2.692297   4.589523   3.786058   3.987944   2.754012   3.433308
#          |     0.1774    0.0002*    0.0047*    0.0022*     0.1501    0.0176*
#          |
#       11 |   3.223104   5.206152   4.483634   4.691145   3.432921   4.131515
#          |     0.0342    0.0000*    0.0002*    0.0001*    0.0173*    0.0012*
#          |
#       12 |   3.321963   5.440195   4.741819   4.969661   3.610449   4.365576
#          |     0.0250    0.0000*    0.0001*    0.0000*    0.0093*    0.0004*
#          |
#       13 |   3.487721   5.846371   5.211207   5.479181   3.927490   4.789963
#          |    0.0146*    0.0000*    0.0000*    0.0000*    0.0027*    0.0001*
# 
# Col Mean-|
# Row Mean |          7          8          9         10         11         12
# ---------+------------------------------------------------------------------
#        8 |  -0.394796
#          |     1.0000
#          |
#        9 |  -0.345287   0.059170
#          |     1.0000     1.0000
#          |
#       10 |   0.912191   1.244372   1.223490
#          |     1.0000     1.0000     1.0000
#          |
#       11 |   1.652971   1.936170   1.936942   0.746270
#          |     1.0000     0.8984     0.9232     1.0000
#          |
#       12 |   1.754493   2.038829   2.046213   0.799659   0.016138
#          |     1.0000     0.8086     0.8351     1.0000     0.9871
#          |
#       13 |   1.943622   2.224783   2.246161   0.903324   0.054395   0.039280
#          |     0.9609     0.5611     0.5433     1.0000     1.0000     1.0000
