What statistical test should I use to compare multiple groups with different sample sizes?

Tags: anova, continuous data, hypothesis testing, nonparametric, sample-size

I'm running an analysis in which I need to compare four groups on several dependent variables. Two of them are categorical, and for those I'll use a chi-squared test on the counts; one is a continuous variable: Reinvestment Value.
The four groups, with statistics for the continuous variable (s2 = variance, sd = standard deviation, k = kurtosis, sk = skewness), are:

  • Control group: n = 2749 (I don't have the statistics yet)
  • Treatment A: n = 624, mean = 3413.9, s2 = 30269139.5, sd = 5501.7, k = 12, sk = 3.1
  • Treatment B: n = 38, mean = 1546.62, s2 = 1710133.95, sd = 1307.72, k = -0.81, sk = 0.82
  • Treatment C: n = 1708, mean = 2528, s2 = 21949273.2, sd = 4685, k = 28, sk = 4.5

To clear up any confusion: I have the actual sample data, not just the summary statistics.

So I wonder which test I should use.

I wanted to use ANOVA to compare the means of my four groups, but:

  • the data violate the normality assumption: all the groups look more like an exponential distribution. One of the four, the one with the smallest kurtosis, looks exponential with a couple of peaks;
  • they also violate the equal-variances assumption, according to both Bartlett's and Levene's tests.

I have read that as long as the sample sizes are equal and large enough I can still use ANOVA. However:

  • my sample sizes are unequal: some groups have n > 100 and one has n < 100. None has fewer than 30 observations, but I am not sure whether I can still rely on the central limit theorem.
  • two groups have kurtosis > 5 and one has kurtosis ≈ 0. Again, I read that normality can still be assumed if kurtosis < 5.

So I thought of switching to a non-parametric test such as Kruskal–Wallis, but I have read that it also performs poorly when the variances are not homogeneous.

So I guess I have multiple questions here:

  1. How large does n need to be before I can rely on the central limit theorem? I have read conflicting answers on this: some say n > 30, others n > 50, others n > 100. Could I still assume normality with 50 < n < 100?
  2. It looks like the equal-variances assumption matters even more than normality, so what should I do if the data violate it? Should I transform them? Is there a better test to use, such as Welch's test?
  3. If I should transform them, how, and which transformation should I use?
  4. Can I randomly subsample the larger groups to get an equal n in all groups and then apply ANOVA?
  5. If the variances were homogeneous and all other conditions stayed the same, should I use a parametric ANOVA or a non-parametric Kruskal–Wallis test?
  6. I will also need a post hoc test to find where the differences are. For ANOVA I was thinking of Tukey's test, but what should I use after Kruskal–Wallis?

For context: at least two of the groups look very different in their means, variances and standard deviations, so they probably do differ. For information, I use RStudio to run the analysis.

Hope someone can help with this.

Best Answer

First off, ANOVA is often surprisingly robust to strange distributions. What matters is not that the model residuals are normally distributed, but that the parameter estimates are (approximately) normally distributed. Here, those are the group means. You can check this by bootstrapping each group mean. I suspect that the bootstrapped means will be nicely bell-shaped, which would give you a little more confidence. (The only group I am slightly concerned about is group B, with only $n=38$.)
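A minimal sketch of that bootstrap check, written in plain Python (standard library only) for illustration rather than in R. It resamples one group with replacement, recomputes the mean each time, and compares the skewness of the raw data with the skewness of the bootstrap means. The group data are simulated exponential draws standing in for the real Reinvestment Value observations, and `bootstrap_means` and `skewness` are hypothetical helper names, not functions from any package.

```python
import random
import statistics


def bootstrap_means(sample, n_boot=5000, seed=1):
    """Return n_boot bootstrap replicates of the mean of `sample`.

    Each replicate resamples the data with replacement, keeping the original size.
    """
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)]


def skewness(values):
    """Moment estimator of skewness; close to 0 for a symmetric, bell-shaped distribution."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Hypothetical stand-in for one group's raw data: 38 exponential draws with mean 1500.
# Replace with the real Reinvestment Value observations of each group.
data_rng = random.Random(42)
group_b = [data_rng.expovariate(1 / 1500) for _ in range(38)]

boot = bootstrap_means(group_b)
ordered = sorted(boot)
ci_low, ci_high = ordered[int(0.025 * len(ordered))], ordered[int(0.975 * len(ordered))]

print(f"skewness of raw data:        {skewness(group_b):.2f}")
print(f"skewness of bootstrap means: {skewness(boot):.2f}")
print(f"95% percentile interval for the mean: ({ci_low:.0f}, {ci_high:.0f})")
```

As a rough analytic cross-check: for i.i.d. observations, the skewness of the sample mean equals the skewness of the data divided by $\sqrt{n}$. With the reported values that gives about $3.1/\sqrt{624} \approx 0.12$ for A, $0.82/\sqrt{38} \approx 0.13$ for B and $4.5/\sqrt{1708} \approx 0.11$ for C, which fits the expectation that the group means are close to symmetric even though the raw data are not.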

That said, you could run a permutation alternative to ANOVA. In R, there is RVAideMemoire::perm.anova. A useful textbook is Good's Permutation, Parametric, and Bootstrap Tests of Hypotheses. If you do so, be sure to also run the plain vanilla ANOVA alongside it and compare the resulting $p$-values; you may well find that they are very close. (And note that astronomically small $p$-values do not convey much more than "$p<.0001$".)
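The permutation idea can be sketched in plain Python (standard library only). This illustrates the general technique and is not the implementation behind RVAideMemoire::perm.anova. It computes the one-way ANOVA F statistic, then repeatedly shuffles the pooled observations into groups of the original, unequal sizes and counts how often the shuffled F is at least as large as the observed one. The groups are simulated, and `f_statistic` and `permutation_anova` are hypothetical names.

```python
import random
import statistics


def f_statistic(groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    pooled = [v for g in groups for v in g]
    grand_mean = statistics.fmean(pooled)
    k, n = len(groups), len(pooled)
    means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_anova(groups, n_perm=999, seed=1):
    """Return (observed F, permutation p-value).

    Under H0 the observations are exchangeable across groups, so shuffling the
    pooled data and re-splitting it into groups of the original sizes draws
    from the null distribution of F.
    """
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            exceed += 1
    # Count the observed labelling as one of the permutations, so p is never 0.
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical skewed groups with unequal sizes (simulated, not the real data).
data_rng = random.Random(7)
group_means = [2500, 3400, 1550, 2500]
group_sizes = [200, 150, 38, 200]
groups = [[data_rng.expovariate(1 / mu) for _ in range(n)]
          for mu, n in zip(group_means, group_sizes)]

f_obs, p_perm = permutation_anova(groups)
print(f"F = {f_obs:.2f}, permutation p = {p_perm:.4f}")
```

One caveat on the design: the permutation null hypothesis is that observations are exchangeable across groups, that is, that all groups share the same distribution. With clearly unequal variances, a small permutation $p$-value therefore does not by itself attribute the difference to the means.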