Solved – What statistics test to use to compare multiple groups with different sample sizes

anova, continuous data, hypothesis testing, nonparametric, sample-size

I'm running a test where I need to compare four groups on different dependent variables. Two of them are categorical, and I'll use a chi-squared test for the head-count, while one y is a continuous variable: Reinvestment Value.
The four groups and their statistics with respect to the continuous y are:

Control group: n=2749 (I don't have the statistics yet)
Treatment A: n=624, mean=3413.9, s2=30269139.5, sd=5501.7, k=12, sk=3.1
Treatment B: n=38, mean=1546.62, s2=1710133.95, sd=1307.72, k=-0.81, sk=0.82
Treatment C: n=1708, mean=2528, s2=21949273.2, sd=4685, k=28, sk=4.5

Just to clear up any confusion: I have the actual sample data, not just the summary statistics.

So I wonder what test I should use.

I wanted to use ANOVA, to compare means of my four different groups, but:

The data violate the normality assumption: all the groups look more like an exponential distribution. One of the four, the one with the smallest kurtosis, looks like an exponential with a couple of peaks.
They also violate the equal-variance assumption, as shown by both Bartlett's and Levene's tests.

I have read that as long as the sample sizes are equal and big enough I can still use ANOVA. However:

My sample sizes are different: some groups have n>100 and others n<100. None of them has fewer than 30 observations, though, so I am not sure whether I can still rely on the central limit theorem.
I also have two groups with kurtosis > 5 and one with kurtosis ~0. Again, I read that if kurtosis < 5 we can still assume normality.

So I thought of moving to a non-parametric test, like Kruskal-Wallis, but I have seen that it too becomes inefficient if homogeneity of variances is violated.

So I guess I have multiple questions here:

How large should n be for me to rely on the central limit theorem? I have read conflicting answers on this: some say >30, others >50 and others >100. Could I still treat the data as normal with 50<n<100?
It looks like the homogeneous-variances assumption is even more important than the normality assumption, so what should I do if the data violate it? Shall I transform them? Is there a better non-parametric test to use, like the Welch test?
If I should transform them, how? What transformation should I use?
Can I randomly sample from the biggest groups to get an equal n in all groups and then apply ANOVA?
If I had homogeneous variances, with all other conditions the same, should I use a parametric ANOVA or the non-parametric Kruskal–Wallis?
I will also need a post hoc test to assess where the differences are. In the case of ANOVA I was thinking of using Tukey, but what should I use in the case of Kruskal–Wallis?

Just to give context: at least two of the means, variances and standard deviations look very different, so they plausibly are. Also, for information, I use RStudio to run the analysis.

Hope someone can help with this,

Best Answer

First off, ANOVA is often surprisingly robust to strange distributions. What is important is not that model residuals are normally distributed, but that the parameter estimates are (approximately) normally distributed. Here, those would be the group means. You can take a look at this by bootstrapping each group mean. I suspect that the bootstrapped means will be nicely bell-shaped, giving you a little more confidence. (The only group I am slightly concerned about is group B with only $n=38$.)
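The bootstrap check described above can be sketched in a few lines. This is a minimal, standard-library Python illustration rather than the answerer's own code (the asker works in R, where the same idea translates directly). The helper names `bootstrap_means` and `skewness` are made up for this example, and `group_b` is simulated exponential data standing in for a small, right-skewed group of $n=38$, not the asker's actual sample.

```python
import random
import statistics

def bootstrap_means(sample, n_boot=2000, seed=0):
    """Return n_boot means of resamples drawn with replacement from sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)]

def skewness(xs):
    """Moment-based sample skewness (no small-sample bias correction)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

# Hypothetical stand-in data: 38 draws from an exponential with mean 1500.
data_rng = random.Random(42)
group_b = [data_rng.expovariate(1 / 1500) for _ in range(38)]

boot = bootstrap_means(group_b)
print("skewness of raw data:       ", round(skewness(group_b), 2))
print("skewness of bootstrap means:", round(skewness(boot), 2))
```

If the bootstrapped means are much less skewed than the raw data (a histogram of `boot` shows this more directly than a single number), that supports treating the group mean as approximately normal.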

That said, you could run a permutation alternative to ANOVA. In R, there is RVAideMemoire::perm.anova. A useful textbook is Good's Permutation, Parametric, and Bootstrap Tests of Hypotheses. If you do so, be sure to also run the plain vanilla ANOVA alongside and compare the resulting $p$ values - you may well find that they are very close together. (And note that $p$-values that are astronomically small do not have much more content than "$p<.0001$".)
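To make the permutation idea concrete, here is a minimal standard-library Python sketch of a one-way permutation test based on the usual F statistic. It is an illustration under stated assumptions, not a reimplementation of `RVAideMemoire::perm.anova`, whose internals may differ. The function names `f_statistic` and `permutation_anova` are hypothetical, and the three groups are simulated skewed data with unequal sizes.

```python
import random
import statistics

def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    pooled = [x for g in groups for x in g]
    grand_mean = statistics.fmean(pooled)
    means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(pooled) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def permutation_anova(groups, n_perm=2000, seed=0):
    """Return (observed F, permutation p-value) by shuffling group labels."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-cut the shuffled values into groups of the original sizes.
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical skewed groups with unequal sizes (not the asker's data).
data_rng = random.Random(1)
groups = [[data_rng.expovariate(1 / mean) for _ in range(n)]
          for mean, n in [(3400, 60), (1500, 38), (2500, 80)]]

f_obs, p = permutation_anova(groups)
print("observed F:", round(f_obs, 2), " permutation p:", round(p, 4))
```

With `n_perm` permutations the smallest attainable p-value is `1 / (n_perm + 1)`, which is one concrete reason why astronomically small p-values carry little extra information.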

Related Solutions

Solved – Significant Result in Levene’s Test

None of this follows ineluctably from the evidence you give.

I ran Levene's test on my data and got a p-value of 0.000, meaning that variances are very heterogeneous.

Possibly so, possibly not. The result is highly significant, but that may just mean that you have a large enough sample size to allow firm rejection of the null. It could be that the difference in variances is not fatal to ANOVA.

I transformed the data but no method can make them homogeneous.

Possibly so, possibly not. We can't tell without looking at the data and hearing what you tried. Perhaps you missed out a transformation that would help. (I've seen people try transformations that make their problem worse; that need not be you, but you don't give enough detail for us to be sure.)

So that's it, ANOVA would be inappropriate to use. I was thinking that my data could be nonparametric so Kruskal-Wallis would be the best test to use. However, when I tried testing the data for Levene's test for nonparametric data, I still got a significant result.

Data are not parametric or non-parametric; only techniques are. That's a misuse of terminology. See notably @Glen_b's answer here. More crucially, I don't know what Levene's test for nonparametric data means. What makes you think that Kruskal-Wallis requires any such prior test?

I'd recommend that you back up and show us your data, or at least informative graphs, and tell us what interests you about them.

Solved – Should I use Kruskal-Wallis or Anova

If your sample size is large, a test of normality will come out significant even if the distribution is similar (but not equal) to normal. On the other hand, what can cause problems in ANOVA is the size of the departure from normality, not the statistical significance of that departure. So we need to measure that departure.

The usual approach is to check skewness and kurtosis. If skewness is small and kurtosis is not very different from that of a normal distribution, we can assume the distribution is nearly normal for most practical purposes. Furthermore, ANOVA is quite robust to violations of the normality assumption, and results are not expected to change much under a small departure from it (the same could be said about the assumption of equal variances). To assess how big the departure from normality is, a rule of thumb given by the Statgraphics in-program help (sorry, I can't find any other reference) is the interval -2 to +2 for standardised kurtosis and skewness.
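As a rough illustration of that rule of thumb, the sketch below computes standardised skewness and kurtosis in standard-library Python. It assumes the common large-sample standard errors $\sqrt{6/n}$ and $\sqrt{24/n}$ and plain moment estimators; the exact formulas Statgraphics uses are not given in this answer and may differ (for example through small-sample corrections). The function name `standardized_shape` and the simulated data are hypothetical.

```python
import math
import random
import statistics

def standardized_shape(xs):
    """Return (standardised skewness, standardised excess kurtosis).

    Assumes the large-sample standard errors sqrt(6/n) and sqrt(24/n).
    """
    n = len(xs)
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    skew = sum(((x - m) / s) ** 3 for x in xs) / n
    excess_kurt = sum(((x - m) / s) ** 4 for x in xs) / n - 3
    return skew / math.sqrt(6 / n), excess_kurt / math.sqrt(24 / n)

# Hypothetical data: one roughly normal sample and one strongly right-skewed sample.
rng = random.Random(7)
normal_like = [rng.gauss(0, 1) for _ in range(500)]
skewed = [rng.expovariate(1.0) for _ in range(500)]

for name, xs in [("normal-like", normal_like), ("skewed", skewed)]:
    z_skew, z_kurt = standardized_shape(xs)
    inside = abs(z_skew) <= 2 and abs(z_kurt) <= 2
    print(name, round(z_skew, 2), round(z_kurt, 2), "within [-2, 2]:", inside)
```

Because both standard errors shrink as n grows, these standardised values behave like test statistics: with large samples they leave the -2 to +2 interval even for mild departures, so the raw skewness and kurtosis are worth reading alongside them.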

Anyway, if the distribution is actually far from normal, then you can use a non-parametric test like Kruskal-Wallis.

Update about equal variances

The same can be said about the assumption of equal variances: it doesn't matter much that we can be sure the variances are not exactly equal; what matters is how different they are. From your graphics I would say that the variation of your residuals doesn't look very different across groups, so you aren't very far from homoscedasticity. If you compute the variance of each group, a rule of thumb is that ANOVA results remain valid as long as the biggest variance is no more than ten times the smallest one (again no reference, I just heard it from a more experienced professor).
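The variance-ratio rule of thumb is easy to check directly. The following standard-library Python sketch is only an illustration: the function name `variance_ratio` and the example groups are made up, and the threshold of 10 is the informal rule quoted above, not an established cutoff.

```python
import statistics

def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [statistics.variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical groups: the third is visibly more spread out than the others.
groups = [
    [4.1, 5.0, 5.6, 6.2, 4.8, 5.3],
    [3.9, 6.1, 5.2, 4.4, 6.8, 5.0],
    [1.0, 9.5, 4.2, 7.7, 2.3, 8.1],
]

ratio = variance_ratio(groups)
print("max/min variance ratio:", round(ratio, 2))
print("within the informal factor-of-ten rule:", ratio <= 10)
```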

Update about statistical significance vs practical significance

Your distributions are nearly normal, with a small (maybe tiny) departure from normality. If your sample were small, no test could detect such a small departure, but with a large sample tests can detect that your distributions are not exactly normal. That little difference is real (hence the small p-value), but it is too small to matter for practical purposes like performing ANOVA.

I suggest reading about statistical significance vs practical significance. You can Google it or just go here or here.
