Solved – What test should I use for two groups of different number of measurements and low number of samples

t-test, wilcoxon-mann-whitney-test

I have data on 20 samples divided into 2 groups (category A and category B). The groups are independent; none of the values in one group repeat in the other. N(A) = 14, N(B) = 6.

Here is the data:

category A          category B
0.0119888167559     0.023185483871
0.00101354303189    0.312090168227
8.95231103909e-06   0.503371693147
2.9580256165e-05    0.522824974411
0.0596266691309     0.114932864532
4.02612958098e-05   3.32126606662e-05
0.337753287524
0.0115114590662
0.19273480545
0.232453117898
3.69713102632e-05
3.00480769231e-05
0.192851577717
1.58790650407e-05

I would like to show that the mean values of the 2 groups differ significantly, but I am confused about which test I should use.

Here are the tests that I've performed so far:

  1. Wilcoxon rank sum test (Mann-Whitney test) (two-tailed)
    W=20, p=0.07575

  2. Student t-test (two-tailed)
    t = -2.24259, p = 0.03775

  3. Welch t-test (two-tailed, unpaired, correction=False)
    t=-1.7109, p = 0.1376

So, as you can see, the 3 tests give 3 different p-values. To complicate things further:

  1. Normality test (Shapiro-Wilk)
    I've also checked the normality of my data. For category A the test rejects normality (Shapiro-Wilk W = 0.704713, p = 0.000413591, p < 0.05), while for category B it does not (Shapiro-Wilk W = 0.868539, p = 0.220442, p > 0.05), probably because of the low number of samples.

A list of questions:

Q1: Can I assume that the data in my 2 groups are normally distributed and use the Student t-test or Welch t-test?
Q2: Or should I use the non-parametric Mann-Whitney test? (I've read that it has low power for a low number of samples…)
Q3: Another thing is the equality of variances between groups. If I assume they are equal I can use the Student t-test; if not, I can use the Welch t-test. Should I first perform a test for equality of variances?

To summarize the post: I need help finding a test that is acceptable for publishing results with:
– small number of samples in one group (less than 10)
– unequal number of samples in groups
– data not normally distributed in one group
– showing the difference of means (optional)

I would really appreciate any suggestions.
Please help!

Best Answer

Your interpretation focuses mainly on the p-values (in particular, whether they fall above or below a critical threshold for "significance")*, so the most important thing is to make sure the p-value for your test reflects what you think it does. In other words, you want the p-value to tell you the probability of seeing a difference between groups at least this dramatic, assuming that in fact there was no real difference between groups (the null hypothesis). The calculation of a p-value from data via a test relies on assumptions, so if those assumptions aren't met the p-value might be bogus. Here's where you stand with the tests you suggest:

  1. Wilcoxon rank sum test (a.k.a. Mann-Whitney U): Assumes your data are independent observations (as do all of these tests). Makes no assumptions about the size of the groups, variances of the groups, or the shapes of the distributions (e.g. normality). You're fine here --- the p-value you get for this test is probably an accurate reflection of the probability of getting two distributions this different assuming the null hypothesis is true.
  2. Student t-test (also sometimes called just an "independent samples t-test"): Assumes your data are independent observations. Also assumes the variances in the two groups are equal, and that each group is normally distributed. Although it's actually pretty robust to violations of the normality assumption (so the p-values usually end up being pretty accurate even when your data aren't normally distributed), the fact that you have very unequal sample sizes compounds the potential issue of unequal variances, making that a much larger problem. Your p-value here may not really reflect what you think it does! It is NOT a good idea to test for equality of variances and then pick between Student's t-test and the Welch approximation t-test based on the result. Rather, you should avoid Student's t-test completely when your sample sizes are so different.
  3. Welch t-test: Assumes your data are independent observations. Does NOT assume variances are equal, so it's fine if they aren't (and it's fine with different sample sizes). It does still assume the data are normally distributed, but as long as your sample size is large enough a violation of the assumption of normality doesn't affect p-values very much.
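To make the difference between tests 2 and 3 concrete, here is a minimal sketch that computes both t statistics by hand from the data in the question, using only the Python standard library. The helper names `student_t` and `welch_t` are made up for this illustration. It only produces the statistic and the degrees of freedom; to get p-values you would normally use a statistics package (for example `scipy.stats.ttest_ind` with `equal_var=False` for the Welch version).

```python
import math
import statistics

# Data copied from the question.
cat_a = [0.0119888167559, 0.00101354303189, 8.95231103909e-06,
         2.9580256165e-05, 0.0596266691309, 4.02612958098e-05,
         0.337753287524, 0.0115114590662, 0.19273480545,
         0.232453117898, 3.69713102632e-05, 3.00480769231e-05,
         0.192851577717, 1.58790650407e-05]
cat_b = [0.023185483871, 0.312090168227, 0.503371693147,
         0.522824974411, 0.114932864532, 3.32126606662e-05]


def student_t(x, y):
    """Student's t: pools the two sample variances (assumes they are equal)."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    pooled_var = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return t, nx + ny - 2


def welch_t(x, y):
    """Welch's t: keeps the variances separate and adjusts the degrees of freedom."""
    nx, ny = len(x), len(y)
    sx = statistics.variance(x) / nx  # squared standard error of mean(x)
    sy = statistics.variance(y) / ny  # squared standard error of mean(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sx + sy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    return t, df


t_student, df_student = student_t(cat_a, cat_b)
t_welch, df_welch = welch_t(cat_a, cat_b)
print(f"Student: t = {t_student:.4f}, df = {df_student}")
print(f"Welch:   t = {t_welch:.4f}, df = {df_welch:.2f}")
```

This should reproduce the t values reported in the question (about -2.24 for Student and -1.71 for Welch). The Welch degrees of freedom come out well below the 18 used by Student's test, because the smaller group has the larger variance; that is the main reason the two p-values differ so much.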

So using the Student's t-test would be a bad idea, but either the nonparametric Mann-Whitney U test or the Welch approximation t-test would be fine. You're right that the Mann-Whitney U is generally less powerful than a parametric test --- that's because it discards a lot of information in your data by switching from using the scores themselves to using the ranks of the scores (i.e. it only cares about which scores are higher than which other scores, not how much higher they are).
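As a small illustration of what "only cares about ranks" means, the sketch below computes the Mann-Whitney U statistic directly from its definition (counting pairs), then applies a square-root transform to every value. The function names are hypothetical helpers for this example, and the square root is just an arbitrary order-preserving transform chosen to make the point.

```python
import math

# Data copied from the question.
cat_a = [0.0119888167559, 0.00101354303189, 8.95231103909e-06,
         2.9580256165e-05, 0.0596266691309, 4.02612958098e-05,
         0.337753287524, 0.0115114590662, 0.19273480545,
         0.232453117898, 3.69713102632e-05, 3.00480769231e-05,
         0.192851577717, 1.58790650407e-05]
cat_b = [0.023185483871, 0.312090168227, 0.503371693147,
         0.522824974411, 0.114932864532, 3.32126606662e-05]


def mann_whitney_u(x, y):
    """U for x: number of (x_i, y_j) pairs with x_i > y_j; ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def mean_diff(x, y):
    """Difference in sample means, mean(x) - mean(y)."""
    return sum(x) / len(x) - sum(y) / len(y)


# An order-preserving transform changes the values but not their ranks.
sqrt_a = [math.sqrt(v) for v in cat_a]
sqrt_b = [math.sqrt(v) for v in cat_b]

u_raw = mann_whitney_u(cat_a, cat_b)
u_sqrt = mann_whitney_u(sqrt_a, sqrt_b)
print("U on raw values:      ", u_raw)
print("U on sqrt values:     ", u_sqrt)
print("mean diff, raw values:", mean_diff(cat_a, cat_b))
print("mean diff, sqrt values:", mean_diff(sqrt_a, sqrt_b))
```

U is the same before and after the transform (and matches the W = 20 reported in the question), while the difference in means changes. Anything a t-test learns from the size of the gaps between values is invisible to the rank-based test.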

A fourth option would be to use a resampling procedure called a permutation test (see this question and this question for a description and instructions). It makes no assumptions about normal distributions or equal variances, and works well for small sample sizes.
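For a rough idea of what a permutation test does, here is a minimal standard-library sketch. It assumes the test statistic is the absolute difference in group means and a two-sided test; both are choices made for this illustration, not the only options. With group sizes 14 and 6 there are only 38,760 ways to split the 20 values, so every split can be enumerated instead of sampling at random. The function name `permutation_test_mean_diff` is made up for this example.

```python
from itertools import combinations

# Data copied from the question.
cat_a = [0.0119888167559, 0.00101354303189, 8.95231103909e-06,
         2.9580256165e-05, 0.0596266691309, 4.02612958098e-05,
         0.337753287524, 0.0115114590662, 0.19273480545,
         0.232453117898, 3.69713102632e-05, 3.00480769231e-05,
         0.192851577717, 1.58790650407e-05]
cat_b = [0.023185483871, 0.312090168227, 0.503371693147,
         0.522824974411, 0.114932864532, 3.32126606662e-05]


def permutation_test_mean_diff(x, y):
    """Exact two-sided permutation p-value for the difference in means.

    Under the null hypothesis the group labels are exchangeable, so every
    way of assigning len(y) of the pooled values to the second group is
    equally likely. The p-value is the share of assignments whose absolute
    mean difference is at least as large as the observed one.
    """
    pooled = list(x) + list(y)
    n, k = len(pooled), len(y)
    total = sum(pooled)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    extreme = 0
    count = 0
    for idx in combinations(range(n), k):
        sum_y = sum(pooled[i] for i in idx)
        diff = abs((total - sum_y) / (n - k) - sum_y / k)
        # Small tolerance so floating-point noise does not exclude ties
        # (including the observed split itself).
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


p_value = permutation_test_mean_diff(cat_a, cat_b)
print(f"exact permutation p-value: {p_value:.4f}")
```

For larger samples, full enumeration becomes impractical and the usual approach is to draw a few thousand random relabelings instead.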

In sum, you could use the non-parametric test, the Welch test, or a permutation test, but not the Student's t-test.

I recommend the Welch t-test in most situations like this because it's simple to run and interpret (the interpretation of the nonparametric test can be a little tricky since it's testing the difference in ranks, as I mentioned), and it's appropriate for unequal sample sizes. The nonparametric test and the permutation test are safer, though, because they make no assumptions about normally distributed data, so if you feel comfortable running and interpreting those you could do that instead.

To learn more about how the choice between Student's t-test and the Welch approximation t-test affects the empirical type 1 error rate, see: http://daniellakens.blogspot.com/2015/01/always-use-welchs-t-test-instead-of.html

* Note that p-values (which can be tricky to interpret even under the best circumstances) don't matter for some other approaches to statistical analysis. Bayesian tests provide the probability of a hypothesis (or a probability distribution over a range of possible values), rather than providing the probability of observing your data given the null hypothesis. If you're relying on frequentist significance tests, though, it's vital that you understand what needs to be true about your data and your testing procedure in order for your p-values to be accurate.