Welch’s t test vs Mann-Whitney for very unequal groups

biostatistics, hypothesis testing, nonparametric, t-test, wilcoxon-mann-whitney-test

I have two vastly unequal groups (n1=18 and n2=400), and want to test mean differences between the groups for some 700 parameters (more specifically, RNA expression levels for 700 genes).

While I understand that Welch's t works well for non-normally distributed samples if both groups are large enough, this is not the case here, since one group has only 18 observations. Testing for normality 700*2 times (or examining QQ plots) would take a lot of time and would almost certainly show that some variables are normally distributed and some are not.

In this case, would you go for

  • Welch t-tests for everything (which would probably be breaking assumptions),
  • Mann-Whitney tests for everything (which have a lower power and might lead to not getting too many significant results), or
  • choosing the appropriate test for each variable, according to normality testing? This would probably not be a good idea, since I have to adjust the p-values using the FDR method, and I don't think that is correct if the p-values are generated by completely different tests

Best Answer

Your idea:

Mann-Whitney tests for everything (which have a lower power and might lead to not getting too many significant results)

is potentially misleading in two ways.

First, as jbowman says on another page:

the worst that the Mann-Whitney can ever perform relative to the t-test is about 0.864 in terms of asymptotic relative efficiency, i.e., it would require 1/0.864x as much data to give the same power (asymptotically.)... The point of course being that you can't shoot yourself in the foot by using the Mann-Whitney test instead of the t-test, but the converse is not true.
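To get a feel for what this means at the sample sizes in the question (18 vs 400), here is a minimal simulation sketch comparing the rejection rates of the two tests under a pure location shift. The two distributions, the shift of 0.7, the number of simulations, and the helper name `simulate_power` are illustrative assumptions, not anything taken from the question or from jbowman's post.

```python
import numpy as np
from scipy import stats


def simulate_power(sampler, shift, n1=18, n2=400, n_sim=500, alpha=0.05, seed=0):
    """Estimate the rejection rate of Welch's t and Mann-Whitney by simulation.

    sampler(rng, size) draws data from the assumed null distribution;
    the small group is shifted by `shift` (a pure location-shift model).
    """
    rng = np.random.default_rng(seed)
    small = sampler(rng, (n_sim, n1)) + shift  # one simulated data set per row
    large = sampler(rng, (n_sim, n2))

    # Welch's t test (unequal variances), one test per row
    p_welch = stats.ttest_ind(small, large, axis=1, equal_var=False).pvalue
    # Mann-Whitney U test with the normal approximation, one test per row
    p_mw = stats.mannwhitneyu(
        small, large, axis=1, alternative="two-sided", method="asymptotic"
    ).pvalue

    return {
        "welch": float(np.mean(p_welch < alpha)),
        "mann_whitney": float(np.mean(p_mw < alpha)),
    }


def normal(rng, size):
    return rng.normal(size=size)


def lognormal(rng, size):
    # Right-skewed data, chosen here only as a stand-in for non-normal values
    return rng.lognormal(size=size)


for name, sampler in [("normal", normal), ("lognormal", lognormal)]:
    print(name, simulate_power(sampler, shift=0.7))
```

Setting `shift=0.0` turns the same function into a check of the type I error rate of each test at these group sizes.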

Second, at the usual "significance" standard of p < 0.05 and with 700 RNA species to examine, you should expect about 35 (700 × 0.05) "significant" differences even if there aren't any true differences. You will need to deal with the associated multiple comparisons problem, which in this type of study is typically done via false discovery rate control.
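The arithmetic and the adjustment can be sketched as follows. This is a rough illustration under stated assumptions: the data are simulated with no true differences for any gene, the lognormal distribution is an arbitrary stand-in for expression values, and `benjamini_hochberg` is a hand-written helper for the Benjamini-Hochberg procedure (one common way to control the FDR), not a function from a particular library.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


n_genes, n1, n2, alpha = 700, 18, 400, 0.05
expected_false_positives = n_genes * alpha  # 700 * 0.05 = 35

# Simulated data with NO true differences (assumption for illustration only)
rng = np.random.default_rng(0)
group1 = rng.lognormal(size=(n_genes, n1))
group2 = rng.lognormal(size=(n_genes, n2))

# One Mann-Whitney test per gene (row)
p_values = stats.mannwhitneyu(
    group1, group2, axis=1, alternative="two-sided", method="asymptotic"
).pvalue

adjusted = benjamini_hochberg(p_values)
raw_hits = int(np.sum(p_values < alpha))
fdr_hits = int(np.sum(adjusted < alpha))

print(f"expected false positives at p < {alpha}: {expected_false_positives:.0f}")
print(f"unadjusted hits: {raw_hits}, hits after BH adjustment: {fdr_hits}")
```

With no true differences, the unadjusted count should land somewhere near 35, while the adjusted count should usually be zero. Note that the same test is applied to every gene before the adjustment.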

It's not a good idea to decide what type of test to perform for each variable based on a preliminary test (such as a normality test) of the same results. This page has extensive discussion related to your type of situation.