The Mann-Whitney test, like many other nonparametric tests, assumes stochastic ordering of the samples. Essentially, this means that the two distributions do not cross when they are different: one cumulative distribution function lies above the other, or equivalently to the left of the other.
If two samples are normally distributed, then the only way they can be stochastically ordered is if their variances are equal. The same is true for other symmetric, two-parameter distributions such as the logistic. So for these kinds of distributions, equal variances are required for a properly interpretable rank-based test such as the Mann-Whitney.
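A quick numerical check of this point, using scipy (the means and standard deviations below are chosen purely for illustration): even when two normals have different means, unequal variances force their CDFs to cross in a tail.

```python
from scipy.stats import norm

# Two normals with different means AND different variances:
# their CDFs still cross, so neither is stochastically larger everywhere.
a = norm(loc=0, scale=1)   # N(0, 1)
b = norm(loc=1, scale=3)   # N(1, 9)

print(a.cdf(0) > b.cdf(0))    # True: a's CDF is above b's in the middle
print(a.cdf(-3) < b.cdf(-3))  # True: the ordering flips in the left tail
```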
Some families of distributions necessarily have different variances when their means differ, such as the exponential. In many cases, as the mean increases the variance must also increase. Many of these are still appropriate for rank tests such as the Mann-Whitney.
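The exponential illustrates this nicely; the sketch below (scale parameters are arbitrary) confirms numerically that two exponentials with different means, and hence different variances, never cross and so remain stochastically ordered:

```python
import numpy as np
from scipy.stats import expon

# Two exponentials with different means, hence different variances
# (for the exponential, variance = mean**2). Their CDFs never cross,
# so they ARE stochastically ordered and suit a rank test.
a = expon(scale=1.0)   # mean 1, variance 1
b = expon(scale=3.0)   # mean 3, variance 9

t = np.linspace(0.01, 20, 500)
print(np.all(a.cdf(t) > b.cdf(t)))  # True: a's CDF lies above b's everywhere
```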
First, don't worry about it so much. Go ahead and do the rank test for small samples. For large samples such as yours I doubt that a test of equal means is very informative. Nonparametric tests really shine where the samples are small and it is difficult to use methods that depend on estimating parameters, fitting distributions, etc.
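Running the rank test itself is a one-liner in scipy; the sample sizes and exponential scales below are hypothetical stand-ins for your data:

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Hypothetical small samples from two stochastically ordered exponentials
x = rng.exponential(scale=1.0, size=15)
y = rng.exponential(scale=2.5, size=15)

stat, p = mannwhitneyu(x, y, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")
```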
Second, examine some graphs that show the cumulative distributions of the samples you are comparing. If they cross over once, then they are likely not stochastically ordered. If they don't cross, except maybe at the low end, then they are likely stochastically ordered; do the rank test you like. If they cross back and forth, then they are again likely not stochastically ordered; you can still do the rank test you like, but it is unlikely to be significant.
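One way to make this eyeball check concrete is to count sign changes of the difference between the two empirical CDFs. The helper below is a hypothetical sketch in numpy (the name `ecdf_crossings` and the simulated samples are my own invention), not a formal test:

```python
import numpy as np

def ecdf_crossings(x, y):
    """Count sign changes of ECDF_x - ECDF_y over the pooled sample.

    Zero crossings suggest stochastic ordering; repeated crossings
    suggest the samples are not stochastically ordered.
    """
    grid = np.sort(np.concatenate([x, y]))
    fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    diff = fx - fy
    signs = np.sign(diff[np.nonzero(diff)])  # drop exact ties
    return int(np.sum(signs[1:] != signs[:-1]))

rng = np.random.default_rng(1)
# A pure location shift vs. an equal-mean, unequal-variance pair
shifted = ecdf_crossings(rng.normal(0, 1, 200), rng.normal(1, 1, 200))
mixed = ecdf_crossings(rng.normal(0, 1, 200), rng.normal(0, 3, 200))
print(shifted, mixed)
```

The unequal-variance pair should show at least one crossing away from the extremes, while the shifted pair should show few or none.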
Try fitting some distributions to the large sample, then estimate the corresponding parameters for the small sample. See what they look like and describe them. (This is useful.)
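As a sketch of that fit-and-compare step with scipy (the gamma family and all the parameters here are hypothetical placeholders for whatever fits your data):

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(2)
large = rng.gamma(shape=2.0, scale=1.5, size=5000)  # stand-in for the big sample
small = rng.gamma(shape=2.0, scale=2.5, size=20)    # stand-in for the small one

# Fit a candidate family to the large sample, fixing the location at 0 ...
shape_l, loc_l, scale_l = gamma.fit(large, floc=0)
# ... then estimate the same family's parameters for the small sample
shape_s, loc_s, scale_s = gamma.fit(small, floc=0)
print(f"large: shape={shape_l:.2f}, scale={scale_l:.2f}")
print(f"small: shape={shape_s:.2f}, scale={scale_s:.2f}")
```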
If you have observations that are bounded below at zero, e.g., times or counts, then you are likely to have a distribution where the variance and the mean are functionally related.
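Counts are the classic case: for a Poisson the variance equals the mean, which a quick simulation confirms (the means below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
# Zero-bounded count data: for a Poisson the variance tracks the mean.
for mean in (1.0, 4.0, 9.0):
    sample = rng.poisson(lam=mean, size=100_000)
    print(f"mean ≈ {sample.mean():.2f}  var ≈ {sample.var():.2f}")
```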
The following material was added in response to a comment.
Let me first note that I am not responsible for Wikipedia. I often find it useful, though information from it should be taken cautiously.
There are two relevant articles there, linked below; I don't know which of those, or another one, you referred to. The first article uses the term "stochastic ordering"; the second uses the term "stochastic dominance." They mean the same thing.
Under the null hypothesis the Mann-Whitney assumes that the two distributions are the same. (No parameters need to be mentioned here.) The alternative that is tested for is that one distribution is stochastically greater than the other. You might not regard this as an assumption of the test. But if the two distributions are not stochastically the same or stochastically ordered then the alpha level or p-value of the test is not correct. (I regard this as an assumption of the test, though others might differ on what that expression means.)
As to the normal or gaussian distribution, if the variances differ then the assumptions of the Mann-Whitney test do not hold. Other distributions with different variances are suitable for the Mann-Whitney if they are stochastically ordered.
This term was used by Mann and Whitney in their article in 1947, according to Wikipedia.
Mann–Whitney U test
https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test
A thorough analysis of the statistic, which included a recurrence allowing the computation of tail probabilities for arbitrary sample sizes and tables for sample sizes of eight or less appeared in the article by Henry Mann and his student Donald Ransom Whitney in 1947.[1] This article discussed alternative hypotheses, including a stochastic ordering (where the cumulative distribution functions satisfied the pointwise inequality FX(t) < FY(t)). This paper also computed the first four moments and established the limiting normality of the statistic under the null hypothesis, so establishing that it is asymptotically distribution-free.
Talk: Mann-Whitney_U_test
https://en.wikipedia.org/wiki/Talk:Mann%E2%80%93Whitney_U_test#What_do_you_need_to_assume_under_the_null_hypothesis.3F
Really, this test is precisely only for testing stochastic dominance of two variables A and B, that is, of Prob(A>B) > Prob(B>A). In other words, it tests whether a randomly chosen sample from A is expected to be greater than a sample from B.
Best Answer
Not all tests of variance homogeneity across groups are equal: the Brown–Forsythe test would probably be better than Levene's test given your dependent variable's distribution. It sounds like your outcome is a zero-inflated count variable.
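For the record, the Brown–Forsythe test is just Levene's test computed around group medians rather than group means; in scipy it is one keyword away (the three groups below are simulated, zero-inflated stand-ins for your data):

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(4)
# Hypothetical skewed groups with extra zeros
g1 = rng.exponential(1.0, 50) * rng.binomial(1, 0.7, 50)
g2 = rng.exponential(2.0, 50) * rng.binomial(1, 0.7, 50)
g3 = rng.exponential(2.0, 50) * rng.binomial(1, 0.4, 50)

# center="median" makes Levene's test the Brown–Forsythe test,
# which is more robust for skewed distributions like these.
stat, p = levene(g1, g2, g3, center="median")
print(f"W = {stat:.2f}, p = {p:.4f}")
```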
I'm thinking the ideal choice is a zero-inflated negative binomial or quasi-Poisson regression with the experimental group as your reference group for dummy coding purposes (i.e., three dummy variables for your control groups), but there may be more robust options for coping with heteroskedasticity. When all assumptions are true, ANOVA works, but generalized linear models and nonparametric estimators are better for non-normal error distributions. Weighted least squares can help with heteroskedastic groups, but requires a lot of data. Diagonally weighted least squares is somewhat more forgiving. Zero-inflated models also require more power though – see the following references. The second discusses iteratively weighted least squares and compares negative binomial and quasi-Poisson regression.
References
· Williamson, J. M., Lin, H., Lyles, R. H., & Hightower, A. W. (2007). Power calculations for ZIP and ZINB models. Journal of Data Science, 5(4), 519–534. Retrieved from http://www.jds-online.com/file_download/150/JDS-360.pdf.
· Ver Hoef, J. M., & Boveng, P. L. (2007). Quasi-Poisson vs. negative binomial regression: How should we model overdispersed count data? Ecology, 88(11), 2766–2772. Retrieved from http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1141&context=usdeptcommercepub.