There have been a number of papers which examine this issue. Most of them come to the conclusion that Welch's version of the t-test can be safely used in most circumstances.
The only situation in which the test seems to have undesirable performance is in very small sample sizes.
Here are some quotes from two papers which examine t-test performance with small sample sizes:
The t-test with the unequal variances option (i.e., the Welch test) was generally not preferred either. Only in the case of unequal variances combined with unequal sample sizes, where the small sample was drawn from the small variance population, did this approach provide a power advantage compared to the regular t-test. In the other cases, a substantial amount of statistical power was lost compared to the regular t-test. The power loss of the Welch test can be explained by its lower degrees of freedom determined from the Welch-Satterthwaite equation.$^1$
Results suggest that the Welch t test is indeed inflated, according to Bradley's (1978) fairly stringent criterion, when sample sizes are unequal – even when assumptions for the t test are met in the population. The inflation rate seems to be dependent more on the size of the smaller group than on the total sample size, but sample size ratio does seem to play a small role.$^2$
If you read through those papers, though, you'll see that it's really only the specific case of very small sample sizes (in particular, when the smaller of the two groups is very small) that is much of an issue. "Small" here means the effects are really only troublesome when a group contains around 5 subjects or fewer, as posited by both papers; take a closer look at their references for a more thorough discussion. In that case the obvious suggestion is to collect more data, but that can of course be a problem with prohibitively expensive experiments.
Otherwise, Welch's test is probably fine.
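To get a feel for both points above (the lower Welch-Satterthwaite degrees of freedom, and the Type I error behaviour with one very small group), here's a quick simulation sketch assuming NumPy and SciPy. The group sizes and seed are arbitrary choices, and the exact rejection rate will vary between runs:

```python
import numpy as np
from scipy import stats

def welch_satterthwaite_df(x, y):
    """Welch-Satterthwaite degrees of freedom for two samples."""
    n1, n2 = len(x), len(y)
    v1, v2 = np.var(x, ddof=1), np.var(y, ddof=1)
    se2 = v1 / n1 + v2 / n2
    return se2**2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

rng = np.random.default_rng(0)

# With one very small group, the Welch df can fall far below the
# pooled-variance df of n1 + n2 - 2 (here, 42)
x = rng.normal(size=4)
y = rng.normal(size=40)
df = welch_satterthwaite_df(x, y)
print(df)

# Empirical Type I error of the Welch test under H0 (identical normal
# populations), with one tiny group and one larger group
n_sims, alpha = 20_000, 0.05
xs = rng.normal(size=(n_sims, 4))
ys = rng.normal(size=(n_sims, 40))
p = stats.ttest_ind(xs, ys, axis=1, equal_var=False).pvalue
rate = (p < alpha).mean()
print(rate)  # compare with the nominal 0.05
```

The Welch-Satterthwaite df is bounded between $\min(n_1-1, n_2-1)$ and $n_1+n_2-2$, and sits near the lower end when one group is tiny, which is where the power loss mentioned in the first quote comes from.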
$^1$: de Winter, J.C.F. (2013), "Using the Student's t-test with extremely small sample sizes."
$^2$: Adusah, A.K., & Brooks, G.P. (2011), "Type I Error Inflation of the Separate-Variances Welch t test with Very Small Sample Sizes when Assumptions Are Met."
Best Answer
(too long for a comment, so I guess it's an answer)
I'm not sure what makes you assert there's a substantive difference between the two cases. When you use the Mann-Whitney for testing location-shift alternatives, the assumption is of identical distributions aside from the possible location shift. But it's not actually necessary to assume identical distributions. The Mann-Whitney is, for example, perfectly appropriate for testing scale-shift alternatives, or a host of other alternatives, as long as you can compute the distribution of the test statistic under the null. If your rank-based ANOVA is to have a distribution you can compute under $H_0$, you'll need at least some assumptions for the null case there also.
If your assumptions for both are the same (such as both being applied to shift alternatives) and you compute the null distribution on an ANOVA for 2 groups of ranks correctly, your p-values will be identical to the equivalent two-tailed Mann-Whitney, in the same way that $t^2 = F$ for an ordinary 2 group ANOVA compared to a two-tailed two-sample-t (the version with equal-variance).
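The $t^2 = F$ correspondence for the ordinary (pooled-variance) case is easy to verify numerically; a sketch assuming SciPy, with arbitrary simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=12)
y = rng.normal(0.5, 1.0, size=15)

t_res = stats.ttest_ind(x, y, equal_var=True)  # pooled-variance two-sample t
f_res = stats.f_oneway(x, y)                   # one-way ANOVA with two groups

print(t_res.statistic**2, f_res.statistic)  # same value
print(t_res.pvalue, f_res.pvalue)           # two-tailed t p-value equals F p-value
```

The same identity holds if you first replace the data by their ranks, which is the sense in which a correctly computed two-group rank ANOVA reproduces the two-tailed Mann-Whitney.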
It's somewhat of a tricky question, because if they're different shapes 'location difference' doesn't have an obvious meaning in the way it does when they're the same shape.
If you define some measure of location difference (like difference in means or difference in medians or median of pairwise differences or difference in minimum or whatever) then you can do something with it - e.g. try to compute a resampling based distribution, like a bootstrap distribution. It's important to be clear about what you are prepared to assume though.
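As a sketch of the resampling idea for one such measure (difference in medians), assuming NumPy; the two distributions here are just illustrative stand-ins for samples with different shapes:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=40)      # skewed shape
y = rng.normal(loc=1.0, scale=0.5, size=50)  # symmetric shape

observed = np.median(x) - np.median(y)

# Percentile bootstrap for the difference in medians:
# resample each group independently, with replacement
n_boot = 10_000
boot = np.array([
    np.median(rng.choice(x, size=len(x), replace=True))
    - np.median(rng.choice(y, size=len(y), replace=True))
    for _ in range(n_boot)
])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(observed, (lo, hi))  # point estimate and 95% percentile interval
```

The choice of estimand (medians here) is exactly the "what are you prepared to assume / what do you mean by location difference" decision; the bootstrap doesn't make that decision for you.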
A Mann-Whitney can be used for more general alternatives than a simple location shift. For continuous distributions, for example, you can write the null in the form:
$P(X>Y) = \frac{1}{2}$
and the alternative as
$P(X>Y) \neq \frac{1}{2}\quad$ (for a two tailed test)
or
$P(X>Y) < \frac{1}{2}\quad$ (or "$>$", in either case as a one tailed test)
If I recall correctly, Conover's Practical Nonparametric Statistics presents them this way, for example.
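The connection to the U statistic can be checked directly: for tie-free continuous data, $U/(nm)$ is the empirical estimate of $P(X>Y)$. A sketch assuming a recent SciPy (whose `mannwhitneyu` returns the U statistic for the first sample):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(0.3, 1.0, size=25)
y = rng.normal(0.0, 1.0, size=30)

u, pval = stats.mannwhitneyu(x, y, alternative='two-sided')

p_hat = u / (len(x) * len(y))              # U-based estimate of P(X > Y)
direct = np.mean(x[:, None] > y[None, :])  # direct pairwise proportion
print(p_hat, direct)  # identical when there are no ties
```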