From Hollander & Wolfe, pp. 106-7:
Let $F$ be the distribution function corresponding to population 1 and
$G$ be the distribution function corresponding to population 2. The
null hypothesis is: $H_0: F(t)=G(t)$ for every $t$. The null
hypothesis asserts that the $X$ variable and the $Y$ variable have the
same probability distribution, but the common distribution is not
specified.
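Strictly speaking, this describes the Wilcoxon rank-sum test, but $U=W-\frac{n(n+1)}{2}$, where $W$ is the sum of the ranks of one sample in the pooled data and $n$ is that sample's size, so the two tests are equivalent.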
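The identity is easy to check numerically. The following is a minimal sketch on made-up data (the samples `x` and `y` and the seed are assumptions, not from the book): it computes the rank sum $W$, converts it to $U$, and compares that with a direct count of pairs and with the statistic that `wilcox.test` reports.

```r
# Illustrative data (made up for this sketch); continuous, so no ties.
set.seed(1)
x <- rnorm(8)
y <- rnorm(12, mean = 0.5)
n <- length(x)

# Wilcoxon rank-sum statistic: sum of the ranks of x in the pooled sample.
W <- sum(rank(c(x, y))[seq_len(n)])

# Mann-Whitney U obtained from W via the identity above.
U_from_W <- W - n * (n + 1) / 2

# Mann-Whitney U by directly counting (x, y) pairs with x > y.
U_count <- sum(outer(x, y, ">"))

# wilcox.test() labels its statistic "W" but reports it in the U form.
U_r <- unname(wilcox.test(x, y)$statistic)

c(W = W, U_from_W = U_from_W, U_count = U_count, U_r = U_r)
```

With no ties in the data, the last three values should agree.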
The Mann-Whitney test is a special case of a permutation test (the distribution under the null is derived by looking at all the possible permutations of the data), and permutation tests take identical distributions as their null, so stating the null as identical distributions is technically correct.
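To make the permutation view concrete, here is a small sketch that builds the null distribution of $U$ by enumerating every way of assigning the pooled ranks to the first group, and then compares the resulting two-sided p-value with the exact one from `wilcox.test`. The data values and the helper `u_stat` are made up for illustration.

```r
# Tiny made-up samples so that full enumeration is cheap.
x <- c(1.1, 2.4, 3.7)
y <- c(2.9, 4.2, 5.0, 6.3)
n <- length(x)
m <- length(y)

# Ranks of the pooled data; the first n entries belong to x.
r <- rank(c(x, y))

# U statistic for any choice of n positions treated as "group 1".
u_stat <- function(idx) sum(r[idx]) - n * (n + 1) / 2

u_obs  <- u_stat(seq_len(n))
u_null <- combn(n + m, n, u_stat)  # all choose(7, 3) = 35 relabelings

# Two-sided p-value: relabelings at least as far from the centre as observed.
center <- n * m / 2
p_perm <- mean(abs(u_null - center) >= abs(u_obs - center))

# Exact p-value from R for comparison.
p_exact <- wilcox.test(x, y, exact = TRUE)$p.value

c(p_perm = p_perm, p_exact = p_exact)
```

Every relabeling is equally likely only if both samples come from the same distribution, which is why the permutation argument leads to the identical-distributions null.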
One way of thinking about the Mann-Whitney test statistic is as a measure of the number of times a randomly chosen value from one group exceeds a randomly chosen value from the other group. So $P(X>Y)=0.5$ also makes sense as a null, and it is technically a consequence of the equal-distributions null (assuming continuous distributions, where the probability of a tie is 0): if the two distributions are the same, then the probability of $X$ being greater than $Y$ is 0.5, since both are drawn from the same distribution.
The stated case of two distributions having the same mean but widely different variances satisfies the second null hypothesis ($P(X>Y)=0.5$), but not the first (identical distributions). We can do some simulation to see what happens with the p-values in this case (if the null hypothesis actually being tested were true, they should be uniformly distributed):
> out <- replicate( 100000, wilcox.test( rnorm(25, 0, 2), rnorm(25,0,10) )$p.value )
> hist(out)
> mean(out < 0.05)
[1] 0.07991
> prop.test( sum(out<0.05), length(out), p=0.05 )
1-sample proportions test with continuity correction
data: sum(out < 0.05) out of length(out), null probability 0.05
X-squared = 1882.756, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.05
95 percent confidence interval:
0.07824054 0.08161183
sample estimates:
p
0.07991
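So the test clearly rejects more often than the nominal 5%, which means the null hypothesis it is actually testing is false here. That is consistent with the null being equality of distributions (which is false in this simulation), but not with the null being $P(X>Y)=0.5$ (which is true in this simulation, since both normals are centred at 0).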
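Two quick checks help with reading that result. The sketch below (the seed and the replication counts are arbitrary choices, smaller than in the run above to keep it fast) first repeats the simulation with identical distributions, where the rejection rate should be close to, and not above, the nominal level, and then estimates $P(X>Y)$ directly for the unequal-variance case.

```r
set.seed(42)
n_sim <- 2000  # far fewer replications than above, to keep this quick

# Control case: both samples from N(0, sd = 2), so the distributions are identical.
p_same <- replicate(n_sim,
                    wilcox.test(rnorm(25, 0, 2), rnorm(25, 0, 2))$p.value)
reject_same <- mean(p_same < 0.05)

# Unequal-variance case: estimate P(X > Y) from large independent samples.
x <- rnorm(1e5, 0, 2)
y <- rnorm(1e5, 0, 10)
p_x_gt_y <- mean(x > y)

c(reject_same = reject_same, p_x_gt_y = p_x_gt_y)
```

If the estimate of $P(X>Y)$ sits near 0.5 while the rejection rate in the unequal-variance simulation sits well above 5%, the excess rejections cannot be attributed to $P(X>Y)$ differing from 0.5.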
Thinking in terms of the probability that $X > Y$ also runs into some interesting problems if you ever compare populations based on Efron's dice, a set of nontransitive dice in which each die tends to beat the next one in a cycle.
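For reference, the standard set of Efron's dice can be checked by brute force. This sketch enumerates all 36 face pairings for each consecutive pair of dice; the helper name `p_beats` is made up for illustration.

```r
# The usual four Efron dice (faces listed explicitly).
dice <- list(
  A = c(4, 4, 4, 4, 0, 0),
  B = c(3, 3, 3, 3, 3, 3),
  C = c(6, 6, 2, 2, 2, 2),
  D = c(5, 5, 5, 1, 1, 1)
)

# Probability that a roll of die a exceeds a roll of die b.
p_beats <- function(a, b) mean(outer(a, b, ">"))

# Each die beats the next one in the cycle A -> B -> C -> D -> A.
cycle <- c(
  A_beats_B = p_beats(dice$A, dice$B),
  B_beats_C = p_beats(dice$B, dice$C),
  C_beats_D = p_beats(dice$C, dice$D),
  D_beats_A = p_beats(dice$D, dice$A)
)
cycle
```

All four probabilities come out as 2/3, so "tends to be larger than" is not a transitive relation between populations.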
Best Answer
The Mann-Whitney doesn't require equal variances unless you're specifically looking for location-shift alternatives.
In particular, it is able to test whether values in the first group tend to be larger than values in the second group, which is quite a general alternative that sounds like it's related to your original question.
Not only can the Mann-Whitney deal with transformed-location shifts very well (e.g. a scale-shift is a location-shift in the logs), it has power against any alternative that makes $P(X>Y)$ differ from $\frac{1}{2}$.
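The point about transformed location shifts follows from the test depending only on ranks, which any increasing transformation leaves unchanged. A minimal sketch, assuming exponential data with a scale factor of 3 (both made up for illustration):

```r
set.seed(7)

# y has the same shape as x but is scaled by 3: a pure scale shift.
x <- rexp(20, rate = 1)
y <- 3 * rexp(20, rate = 1)

# On the log scale the same difference is a location shift of log(3).
shift_in_logs <- log(3)

# The ranks are identical before and after taking logs,
# so the test gives the same p-value either way.
p_raw <- wilcox.test(x, y)$p.value
p_log <- wilcox.test(log(x), log(y))$p.value

c(p_raw = p_raw, p_log = p_log)
```

A t-test, by contrast, would generally give different answers on the raw and log scales.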
The Mann-Whitney U-statistic counts the number of times a value in one sample exceeds a value in the other. That's a scaled estimate of the probability that a random value from one population exceeds a random value from the other: dividing $U$ by the product of the two sample sizes gives an estimate of $P(X>Y)$.
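As a rough illustration of that scaling, the sketch below counts the pairs directly and divides by the number of pairs. The normal samples, their sizes, and the seed are assumptions chosen only so that the true value is known in closed form.

```r
set.seed(123)

# Made-up samples: X ~ N(1, 1) and Y ~ N(0, 1).
x <- rnorm(30, mean = 1)
y <- rnorm(40, mean = 0)

# U: number of (x, y) pairs in which the x value exceeds the y value.
U <- sum(outer(x, y, ">"))

# Rescale by the number of pairs to estimate P(X > Y).
p_hat <- U / (length(x) * length(y))

# For this setup X - Y ~ N(1, 2), so the true value is pnorm(1 / sqrt(2)).
p_true <- pnorm(1 / sqrt(2))

c(U = U, p_hat = p_hat, p_true = p_true)
```

The estimate will vary from sample to sample around the true value.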
There's more detail here.
Also see the discussion here.
As for which is better, well, that really depends on a number of things. If the data are even a little more heavy-tailed than normal, you may be better off with the Mann-Whitney, but it depends on the situation: discreteness and skewness can both complicate matters, and it also depends on the precise alternatives of interest.
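One way to get a feel for the heavy-tail point is a small power simulation. This is only a sketch under assumed settings (t-distributed data with 3 degrees of freedom, a location shift of 1, 25 observations per group, 2000 replications); other choices can change the picture, as noted above.

```r
set.seed(2024)
n_sim <- 2000  # number of simulated data sets (arbitrary)
n     <- 25    # observations per group (arbitrary)
shift <- 1     # location shift under the alternative (arbitrary)

# One simulated data set: heavy-tailed t(3) noise, second group shifted.
sim_once <- function() {
  x <- rt(n, df = 3)
  y <- rt(n, df = 3) + shift
  c(t  = t.test(x, y)$p.value,
    mw = wilcox.test(x, y)$p.value)
}

# 2 x n_sim matrix of p-values; rows are named "t" and "mw".
pvals <- replicate(n_sim, sim_once())

# Estimated power at the 5% level for each test.
power <- rowMeans(pvals < 0.05)
power
```

Under these particular assumptions one would expect the Mann-Whitney row to show noticeably higher power than the t-test row.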