(Too long for a comment, so I guess it's an answer.)
I'm not sure what makes you assert there's a substantive difference between the two cases. When you use the Mann-Whitney to test location-shift alternatives, the assumption is of identical distributions aside from the possible location shift. But it's not actually necessary to assume identical distributions: the Mann-Whitney is, for example, perfectly appropriate for testing scale-shift alternatives, or a host of other alternatives, as long as you can compute the distribution of the test statistic under the null. If your rank-based ANOVA is to have a distribution you can compute under $H_0$, you'll need at least some assumptions for the null case there as well.
If your assumptions for both are the same (such as both being applied to shift alternatives) and you compute the null distribution of the ANOVA on the ranks of 2 groups correctly, your p-values will be identical to those of the equivalent two-tailed Mann-Whitney, in the same way that $t^2 = F$ when an ordinary 2-group ANOVA is compared to a two-tailed two-sample t (the equal-variance version).
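To see that equivalence concretely, here's a small sketch (the sample values are made up, and `method='exact'` needs a reasonably recent scipy). With the total rank sum fixed, the two-group F on ranks is a monotone function of the rank-sum deviation, so an exact permutation test using F-on-ranks reproduces the exact two-tailed Mann-Whitney p-value:

```python
import itertools
import numpy as np
from scipy import stats

# Hypothetical small samples with no tied values (values are illustrative)
x = [1.1, 2.3, 3.5, 4.2]
y = [2.0, 2.8, 5.1]

n, m = len(x), len(y)
ranks = stats.rankdata(np.array(x + y))

def f_on_ranks(idx):
    """One-way ANOVA F statistic on the ranks, with group 1 = indices idx."""
    g1 = ranks[list(idx)]
    g2 = np.delete(ranks, list(idx))
    return stats.f_oneway(g1, g2).statistic

obs = f_on_ranks(range(n))

# Exact permutation null distribution of F-on-ranks: enumerate every way
# of assigning n of the n+m observations to group 1
hits, total = 0, 0
for idx in itertools.combinations(range(n + m), n):
    total += 1
    if f_on_ranks(idx) >= obs - 1e-9:   # small tolerance for float noise
        hits += 1
perm_p = hits / total

# Exact two-sided Mann-Whitney p-value for comparison
mw_p = stats.mannwhitneyu(x, y, alternative='two-sided',
                          method='exact').pvalue
print(perm_p, mw_p)  # the two p-values agree
```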
If I had two groups, both with different, non-normal distributions, but I only wanted to test for a difference in location, what test would be preferable? I was under the impression I could use a t-test on ranks, or a Welch t-test on ranks. However, if these tests are similar to a Mann-Whitney U test then I guess this is not the case.
It's somewhat of a tricky question, because if they're different shapes, 'location difference' doesn't have an obvious meaning in the way it does when they're the same shape.
If you define some measure of location difference (like difference in means, or difference in medians, or median of pairwise differences, or difference in minimum, or whatever) then you can do something with it - e.g. try to compute a resampling-based distribution, like a bootstrap distribution. It's important to be clear about what you are prepared to assume, though.
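As a sketch of that resampling idea (the two samples and the choice of difference-in-medians as the location measure are just illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two samples with different shapes (distributions chosen for illustration)
x = rng.exponential(1.0, size=50)
y = rng.lognormal(0.0, 0.5, size=60)

# Bootstrap the difference in medians (one possible "location difference"):
# resample each group with replacement and recompute the statistic
boot = np.array([
    np.median(rng.choice(x, x.size, replace=True))
    - np.median(rng.choice(y, y.size, replace=True))
    for _ in range(5000)
])

# Simple percentile 95% interval for the difference in medians
lo, hi = np.percentile(boot, [2.5, 97.5])
print(lo, hi)
```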
A Mann-Whitney can be used for more general alternatives than a simple location shift. e.g. For continuous distributions, you can write the null in the form:
$P(X>Y) = \frac{1}{2}$
and the alternative as
$P(X>Y) \neq \frac{1}{2}\quad$ (for a two tailed test)
or
$P(X>Y) < \frac{1}{2}\quad$ (or "$>$", in either case as a one tailed test)
If I recall correctly, Conover's Practical Nonparametric Statistics presents them this way, for example.
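A quick numeric illustration of this formulation (the two distributions are chosen arbitrarily): the statistic $U/(nm)$ directly estimates $P(X>Y)$, the quantity the null pins at $\frac{1}{2}$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two continuous samples with quite different shapes (illustrative choices)
x = rng.exponential(1.0, size=200)
y = rng.normal(1.0, 0.5, size=200)

# With alternative specified, scipy returns U for x, i.e. the number of
# (x_i, y_j) pairs with x_i > y_j (ties counted half)
u, p = stats.mannwhitneyu(x, y, alternative='two-sided')
est = u / (len(x) * len(y))   # U/(nm) estimates P(X > Y)
print(est, p)
```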
In scipy.stats, the Mann-Whitney U test compares two populations:
> Computes the Mann-Whitney rank test on samples x and y.
but the Wilcoxon test compares two PAIRED populations:
> The Wilcoxon signed-rank test tests the null hypothesis that two related paired samples come from the same distribution. In particular, it tests whether the distribution of the differences x - y is symmetric about zero. It is a non-parametric version of the paired T-test.
EDITED / CORRECTED in response to ttnphns' comments.
Note that the paired t test does not test whether the distribution of the differences is symmetric about zero, so the Wilcoxon signed-rank test is not truly a non-parametric counterpart of the paired t test.
The Mann-Whitney test, on the other hand, assumes that all the observations are independent of each other (no basis for pairing here!). It also assumes that the two distributions are the same, and the alternative is that one is stochastically greater than the other. If we make the additional assumption that the only difference between the two distributions is their location, and the distributions are continuous, then "stochastically greater than" is equivalent to such statements as "the medians are different", so you can, with the extra assumption(s), interpret it that way.
The Mann-Whitney uses a continuity correction by default (`use_continuity=True`), but the Wilcoxon doesn't (`correction=False`).
The Mann-Whitney handles ties using midranks, while the Wilcoxon offers three options (`zero_method`) for handling ties in the paired values (i.e., zero difference between the two elements of the pair).
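A sketch of those options in scipy.stats (the data are fabricated, and with zeros present `wilcoxon` falls back to a normal approximation, possibly with a warning):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
diffs = rng.normal(0.4, 1.0, size=30).round(1)
diffs[:3] = 0.0                     # force a few zero differences
before = rng.normal(10.0, 2.0, size=30)
after = before + diffs

# Paired data: Wilcoxon signed-rank, with each way of handling the zeros
pvals = {zm: stats.wilcoxon(before, after, zero_method=zm).pvalue
         for zm in ("wilcox", "pratt", "zsplit")}
print(pvals)

# Independent samples: Mann-Whitney, continuity correction on by default
mw_p = stats.mannwhitneyu(before, after, use_continuity=True,
                          alternative='two-sided').pvalue
print(mw_p)
```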
It sounds like the Wilcoxon test is the more appropriate for your purposes, since you do have that lack of independence between all observations. However, one might imagine that requests with similar, but not equal, lengths might exhibit similar behavior, whereas the Wilcoxon would assume that if they aren't paired, they are independent. A logistic regression model might serve you better in this case.
Quotes are from the scipy.stats doc pages, which we aren't supposed to link to, apparently.
If you want to compare Likert item data from two groups with methods that I believe no one will object to, you have a couple of options.
One is ordinal regression, which is very flexible for experimental design, and is relatively easy in some software packages.
Another is the Cochran-Armitage test. The traditional form can compare only two groups, but some implementations can handle more than two groups.
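For illustration, here is a minimal sketch on a made-up $2 \times 5$ table, using the $(N-1)r^2$ linear-by-linear form of the trend statistic, which as I understand it coincides with the Cochran-Armitage statistic for a two-row table with these scores:

```python
import numpy as np
from scipy import stats

# Hypothetical 2 x 5 table of Likert counts (rows: group A, group B)
table = np.array([[10,  8, 12,  6,  4],
                  [ 4,  6, 10, 12,  8]])
scores = np.array([1, 2, 3, 4, 5])        # ordinal category scores

# Expand to per-observation vectors: group indicator and Likert score
group = np.repeat([0, 1], table.sum(axis=1))
score = np.concatenate([np.repeat(scores, row) for row in table])

# Linear-by-linear trend statistic: chi2 = (N - 1) * r^2,
# with r the Pearson correlation of group indicator and score
N = table.sum()
r = np.corrcoef(group, score)[0, 1]
chi2 = (N - 1) * r**2
p = stats.chi2.sf(chi2, df=1)
print(chi2, p)
```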
You will find different opinions on using traditional nonparametric tests like Wilcoxon–Mann–Whitney (WMW) on Likert item data.
From what I can gather, the common objections to using WMW with Likert item data are a) the test has an assumption of a continuous dependent variable, and b) the test may not behave well when there are many ties in the data (as would be the case for Likert data).
From what I can gather, the common defenses for using WMW with Likert item data are that a) the test is fine for ordinal data, and b) the test accounts for ties, at least in modern implementations. I have also heard the argument that Likert item data represents a latent continuous variable, and so doesn't violate the continuity assumption.
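For example (with fabricated Likert responses), scipy's implementation handles the heavy ties via midranks and a tie-corrected normal approximation:

```python
import numpy as np
from scipy import stats

# Hypothetical Likert responses (1-5) from two groups; ties are ubiquitous
a = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 5])
b = np.array([2, 3, 3, 4, 4, 4, 4, 5, 5, 5])

# With ties present, scipy uses midranks and falls back to the
# tie-corrected normal approximation rather than the exact distribution
res = stats.mannwhitneyu(a, b, alternative='two-sided')
print(res.statistic, res.pvalue)
```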
I'm not a statistician, so I won't attempt to evaluate these arguments.
In my experience, the traditional nonparametric tests are generally well-behaved with Likert item data. There are simulations comparing WMW and Kruskal–Wallis to ordinal regression at the bottom of the page here.
I also think that the hypothesis that WMW tests, that of stochastic equality, makes sense in many situations with Likert item data.
As a final note, I think the advice of @DavidSmith --- using a chi-square test of association for Likert item data --- is usually not a good approach. It discards the information about the ordinal nature of the data and, I think, tests a hypothesis that is generally not the one the analyst is interested in.