Wikipedia has misled you in stating "...if both x and y are given and paired is TRUE, a Wilcoxon signed rank test of the null that the distribution ... of x - y (in the paired two sample case) is symmetric about mu is performed."
The test uses the signed ranks of $z_i = x_i - y_i$ to assess whether those differences are symmetric around the median you specify in your null hypothesis (I assume you'd use zero). Skewness is not a problem, since the signed-rank test, like most nonparametric tests, is "distribution free." The price you pay for these tests is often reduced power, but it looks like you have a large enough sample to overcome that.
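As a minimal sketch (the paired data below are made up for illustration), scipy.stats.wilcoxon runs this test either on the two paired samples or directly on their differences:

```python
import numpy as np
from scipy import stats

# Made-up paired measurements (placeholders for your x and y).
rng = np.random.default_rng(7)
x = rng.normal(10.0, 2.0, 60)
y = x + rng.normal(0.5, 1.0, 60)

# Null: the differences z = x - y are symmetric about zero.
res = stats.wilcoxon(x, y)

# Equivalent call on the differences themselves; to test symmetry
# about some other value mu, pass (x - y) - mu instead.
res2 = stats.wilcoxon(x - y)
print(res.statistic, res.pvalue)
```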
A "what the hell" alternative to the rank-sum test might be to try a simple transformation like $\ln(x_i)$ and $\ln(y_i)$ on the off chance that these measurements might roughly follow a lognormal distribution--so the logged values should look "bell curvish". Then you could use a t test and convince yourself (and your boss who only took Business Stats) that the rank-sum test is working. If this works, there's a bonus: the t test on means for lognormal data is a comparison of medians for the original, untransformed, measurements.
Me? I'd do both, and anything else I could cook up (likelihood ratio test on Poisson counts by firm size?). Hypothesis testing is all about determining whether evidence is convincing, and some folks take a heap of convincin'.
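That Poisson aside could be sketched as a likelihood ratio test comparing one pooled rate against separate rates per group (the counts and group sizes here are invented; under the null the statistic is approximately chi-square with 1 df):

```python
import numpy as np
from scipy import stats

# Invented count data for two groups (e.g., requests per firm).
rng = np.random.default_rng(1)
a = rng.poisson(3.0, 40)
b = rng.poisson(4.0, 40)

def pois_loglik(counts, lam):
    return stats.poisson.logpmf(counts, lam).sum()

lam_pooled = np.concatenate([a, b]).mean()        # MLE under H0: one shared rate
ll0 = pois_loglik(a, lam_pooled) + pois_loglik(b, lam_pooled)
ll1 = pois_loglik(a, a.mean()) + pois_loglik(b, b.mean())  # separate MLEs
lr = 2 * (ll1 - ll0)                              # ~ chi2(1) under H0
p_value = stats.chi2.sf(lr, df=1)
```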
In scipy.stats, the Mann-Whitney U test compares two populations:
Computes the Mann-Whitney rank test on samples x and y.
but the Wilcoxon test compares two PAIRED populations:
The Wilcoxon signed-rank test tests the null hypothesis that two related paired samples come from the same distribution. In particular, it tests whether the distribution of the differences x - y is symmetric about zero. It is a non-parametric version of the paired T-test.
EDITED / CORRECTED in response to ttnphns' comments.
Note that the t test does not test whether the distribution of the differences is symmetric about zero, so the Wilcoxon signed-rank test is not truly a non-parametric counterpart of the paired t test.
The Mann-Whitney test, on the other hand, assumes that all the observations are independent of each other (no basis for pairing here!). It also assumes that the two distributions are the same, and the alternative is that one is stochastically greater than the other. If we make the additional assumption that the only difference between the two distributions is their location, and the distributions are continuous, then "stochastically greater than" is equivalent to such statements as "the medians are different", so you can, with the extra assumption(s), interpret it that way.
The Mann-Whitney uses a continuity correction by default, but the Wilcoxon doesn't.
The Mann-Whitney handles ties using midranks, but the Wilcoxon offers three options for handling ties in the paired values (i.e., zero differences between the two elements of a pair).
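Those defaults show up directly in the calls (the sample data are invented, chosen to contain ties and zero differences; the zero_method names are scipy's):

```python
import numpy as np
from scipy.stats import mannwhitneyu, wilcoxon

# Invented paired data containing ties and some zero differences.
x = np.array([1.0, 2, 3, 4, 5] * 4)
y = x + np.array([0.0, 1, -1, 2, 0.5] * 4)

# Mann-Whitney U: the continuity correction is on by default.
u_stat, u_p = mannwhitneyu(x, y, use_continuity=True, alternative="two-sided")

# Wilcoxon signed-rank: correction=False by default, and three
# zero_method choices for how the zero differences are treated.
results = {zm: wilcoxon(x, y, zero_method=zm, correction=False)
           for zm in ("wilcox", "pratt", "zsplit")}
```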
It sounds like the Wilcoxon test is the more appropriate for your purposes, since your observations are not all independent. However, one might imagine that requests with similar, but not equal, lengths would exhibit similar behavior, whereas the Wilcoxon assumes that observations that aren't paired are independent. A logistic regression model might serve you better in that case.
Quotes are from the scipy.stats doc pages, which we aren't supposed to link to, apparently.
Best Answer
Consider a distribution of pair-differences that is somewhat heavier tailed than normal, but not especially "peaky"; then often the signed rank test will tend to be more powerful than the t-test, but also more powerful than the sign test.
For example, at the logistic distribution, the asymptotic relative efficiency of the signed rank test relative to the t-test is 1.097 so the signed rank test should be more powerful than the t (at least in larger samples), but the asymptotic relative efficiency of the sign test relative to the t-test is 0.822, so the sign test would be less powerful than the t (again, at least in larger samples).
As we move to heavier-tailed distributions (while still avoiding overly-peaky ones), the t will tend to perform relatively worse, while the sign-test should improve somewhat, and both sign and signed-rank will outperform the t in detecting small effects by substantial margins (i.e. will require much smaller sample sizes to detect an effect). There will be a large class of distributions for which the signed-rank test is the best of the three.
Here's one example -- the $t_3$ distribution. Power was simulated at n=100 for the three tests, for a 5% significance level. The power for the $t$ test is marked in black, that for the Wilcoxon signed rank in red and the sign test is marked in green. The sign test's available significance levels didn't include any especially near 5% so in that case a randomized test was used to get close to the right significance level. The x-axis is the $\delta$ parameter which represents the shift from the null case (the tests were all two-sided, so the actual power curve would be symmetric about 0).
As we see in the plot, the signed rank test has more power than the sign test, which in turn has more power than the t-test.
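A sketch of such a simulation at a single shift value (the shift, repetition count, and seed are my choices; the sign test here is the plain exact binomial version, slightly conservative, rather than the randomized one described above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, alpha, delta = 100, 1000, 0.05, 0.3

def sign_test_pvalue(d):
    # Exact two-sided sign test: binomial test on the number of positive
    # differences among the nonzero ones.
    return stats.binomtest(int((d > 0).sum()), int((d != 0).sum()), 0.5).pvalue

rejections = {"t": 0, "signed rank": 0, "sign": 0}
for _ in range(reps):
    # Pair-differences: t_3 noise shifted by delta (the alternative).
    d = delta + stats.t.rvs(df=3, size=n, random_state=rng)
    rejections["t"] += stats.ttest_1samp(d, 0.0).pvalue < alpha
    rejections["signed rank"] += stats.wilcoxon(d).pvalue < alpha
    rejections["sign"] += sign_test_pvalue(d) < alpha

power = {test: count / reps for test, count in rejections.items()}
print(power)
```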