I'm not sure what the basis is for your colleague's claim -- but they should support the claims they make before you accept them as true -- there's an astonishing amount of misinformed folklore about. (How do they know that this is true? Do you have good reason to think it must be true in your case?)
Both tests assume$^\dagger$ continuous distributions and both are impacted by ties (however, it's relatively easy to deal with ties in the Mann-Whitney and some software will do so automatically).
--
$\dagger$ Edit: To support my claim of the assumption of continuity in respect of the Mann-Whitney (since whuber says I am wrong on this point, I had better justify it), I refer to the beginning of Mann and Whitney (1947):
> 1. Summary. Let $x$ and $y$ be two random variables with continuous cumulative distribution functions $f$ and $g$.
So for Mann and Whitney's version of the test, they do explicitly assume continuity - and not idly, since they do rely on it in their derivation. However, it's possible (as I mention later) to deal with ties in the Mann-Whitney by working out the distribution of the test statistic at the null under the pattern of ties, or by correctly computing the effect of ties on the variance of the statistic under the normal approximation (what's usually referred to as the 'adjustment for ties').
--
For both tests, if the effect of ties is not properly dealt with, both kinds of error rate are impacted: their type I error rates are lowered, and lowering the significance level necessarily lowers power ($=1-\beta$).
It's not 100% clear to me which test might be the more impacted, nor under what circumstances, but offhand I'd expect the greater sensitivity to ties generally to go with the KS test* - and that's even before one 'adjusts' the Mann-Whitney for ties (i.e. if you used the normal approximation with the variance for the no-ties case).
*(personally, I'd use simulation suited to the specific instance to see what the properties would be under the sorts of conditions you see, at those sample sizes.)
Below is an illustration of the impact on the distribution of p-values under identical population distributions with a moderate level of ties$^\ddagger$, with sample sizes of 33 and 67, under the default settings in R (which for the Mann-Whitney, at these sample sizes, uses the normal approximation with a correct calculation of the variance in the presence of ties):
For the tests to work 'as advertised' under the null, these distributions should look close to uniform. As you see, the Mann-Whitney (at least when properly calculating the variance of the sum of the ranks in the presence of ties, as here) is indeed very close to uniform. Since (as we can see) for the Kolmogorov-Smirnov test the proportion of p-values below $\alpha$ is much smaller than $\alpha$, the test is highly conservative, with corresponding effects on power. [If anything, the effect is somewhat stronger than I'd have anticipated.]
$\ddagger\,$(the impact on the variance of the test statistic is fairly small in percentage terms)
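For concreteness, here's a minimal sketch in R of that kind of simulation (the tie-generating mechanism - rounding normal data to one decimal place - is my assumption for illustration, and need not match the tie pattern used for the figure above):

```r
## Sketch: distribution of p-values under the null (identical populations)
## with ties induced by rounding; sample sizes 33 and 67 as above.
set.seed(123)
nsim <- 2000
p_mw <- p_ks <- numeric(nsim)
for (i in seq_len(nsim)) {
  x <- round(rnorm(33), 1)   # both samples drawn from the same population
  y <- round(rnorm(67), 1)
  p_mw[i] <- suppressWarnings(wilcox.test(x, y)$p.value)  # normal approx., tie-corrected variance
  p_ks[i] <- suppressWarnings(ks.test(x, y)$p.value)
}
## Rejection rates at alpha = 0.05 and histograms of the p-values;
## under the null these should be near 0.05 and near-uniform respectively.
c(MW = mean(p_mw <= 0.05), KS = mean(p_ks <= 0.05))
par(mfrow = c(1, 2))
hist(p_mw, main = "Mann-Whitney", xlab = "p-value")
hist(p_ks, main = "Kolmogorov-Smirnov", xlab = "p-value")
```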
Further, if your interest lies in a location-shift alternative, the Mann-Whitney would have greater power against that alternative to start with, so even if it did lose more power as a result of the discreteness (which I doubt), it may still have more power afterward.
You don't say how heavily tied your data are, nor in what sort of pattern. If both tests are more impacted than you're prepared to accept, you can work with the permutation distribution of either test statistic for your data (or with the permutation distribution of some other statistic, including a difference in sample medians if you wish).
In spite of many books (especially in some particular areas of application) stating that it is, the Mann-Whitney is not actually a test for a difference in medians. However, if you additionally assume that the population distributions are the same under the null, and restrict the alternative to a location shift, then it's a test for a difference in any reasonable location measure - population medians, population lower quartiles, even population means (if they exist).
Indeed, one needn't restrict oneself to location-shift alternatives. Assuming identical distributions under the null against an alternative that moves the medians (or any other measure of location) will work; for example, it works perfectly well that way as a test of medians under an assumption of a scale shift. We must keep in mind, however, that the Mann-Whitney is a far more general test than that, and that when we rely on an assumption to make it a test for medians (or some other quantity), the conclusion leans on that assumption in order to mean what we want it to mean.
> In short, which test do I trust?
Don't simply trust what anyone says (including me!) - unless they have solid evidence (I haven't brought any that's directly relevant to your situation, and none relating to power, because I haven't seen your pattern of ties and I'm not 100% sure whether you're only interested in location shifts).
What kind of data do you have (what are you measuring, how are you measuring it, and how do ties arise)? What are you interested in finding out? Why do you mention medians?
Use simulation to find out how any tests you contemplate behave in circumstances similar to yours, and decide for yourself whether there's a problem to worry about. For both tests, see what the impact of ties is, both under the null and under the alternatives you care about; then, in the case of the Mann-Whitney, see the effect of the adjustment for ties, and compare it with using the exact permutation distribution (or, in large samples like yours, the randomization distribution). For the KS you can look at the exact permutation distribution as well.
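As a rough sketch of that permutation approach (here the statistic is a difference in sample medians purely for illustration; you could just as easily plug in the Mann-Whitney or KS statistic):

```r
## Sketch: two-sample permutation (randomization) test for an arbitrary statistic.
perm_test <- function(x, y, stat = function(a, b) median(a) - median(b),
                      B = 10000) {
  obs <- stat(x, y)
  z <- c(x, y)
  m <- length(x)
  rand <- replicate(B, {
    idx <- sample(length(z), m)   # random relabelling of the pooled data
    stat(z[idx], z[-idx])
  })
  mean(abs(rand) >= abs(obs))     # two-sided randomization p-value
}
## e.g. perm_test(x, y) for your two samples
```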
Best Answer
Answer to the question
First of all, it is important to notice that the quantities $P(X\leq Y)$ and $P(X\lt Y)$ are different, given that the variables are not continuous.
Let $X$ and $Y$ be two independent random variables whose distributions are mixtures of a discrete and a continuous distribution such that $P(X=0)=p_1>0$ and $P(Y=0)=p_2>0$. Then by the law of total probability we have that
\begin{eqnarray*} P(X\leq Y)&=&P(X\leq Y \vert Y=0)P(Y=0)+P(X\leq Y \vert Y>0)P(Y>0)\\ &=& P(X=0)P(Y=0)+P(X\leq Y \vert Y>0)[1-P(Y=0)]\\ &=& p_1p_2 +P(\{X\leq Y\} \cap (\{X=0\} \cup \{X>0\}) \vert Y>0)(1-p_2). \end{eqnarray*}
Now, $P(\{X\leq Y\} \cap (\{X=0\} \cup \{X>0\}) \vert Y>0)=p_1+P(X\leq Y\vert X>0,Y>0)(1-p_1)$. With this, we obtain an expression for $\theta=P(X\leq Y)$ in terms of quantities that we can estimate. Note that
\begin{eqnarray*} P(X\leq Y\vert X>0,Y>0)=\int_0^{\infty}F_X(y)f_Y(y)dy, \end{eqnarray*}
where $F_X$ is the CDF of the continuous part of $X$ and $f_Y$ is the PDF of the continuous part of $Y$ (in your case, a Lomax distribution).
Now, how to estimate the parameters? I am going to use nonlinear least squares between the model CDFs and the empirical CDFs. This method works in your case given the large sample size. Please find below R code for conducting this estimation using a simulated sample of size $n=1000$.
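(A minimal sketch of what such code might look like, assuming the positive part is Lomax as described above; the helper names `rzlomax`, `pzlomax` and `fit_zlomax` are mine, and the parameters are kept in their valid ranges by transforming them inside `optim`.)

```r
## Sketch: least-squares fit of a zero-inflated Lomax CDF to the empirical CDF.
set.seed(1)

rzlomax <- function(n, p, alpha, lambda) {
  # zero with probability p, otherwise a Lomax draw (inverse-CDF method)
  z <- rbinom(n, 1, p)
  x <- lambda * ((1 - runif(n))^(-1 / alpha) - 1)
  ifelse(z == 1, 0, x)
}

pzlomax <- function(q, p, alpha, lambda) {
  # CDF of the mixture: point mass p at zero plus (1 - p) times the Lomax CDF
  p + (1 - p) * (1 - (1 + q / lambda)^(-alpha))
}

fit_zlomax <- function(x) {
  Fn <- ecdf(x)
  obj <- function(par) {
    # reparameterise so that p stays in (0, 1) and alpha, lambda stay positive
    p <- plogis(par[1]); alpha <- exp(par[2]); lambda <- exp(par[3])
    sum((Fn(x) - pzlomax(x, p, alpha, lambda))^2)
  }
  est <- optim(c(0, 0, 0), obj)$par
  c(p = plogis(est[1]), alpha = exp(est[2]), lambda = exp(est[3]))
}

x <- rzlomax(1000, p = 0.3, alpha = 3, lambda = 2)    # simulated sample, n = 1000
y <- rzlomax(1000, p = 0.4, alpha = 3, lambda = 2.5)
(fit_x <- fit_zlomax(x))
(fit_y <- fit_zlomax(y))
```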
With this code we obtain estimators of the parameters $(p_1,\alpha_X,\lambda_X,p_2,\alpha_Y,\lambda_Y)$. The remaining step consists of calculating $P(X\leq Y\vert X>0,Y>0)=\int_0^{\infty}F_X(y)f_Y(y)dy$.
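As a sketch (again assuming Lomax continuous parts and using the fitted values `fit_x`, `fit_y` from above), the integral can be evaluated numerically and plugged into the decomposition derived earlier:

```r
## Sketch: P(X <= Y) = p1*p2 + (1 - p2) * (p1 + (1 - p1) * I),
## where I = integral over (0, Inf) of F_X(y) f_Y(y) dy for the continuous parts.
plomax <- function(q, alpha, lambda) 1 - (1 + q / lambda)^(-alpha)                      # Lomax CDF
dlomax <- function(x, alpha, lambda) (alpha / lambda) * (1 + x / lambda)^(-alpha - 1)   # Lomax PDF

p_leq <- function(p1, aX, lX, p2, aY, lY) {
  I <- integrate(function(y) plomax(y, aX, lX) * dlomax(y, aY, lY),
                 lower = 0, upper = Inf)$value
  p1 * p2 + (1 - p2) * (p1 + (1 - p1) * I)
}

p_leq(fit_x["p"], fit_x["alpha"], fit_x["lambda"],
      fit_y["p"], fit_y["alpha"], fit_y["lambda"])
```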
Similarly, the estimator of $P(X<Y)$ can be calculated as follows
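(Another sketch under the same assumptions: the only change is that the tie at zero, the $p_1p_2$ term, drops out.)

```r
## Sketch: P(X < Y) = (1 - p2) * (p1 + (1 - p1) * I), i.e. without the p1*p2 term.
p_lt <- function(p1, aX, lX, p2, aY, lY) {
  I <- integrate(function(y) plomax(y, aX, lX) * dlomax(y, aY, lY),
                 lower = 0, upper = Inf)$value
  (1 - p2) * (p1 + (1 - p1) * I)
}

p_lt(fit_x["p"], fit_x["alpha"], fit_x["lambda"],
     fit_y["p"], fit_y["alpha"], fit_y["lambda"])
```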
Thus the quantity $P(X\leq Y)$ depends on the probabilities $p_1$ and $p_2$, and therefore it may be misleading. For instance, if $X$ and $Y$ are i.i.d. and $p_1,p_2\approx 1$, we have that $P(X\lt Y)\approx 0$ and $P(X \leq Y)\approx 1$.
My conclusion: The stress-strength coefficient is not what you need for comparing the performance of both companies.
How to solve the problem?
I think this problem can be seen as a decision problem. You have two companies providing a programming service and you want to decide which one is better. Consider the hypothetical case where one of the companies produces a large proportion of codes with zero errors, but when a code does contain errors, the number of errors is likely to be large. Is this better than a company with a lower proportion of zero-error codes but smaller error counts when errors do occur?
A toy naive example. Suppose that your decision rule is based on the estimated proportions of 0-error codes as follows:
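(A sketch of one such rule; the threshold `eps = 0.05` is an arbitrary illustrative choice, not part of the original criterion.)

```r
## Toy sketch: favour the company with clearly more 0-error codes; if the
## proportions are similar, fall back on the stress-strength coefficient
## of the continuous parts.
choose_company <- function(p1_hat, p2_hat, ss_cont, eps = 0.05) {
  # p1_hat, p2_hat: estimated proportions of 0-error codes (companies X and Y)
  # ss_cont: estimated P(X <= Y | X > 0, Y > 0) for the continuous parts
  if (abs(p1_hat - p2_hat) > eps) {
    if (p1_hat > p2_hat) "Company X" else "Company Y"
  } else {
    if (ss_cont >= 0.5) "Company X" else "Company Y"
  }
}
```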
This (naive, I repeat) criterion favours companies that produce more 0-error codes, and proceeds to select one based on the stress-strength coefficient of the continuous part when the companies seem to produce a similar proportion of 0-error codes.
In order to conduct a proper analysis, one would need to select an appropriate loss function, based on expert opinion, in order to come up with a reasonable selection criterion. This would require more effort, and I think it falls outside the scope of this site, but I hope this answer gives you some help.
It would also help to check the literature on software quality control and see the criteria adopted by some companies.