Hannah, you have already answered your own question: because the data are ordinal, you have to use the M-W test. If not for that condition, you could probably have used either test. In many hypothesis-testing situations it is not 100% clear which test to use (you test for Normality and the result is inconclusive, or you have a very large sample for which Normality is a less critical issue). In such undetermined situations (which happen very often), I recommend you run both tests (t-test and M-W). You will notice that very often both tests give you very similar results in terms of statistical significance. When both tests concur, your case is solved: you can state with confidence whether you accept or reject the null hypothesis.

Far fewer cases will be ambiguous, with the two tests reaching opposite conclusions. In that case, you drill down, look at your data very closely, and make a conservative judgment about which test is more appropriate given the nature of the data. Sometimes it is still unclear. In such a situation, you may decide to go with the test that gives you the more conservative result, or with the test that reduces the type of error you are more sensitive to (Type I vs. Type II error).
But in your case, because of the ordinal data, the proper test is M-W. Just for fun, you could run the t-test as well and see whether the two tests come to the same conclusion... knowing that, in this case, you would still have to go with M-W.
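To make the "run both tests" suggestion concrete, here is a minimal sketch using `scipy.stats` on made-up, normally distributed samples (the group names and effect size are illustrative only, not from your data):

```python
# Hypothetical example: run Welch's t-test and Mann-Whitney U on the same
# two samples and compare their conclusions. The data here are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=5.0, scale=1.0, size=40)
group_b = rng.normal(loc=5.8, scale=1.0, size=40)

t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch's t-test
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

alpha = 0.05
print(f"t-test p = {t_p:.4f}, Mann-Whitney p = {u_p:.4f}")
print("Tests agree at alpha = 0.05:", (t_p < alpha) == (u_p < alpha))
```

When the two printed p-values fall on the same side of your significance level, the case is settled; when they disagree, examine the data more closely as described above.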
You might be looking for a different answer here, but my opinion is that you should be using Benjamini-Hochberg (potentially with the $q$-value modification of Storey et al. 2002). I'm going to try to show you why.
Benjamini-Hochberg (a.k.a. false discovery rate (FDR) control) remains valid under certain kinds of dependency, e.g. positive regression dependency. What is regression dependency?
In *Some Concepts of Dependence*, Section 5 (Regression Dependence), it is shown for two random variables $X$ and $Y$ that regression dependence can be written as
$$ P(Y \leq y| X \leq x) \geq P(Y \leq y) $$
meaning that knowing $X$ is small increases the probability of $Y$ being small. This is intuitive; if you eat more food, you will generally gain more weight. This is extended slightly to requiring that
$$ P(Y \leq y | X = x) $$
is non-increasing in $x$. The Benjamini-Hochberg correction depends on two sets: the set of true null statistics ($I_0$, the tests for which there really is no association) and the joint set of test statistics $D$, which comprises both the true nulls and the real associations. In *The Control of the False Discovery Rate in Multiple Testing under Dependency* (Benjamini and Yekutieli), PRDS (positive regression dependency on a subset) is then defined as
> **Property PRDS**: For any increasing set $D$, and for each $i \in I_0$, $P(X \in D \mid X_i = x)$ is non-decreasing in $x$.
Intuitively, this means that the probability of the whole vector of test statistics landing in an increasing set $D$ does not decrease as any one true-null statistic increases; in other words, the true nulls cannot be negatively related to the rest of the statistics.
It is also noted that
> Therefore, whenever the joint distribution of the test statistics is $PRDS$ on some $I_0$, so is the joint distribution of the corresponding $p$-values, be they right-tailed or left-tailed.
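The regression-dependence inequality above can be checked empirically. Here is a minimal sketch on simulated, positively correlated bivariate normal data (the correlation of 0.6 and the thresholds are arbitrary choices for illustration):

```python
# Hypothetical simulation: for positively correlated normals, check that
# P(Y <= y | X <= x) >= P(Y <= y), i.e. knowing X is small makes a small Y
# more probable.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_normal(n)
y = 0.6 * x + 0.8 * rng.standard_normal(n)  # corr(X, Y) = 0.6, Var(Y) = 1

x_thr, y_thr = 0.0, 0.0
p_y = np.mean(y <= y_thr)                      # estimate of P(Y <= y)
p_y_given_x = np.mean(y[x <= x_thr] <= y_thr)  # estimate of P(Y <= y | X <= x)

print(f"P(Y<=0) = {p_y:.3f}, P(Y<=0 | X<=0) = {p_y_given_x:.3f}")
```

The conditional probability comes out clearly larger than the unconditional one, as the inequality requires.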
If we look at the Mann-Whitney $U$ specifically, we find that for large samples $U$ is approximately normally distributed. In the presence of simple dependencies, your $U$ value will be inflated if it is correlated with a true positive association, so the set of test statistics will be larger. It is intuitive to see that as you have larger $U$ statistics, you have a greater probability of having a real association.
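For reference, under the null $U$ has mean $n_1 n_2 / 2$ and variance $n_1 n_2 (n_1 + n_2 + 1)/12$. A sketch of the large-sample normal approximation on made-up data (ignoring tie corrections):

```python
# Sketch of the normal approximation to the Mann-Whitney U statistic.
# The samples here are simulated for illustration; no tie correction is applied.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 50)
b = rng.normal(0.5, 1.0, 50)

u, _ = stats.mannwhitneyu(a, b, alternative="two-sided")
n1, n2 = len(a), len(b)
mu_u = n1 * n2 / 2                               # E[U] under H0
sigma_u = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # SD of U under H0
z = (u - mu_u) / sigma_u
p_approx = 2 * stats.norm.sf(abs(z))             # two-sided normal p-value
print(f"U = {u:.0f}, z = {z:.2f}, approximate p = {p_approx:.4f}")
```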
That was kind of long and probably not exactly what you were looking for, so let's regroup. My argument is that even though you have a dependency structure, it is probably okay to use FDR control (and without your actual data, which you probably shouldn't post here, an assumption is the best we can make). This is because your data probably satisfy the PRDS condition that the FDR correction requires. You can implement FDR correction in almost any software package.
If I were you, instead of a list of "significant" or "non-significant" associations, I would report a list of adjusted $q$-values, as detailed by John D. Storey in his paper *A direct approach to false discovery rates*. In a nutshell, a $q$-value is the false discovery rate that would be required for your $P$-value to be on the cusp of significance. Perhaps you could present these in tandem with your Bonferroni-adjusted $P$-values, because for some reason they make people comfortable.
This is just my opinion, and I would love to hear rebuttals to any part of this. All the best, and thanks for a good question.
Best Answer
You cannot correct for gender within Mann-Whitney. Mann-Whitney is a location test for two groups, and that's all it does.
There are at least two options here:

1) Stratify by gender. That is, analyze the men and the women separately.

2) Do some sort of regression, perhaps OLS or, given your use of Mann-Whitney, perhaps quantile regression, with "concentration" as the dependent variable and two independent variables: gender and group (patient vs. control).
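Both options can be sketched in a few lines. The data below are simulated, and the variable names (`conc`, `gender`, `group`) and effect sizes are hypothetical; the regression uses plain numpy least squares to avoid extra dependencies:

```python
# Option 1: stratified Mann-Whitney tests within each gender.
# Option 2: OLS of concentration on group and gender (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 120
gender = rng.integers(0, 2, n)  # 0 = female, 1 = male (hypothetical coding)
group = rng.integers(0, 2, n)   # 0 = control, 1 = patient
conc = 10 + 2.0 * group + 1.0 * gender + rng.normal(0, 1.5, n)

# Option 1: analyze each gender stratum separately
for g, label in [(0, "female"), (1, "male")]:
    mask = gender == g
    u, p = stats.mannwhitneyu(conc[mask & (group == 1)],
                              conc[mask & (group == 0)],
                              alternative="two-sided")
    print(f"{label}: U = {u:.0f}, p = {p:.4f}")

# Option 2: OLS with group and gender as covariates
X = np.column_stack([np.ones(n), group, gender])
beta, *_ = np.linalg.lstsq(X, conc, rcond=None)
print(f"intercept = {beta[0]:.2f}, group effect = {beta[1]:.2f}, "
      f"gender effect = {beta[2]:.2f}")
```

The regression recovers the group effect while adjusting for gender; for quantile regression specifically, a dedicated package (e.g. statsmodels) would replace the least-squares step.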