How is it possible that for two sample sets I'm getting a low p-value, but also a low AUC value (just below 0.5)?
To compute the p-value I'm using the second value returned by the function documented here: http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.mannwhitneyu.html
For the AUC I'm dividing the first value returned by the same function (the U statistic) by the product of the two sample sizes.
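For reference, the ratio $U/(n_1 n_2)$ estimates the probability that a random value from one sample exceeds a random value from the other, with ties counted as one half. Which of the two possible U values a library returns depends on the implementation; as far as I can tell, older SciPy releases such as the one linked above returned the smaller of the two, which would keep this ratio at or below 0.5. The sketch below sidesteps that by counting pairs directly. The function name `auc_from_pairs` and the toy data are illustrative assumptions, not part of the question.

```python
def auc_from_pairs(x, y):
    """Estimate P(X > Y) + 0.5 * P(X == Y) by comparing every (x, y) pair.

    This equals U_x / (n1 * n2), where U_x is the Mann-Whitney U statistic
    for sample x. It is O(n1 * n2), so it is only meant for small samples.
    """
    wins = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                wins += 1.0
            elif xi == yj:
                wins += 0.5  # ties count as half a win
    return wins / (len(x) * len(y))


# Toy data (made up for illustration)
x = [1, 2, 3, 4]
y = [2.5, 3.5, 4.5, 5.5]

auc_xy = auc_from_pairs(x, y)  # direction: x tends to be larger than y
auc_yx = auc_from_pairs(y, x)  # opposite direction
print(auc_xy, auc_yx)
```

The two directions always sum to 1, so a value slightly below 0.5 and a value slightly above 0.5 describe the same (weak) separation, just with the samples swapped.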
And here is a boxplot of the two series:
Best Answer
Just looking at the boxplot strongly suggests that @Scortchi's comment is correct. The number of outliers alone indicates that you have a very large sample size, and therefore very high power to detect differences. This means you have strong evidence for a very small degree of discrimination (an AUC close to 0.5), which is usually of little practical interest.
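The effect of sample size can be illustrated with a small simulation. The sketch below is a plain-Python Mann-Whitney test using the normal approximation (with tie correction, without continuity correction), applied to two normal samples whose means differ by an assumed 0.05 standard deviations. The function names, the shift, and the sample sizes are my own choices for illustration and are not taken from the question's data.

```python
import math
import random


def mann_whitney_normal(x, y):
    """Return (auc, two_sided_p) for samples x and y.

    auc = U_x / (n1 * n2), where U_x is computed from midranks.
    The p-value uses the normal approximation with a tie correction.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the block of tied values starting at position i
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        x_count = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += midrank * x_count
        tie_term += t ** 3 - t
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    auc = u_x / (n1 * n2)

    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence either way
        return auc, 1.0
    z = (u_x - n1 * n2 / 2.0) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return auc, p


def simulate(n, shift=0.05, seed=0):
    """Draw n values per group from N(shift, 1) and N(0, 1) and test them."""
    rng = random.Random(seed)
    x = [rng.gauss(shift, 1.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return mann_whitney_normal(x, y)


if __name__ == "__main__":
    for n in (50, 500, 5000, 50000):
        auc, p = simulate(n)
        print(f"n1 = n2 = {n:6d}  AUC = {auc:.4f}  p = {p:.3g}")
```

Under these assumptions the population AUC is roughly 0.51 at every sample size. What changes with $n$ is the standard error of the estimate, so the same negligible separation tends to move from "not significant" to "highly significant" as the samples grow.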
Mann-Whitney p-values (using the normal approximation) vs AUC for some different sample sizes ($n_1,n_2$):