Solved – How to measure the “confidence” that one set of values is greater than another

confidence interval, distributions, paired-comparisons, probability

I have a relatively simple task. I have collected several hundred samples from several groups (the grouping here is arbitrary and not relevant to my question). These samples are scored on a continuous interval [0,100]. I know that on average the samples I collect from some groups score higher than those from others. I would like to somehow express my "confidence" that one group has larger scores than another. I will illustrate this with an example.

Suppose I am weighing a group of ten oranges and ten apples and find that the weights of my samples are

$$apples = [1,2,3,1,2,3,1,2,3,1]$$
$$oranges = [2,3,4,2,3,4,2,3,4,2]$$

On average the oranges weigh more than the apples in my example. However, not every orange weighs more than all of the apples. How might I express the idea that oranges weigh more than apples, and my confidence in it? Notice that I am not making any assumptions about the distribution of weight for either apples or oranges; they could in fact have very different distributions (e.g. one group may have a normal distribution while another is bimodal). I also do not care about the probability that an orange weighs at least $x$ pounds more than an apple; only that oranges are heavier. Is the right way to look at this to find the probability that an orange weighs more than an apple, or is it sufficient to say that oranges are heavier on average than apples (saying this worries me since I'm not assuming they have the same distribution)?

Edit:
The number of samples I have collected is not the same for each group.

Best Answer

Given two sets $A, B$, the average weights do not determine the probability $p$ that a randomly chosen $a \in A$ is greater than a randomly chosen $b \in B$. For example, $p$ could be almost 100% even though $E(A) < E(B)$ if $B$ contains one huge outlier.
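To make the outlier point concrete, here is a minimal sketch. The data and the helper name `prob_greater` are made up for illustration and are not from the question.

```python
from statistics import mean

def prob_greater(a, b):
    """Fraction of all (x, y) pairs, x from a and y from b, with x > y."""
    wins = sum(1 for x in a for y in b if x > y)
    return wins / (len(a) * len(b))

# Hypothetical data: every value in A is 10; B is 99 nines plus one huge outlier.
A = [10] * 100
B = [9] * 99 + [10_000]

p = prob_greater(A, B)
print("mean(A) =", mean(A), "mean(B) =", mean(B), "p =", p)
```

Here a value from $A$ beats a value from $B$ in 99% of the pairs, yet the single outlier pulls the mean of $B$ to about 108.9, well above the mean of $A$ (10).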

There are only two ways to do this: either calculate the probability directly by looking at all pairs (like Tusell's suggestion, but using all pairs rather than just non-overlapping pairs), or make some assumption about the distributions of $A$ and $B$ and then estimate their parameters. If you assume normality, then the estimated difference follows a (noncentral, scaled) Student's t distribution, hence the t-test suggested by Mihir.
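The all-pairs approach can be sketched in a few lines of plain Python, using the apple and orange weights from the question. The function name `pairwise_comparison` and the decision to report ties separately are my own assumptions; the two groups need not have the same number of samples.

```python
def pairwise_comparison(a, b):
    """Compare every x in a with every y in b; sample sizes may differ.

    Returns the fraction of pairs where y > x, y == x, and y < x.
    """
    n_pairs = len(a) * len(b)
    b_greater = sum(1 for x in a for y in b if y > x)
    ties = sum(1 for x in a for y in b if y == x)
    a_greater = n_pairs - b_greater - ties
    return {
        "b_greater": b_greater / n_pairs,
        "tie": ties / n_pairs,
        "a_greater": a_greater / n_pairs,
    }

# Weights from the question.
apples = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
oranges = [2, 3, 4, 2, 3, 4, 2, 3, 4, 2]

result = pairwise_comparison(apples, oranges)
print(result)
```

For this data the orange is heavier in 67 of the 100 pairs, the weights tie in 21, and the apple is heavier in 12. Note that this is only a point estimate of $p$ from the samples at hand; it says nothing yet about how uncertain that estimate is.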
