[Math] Prove that the Jaccard Similarity of bags is always less than or equal to 1/2

data miningprobabilityproof-writing

Prove that the Jaccard Similarity of bags is always less than or equal to 1/2

I know that the Jaccard Similarity of bags is $= \dfrac{|B\cap C|}{|B\cup C|}$ and if the bags are the same then the similarity $=1$

A bag is a set of elements with order unimportant and repetition allowed.

For example if we have bags $B=\{A,B,B,C,C,C,C,D,D\}$ and $C=\{A,A,B,C,C,D,D,D\}$ then the Jaccard Similarity is $=\dfrac{1+1+2+2}{9+8}=\dfrac{6}{17}$

So we count an element n times in the intersection if n is the minimum number of times the element appears in B and C and the size of the union is the sum of the size of the bags.

But how would I prove it is always $\leq \dfrac{1}{2}$

Best Answer

Without loss of generality we can consider bags with only one kind of element. Let $b=|B|, c=|C|$. Then the formula gives $\min(b,c)/(b+c)$ but then $2\min(b,c)\leq(b+c)$. Note that the formula gives $1/2$ if the bags are equal.

Related Question