Probability that $k$ elements are common

combinatoricsprobability

Let $U = \lbrace 1, 2, \dotsc, K \rbrace$. Now we take all possible subsets of $U$ except the null set and arrange them in $K$ tiers based on the number of elements in them. For example, for $U = \lbrace 1, 2, 3 \rbrace$, we have $3$ tiers as follows:

$T_1: \lbrace 1 \rbrace, \lbrace 2 \rbrace, \lbrace 3 \rbrace$

$T_2: \lbrace 1, 2 \rbrace, \lbrace 1, 3 \rbrace, \lbrace 2, 3 \rbrace$

$T_3: \lbrace 1, 2, 3 \rbrace$,

where index $i$ ($1 \leq i \leq K$) of $T_i$ denotes the number of elements of the sets that are in $T_i$.

Now, for a given test set in $T_i$, if we randomly choose a set in $T_j$ ($1 \leq i \leq K$), what is the probability that there are exactly $k$ elements common between the randomly chosen set in $T_j$ and the test set in $T_i$?

For example, for the test set $\lbrace 1, 2 \rbrace \in T_2$, $\lbrace 1 \rbrace$ and $\lbrace 2 \rbrace$ in $T_1$ have one element in common with $\lbrace 1, 2 \rbrace$. This means that the probability that there is one element common between $\lbrace 1, 2 \rbrace$ and the set chosen randomly from $T_1$ is $2/3$. Similarly, the set $\lbrace 1, 2, 3 \rbrace$ in $T_3$ has two elements common with $\lbrace 1, 2 \rbrace$, and the probability that there are exactly two elements common between $\lbrace 1, 2 \rbrace$ and the set chosen randomly from $T_3$ is $1$.

I am aware of the hypergeometric distribution which compares the sets of the same length. But, in this problem, I wish to find the common elements between sets of different lengths.

Best Answer

Let us first count how many sets in $T_j$ share exactly $k$ elements with a test set in $T_i$. There are $\binom{i}k$ ways to choose the shared members from the test set, then $\binom{K-i}{j-k}$ ways to choose the remaining members. Therefore, there are $\binom{i}k\binom{K-i}{j-k}$ possibilities for the set in $T_j$. We then divide by the total number of sets in $T_j$, which is $\binom{K}j$, to get the probability that exactly $k$ members are shared. The answer is $$ \frac{\binom{i}k\binom{K-i}{j-k}}{\binom{K}j} $$ This is exactly the hypergeometric distribution; it does apply when the sets are different lengths.

Related Question