Probability of two randomly-selected people knowing each other given certain conditions

probability

How would one approach calculating the probability of two people knowing each other from a population of 1 million given that there is a sub-set of 2,200 people that know someone else in the following manner:

One of them knows 1 other person from that subset,
another one knows 2 people from that subset,
another one knows 3 people from that subset,
another one knows 4 people from that subset,
another one knows 5… and so on, until

one of them knows the remaining 2,199 people from that subset?

I've encountered similar problems, and I think it may be somewhat like the birthday paradox but given that the factorials and powers that I think are needed seem to be way too large to be calculated traditionally, I can't wrap my head around this one.

Any help would be greatly appreciated…

Best Answer

I'll assume that you intended the relation “$A$ knows $B$” to be symmetric, i.e. if $A$ knows $B$, then $B$ knows $A$.

Also, you didn't specify how the two people are chosen; I'll assume that a pair of people is uniformly randomly selected among all pairs of people (or equivalently, that two people are independently uniformly randomly selected without replacement).

I'll also assume that the only acquaintances there are are the ones you describe and the ones we can deduce from them (so most of the million people don't know anyone).

If so, the probability of choosing a pair of people is the number of pairs of people who know each other over the total number of pairs of people.

The total number of pairs of people is

$$ \binom{1000000}2=\frac{1000000\cdot999999}2=499999500000\;. $$

To determine the number of pairs of people who know each other, consider the subset of $2200$ people. You've specified how many people they know for $2199$ of them, and from that we can deduce how many people the remaining person knows. The person who knows all $2199$ must know the person who only knows $1$, and the person who only knows $1$ knows no one else, so that determines all acquaintances for those two people. If you subtract this $1$ acquaintance from everyone else's total, the remaining $2198$ people fulfill the same type of description: For each number from $1$ to $2197$, there's someone who knows that number of people. So we can continue in this manner, resolving the person with most and least acquaintances in each step.

If we do this for a subset of $n$ people, if $n$ is odd, the process will end after $\frac{n-1}2$ steps with the $1$ person whose number of acquaintances you didn't specify, and they will have been assigned $\frac{n-1}2$ acquaintances. If $n$ is even, the process will end after $\frac n2$ steps with no people, and the $1$ person whose number of acquaintances you didn't specify will have been assigned $\frac n2$ acquaintances. Thus, in either case there are

$$ \frac12\left(\sum_{k=1}^{n-1}k+\left\lfloor\frac n2\right\rfloor\right)=\frac12\left(\frac{n(n-1)}2+\left\lfloor\frac n2\right\rfloor\right)=\left\lfloor\frac{n^2}4\right\rfloor $$

acquaintances in the subset (where the factor $\frac12$ accounts for the fact that we counted each acquaintance twice, once from either side). In your case, with $n=2200$, this is

$$ \left\lfloor\frac{2200^2}4\right\rfloor=\frac{2200^2}4=1210000\;. $$

Thus, the probability of selecting two people who know each other is

$$ \frac{1210000}{499999500000}=\frac{11}{4545450}\approx2.42\cdot10^{-6}\;. $$