Finding the expected number of collisions.

expected valueprobability

A total of $r$ keys are to be put, one at a time, in $k$ boxes, with each key independently being put in box $i$ with probability $p_i$, where $\sum_{i=1}^k p_i =1 $. Each time a key is put in a nonempty box, we say that a collision occurs. Find the expected number of collisions.

The 2nd answer by @drhab in this link here (what's after the edit) conflicts with my understanding of the problem.
Expected number of collisions.
Specifically, I dont understand how $X_i = Y_i-1+ \mathbb{I}_{Y_i=0}$ represents the collision in a box, which would be $\max(0,Y_i-1)$. For $Y_i$ being the number of keys in the box.

Originally I thought the collision in box $i$ is $X_i=(E(key|i)-1) \mathbb{I}(key|i>0)$. I think what's wrong with this logic is that $E(key|i)$ takes $-1$ into the mean, but I'd also like to confirm that this deduction is correct.

Best Answer

Addendum-2 added to respond to the comment of drhab


I am not really qualified to critique your work. However, it seems to me that there is an alternative approach, that seemed to have (also) been overlooked by the article that you referenced.

You can focus on the expected number of collisions in each box. By linearity of expectation, it is irrelevant that the number of collisions in each box are not independent events.

With $p_i$ equal to the chance that a key went into box $(i)$, let $q_i = 1 - p_i$. Let $E(i)$ denote the expected number of collisions in box $(i)$. Then, the overall expected number of collisions is

$$\sum_{i=1}^{k} E(i).$$

So, the problem reduces to computing $E(i)$.

The number of possible keys that went into box $i$ is some element in $a \in \{0,1,2,\cdots,r\}.$ So, if $g(a) = ~$ the probability that $a$ keys went into box $(i)$, then

$$E(i) = \sum_{a=2}^r (a-1)g(a).$$

However, $\displaystyle g(a) = \binom{r}{a} p_i^a q_i^{(r-a)}.$

Therefore the final computation is

$$\sum_{i = 1}^k \left[ ~\sum_{a=2}^r (a-1)\binom{r}{a}p_i^a q_i^{(r-a)} ~\right].$$


Addendum
I originally went off the rails. I am leaving my original wrong answer in the Addendum, for a reference.

The reason that the answer below is wrong is that if (for example) there are already $2$ keys in a box, and a $3$rd key is added to that box, that counts as $1$ extra collision rather than $2$ extra collisions.


Instead of focusing on the boxes, focus on the keys.

There are $\binom{r}{2}$ possible key-collisions. By linearity of expectation, it is irrelevant that the $\binom{r}{2}$ events are not independent.

The chance of any one of those key-collisions occurring equals the sum of the probabilities that two keys went into the same box.

Therefore, the expected number of collisions is

$$\binom{r}{2} \times \left[\sum_{i=1}^k \left(p_i\right)^2\right].$$


Addendum-2
Response to the comment of drhab.

$E(i) = rp_i - 1 + (1 - p_i)r.$

I was able to confirm this. Analysis below.

Given that $r \in \Bbb{Z_{\geq 2}}.$

Let $p$ denote $p_i$.

Let $\displaystyle S = (1 - p)^r + \binom{r}{1}p - 1.$

Let $\displaystyle T = \sum_{a=2}^r (a-1) \binom{r}{a} p^a (1-p){(r-a)}.$

To Prove: $S = T$.


$\underline{\text{Preliminary Results}}$

$$ \sum_{a=0}^n \binom{n}{a} (-1)^a = 0. \tag{R-1} $$

Proof:

Binomial expansion on $\displaystyle ~ 0^n = [1 + (-1)]^n$.


$$ \sum_{a=1}^{n-1} \binom{n}{a} (-1)^a = (-1) + (-1)^{(n-1)}. \tag{R-2}$$

Proof:

Using (R-1),

$~\displaystyle \sum_{a=1}^{n-1} \binom{n}{a} (-1)^a = 0 - \left[(-1)^0 \binom{n}{0}\right] - \left[(-1)^n \binom{n}{n}\right] = (-1) - (-1)^n.$


$$ \sum_{a=2}^{n} \binom{n}{a} (-1)^a = (n-1). \tag{R-3}$$

Proof:

Using (R-1),

$~\displaystyle \sum_{a=2}^{n} \binom{n}{a} (-1)^a = 0 - \left[(-1)^0 \binom{n}{0}\right] - \left[(-1)^1 \binom{n}{1}\right] = (-1) + n.$


$$ \sum_{a=2}^{n} \binom{n}{a} (-1)^{(n-a)} = (-1)^n (n-1). \tag{R-4}$$

Proof:

$\displaystyle (-1)^{(n-a)} = (-1)^n \times (-1)^{(-a)} = (-1)^n \times (-1)^a.$

$~\displaystyle \sum_{a=2}^{n} \binom{n}{a} (-1)^{(n-a)} = (-1)^n \times \sum_{a=2}^{n} \binom{n}{a} (-1)^a.$

Using (R-3), this equals $~\displaystyle (-1)^n \times (n-1).$


$$ \sum_{a=2}^{n} \binom{n-1}{a-1} (-1)^{(n-a)} = (-1)^n. \tag{R-5}$$

Proof:

$\displaystyle (-1)^{(n-a)} = (-1)^n \times (-1)^{(-a)} = (-1)^n \times (-1)^a.$

$~\displaystyle \sum_{a=2}^{n} \binom{n-1}{a-1} (-1)^{(n-a)} = (-1)^n \times \sum_{a=2}^{n} \binom{n-1}{a-1} (-1)^a.$

$\displaystyle = (-1)^n \times \sum_{a=1}^{n-1} \binom{n-1}{a} (-1)^{a+1}$

$\displaystyle = (-1)^{(n+1)} \times \sum_{a=1}^{n-1} \binom{n-1}{a} (-1)^a$.

$\displaystyle = (-1)^{(n+1)} \times \left\{ ~\left[ \sum_{a = 0}^{(n-1)} \binom{n-1}{a} (-1)^a \right] - \binom{n-1}{0}(-1)^0 ~\right\}$

$\displaystyle = -1^{(n+1)} \times \{0 - 1\} = (-1)^n.$


$$ (1 - p)^n = \sum_{a=0}^{n} \binom{n}{a} (-1)^a p^a. \tag{R-6}$$

Proof:

Immediate, by binomial expansion.


$$ \text{For} ~r \in \Bbb{Z_{\geq 2}}, ~j \in \{2,3,\cdots, r\}, ~a \in \{2,3,\cdots, j\}, $$

$$~\binom{r}{a} \times \binom{r-a}{J-a} = \binom{r}{J} \times \binom{J}{a}. \tag{R-7} $$

Proof:

$\displaystyle \frac{r!}{a! ~(r-a)!} \times \frac{(r-a)!}{(J-a)! ~(r-J)!} ~=~ \frac{r!}{a! ~(J-a)! ~(r-J)!} $

$\displaystyle = \frac{r!}{J! ~(r-J)!} \times \frac{J!}{a! ~(J-a)!}.$


$$~a \times \binom{r}{a} \times \binom{r-a}{J-a} = J \times \binom{r}{J} \times \binom{J-1}{a-1}. \tag{R-8} $$

Proof:

$\displaystyle a \times \frac{r!}{a! ~(r-a)!} \times \frac{(r-a)!}{(J-a)! ~(r-J)!} ~=~ \frac{r!}{(a-1)! ~(J-a)! ~(r-J)!} $

$\displaystyle = \frac{r!}{J! ~(r-J)!} \times \frac{J!}{(a-1)! ~(J-a)!} = \binom{r}{J} \times J \times \binom{J-1}{a-1}.$


$$ \sum_{a=2}^r \binom{r}{a} p^a (1-p)^{(r-a)} = \sum_{J=2}^r (J-1) \times \binom{r}{J} ~(-1)^J ~p^J. \tag{R-9} $$

Proof:

Each term in $\displaystyle ~\sum_{a=2}^r \binom{r}{a} p^a (1-p)^{(r-a)}~$ will have a factor of
$~\displaystyle p^J ~$ in it, where $~J~$ is some element in $~\{2, 3, \cdots, r\}.$

Therefore $\displaystyle ~\sum_{a=2}^r \binom{r}{a} p^a (1-p)^{(r-a)}~$ can be expressed as

$\displaystyle \sum_{J=2}^r p^J \times f(J),~$ where $~f(J)~$ needs to be computed.

Using (R-6), for $~J \in \{2,3,\cdots,r\},~$ and $~a \in \{2,3,\cdots,J\}$,
the $~\displaystyle p^J~$ term in $~\displaystyle \binom{r}{a} p^a (1-p)^{(r-a)}$

is given by

$\displaystyle \binom{r}{a} p^a \times \binom{r-a}{J-a} p^{(J-a)} (-1)^{(J-a)}.$

Using (R-7), this equals

$\displaystyle p^J \times \binom{r}{J} \times \binom{J}{a} (-1)^{(J-a)}.$

Therefore,

$\displaystyle \sum_{J=2}^r p^J \times f(J) = \sum_{J=2}^r \left\{ ~\binom{r}{J} \times p^J \times \left[\sum_{a=2}^J \binom{J}{a} (-1)^{(J-a)}\right] ~\right\}.$

Using (R-4), this equals

$\displaystyle \sum_{J=2}^r \binom{r}{J} \times p^J \times (J-1) \times (-1)^J.$


$$ \sum_{a=2}^r a \times \binom{r}{a} p^a (1-p)^{(r-a)} = \sum_{J=2}^r J \times \binom{r}{J} ~(-1)^J ~p^J. \tag{R-10} $$

Proof:

Analysis very similar to (R-9).

$\displaystyle ~\sum_{a=2}^r a \times \binom{r}{a} p^a (1-p)^{(r-a)}~$ will equal

$\displaystyle \sum_{J=2}^r p^J \times g(J),~$ where $~g(J)~$ needs to be computed.

Using (R-6), for $~J \in \{2,3,\cdots,r\},~$ and $~a \in \{2,3,\cdots,J\}$,
the $~\displaystyle p^J~$ term in $~ \displaystyle a \times \binom{r}{a} p^a (1-p)^{(r-a)}$

is given by

$\displaystyle a \times \binom{r}{a} p^a \times \binom{r-a}{J-a} p^{(J-a)} (-1)^{(J-a)}.$

Using (R-8), this equals

$\displaystyle p^J \times J \times \binom{r}{J} \times \binom{J-1}{a-1} (-1)^{(J-a)}.$

Therefore,

$\displaystyle \sum_{J=2}^r p^J \times g(J) = \sum_{J=2}^r \left\{ ~J \times \binom{r}{J} \times p^J \times \left[\sum_{a=2}^J \binom{J-1}{a-1} (-1)^{(J-a)}\right] ~\right\}.$

Using (R-5), this equals

$\displaystyle \sum_{J=2}^r J \times \binom{r}{J} \times p^J \times (-1)^J.$




Let $~\displaystyle S = rp - 1 + (1-p)^r.$

Let $~\displaystyle T = \sum_{a=2}^r (a-1) \times \binom{r}{a} \times p^a \times (1-p)^{(r-a)}.$

To Prove: $~ S = T$.

Using (R-6),

$\displaystyle S = \left[\sum_{i=0}^r \binom{r}{i} ~(-1)^i ~p^i\right] - \left[\sum_{i=0}^1 \binom{r}{i} ~(-1)^i ~p^i\right]$

$\displaystyle S = \sum_{i=2}^r \binom{r}{i} ~(-1)^i ~p^i.$

Let $~\displaystyle T_1 = \sum_{a=2}^r a \times \binom{r}{a} \times p^a \times (1-p)^{(r-a)}.$

Let $~\displaystyle T_2 = \sum_{a=2}^r \binom{r}{a} \times p^a \times (1-p)^{(r-a)}.$

Then $T = T_1 - T_2.$

By (R-10),

$\displaystyle T_1 = \sum_{J=2}^r p^J \times \binom{r}{J} \times (-1)^J \times (J).$

By (R-9),

$\displaystyle T_2 = \sum_{J=2}^r p^J \times \binom{r}{J} \times (-1)^J \times (J-1).$

Therefore,

$\displaystyle T_1 - T_2 = \sum_{J=2}^r p^J \times \binom{r}{J} \times (-1)^J = S.$