[Math] The random incidence paradox of sampling problem

probability

There are 100 families: 10 families have no children, 40 families have 1 child each, 30 families have 2 children each, 10 families have 3 each, and 10 families have 4 each.

If you pick a family at random, the expected number of children in that family is $1.7$. If you pick a child at random, the expected number of children in that child's family is approximately $2.41$.
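The two figures can be checked with a few lines of Python. This is only a sketch of the arithmetic; the names `counts`, `mean_per_family`, and `mean_per_child` are made up for the illustration.

```python
from fractions import Fraction

# Family-size counts from the example: {children per family: number of families}
counts = {0: 10, 1: 40, 2: 30, 3: 10, 4: 10}

n_families = sum(counts.values())
n_children = sum(k * n for k, n in counts.items())

# Picking a family uniformly at random: every family carries the same weight.
mean_per_family = Fraction(n_children, n_families)

# Picking a child uniformly at random: a family with k children is reached
# through any of its k children, so it is weighted by k.
mean_per_child = Fraction(sum(k * k * n for k, n in counts.items()), n_children)

print(float(mean_per_family))            # 1.7
print(round(float(mean_per_child), 2))   # 2.41
```

Exact fractions are used so that the results ($17/10$ and $41/17$) are not blurred by floating-point rounding.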

Suppose that a fraction $p_k$ of the families have $k$ children each. Let $K$ be the number of children in a randomly selected family, and let $a=E[K]$ and $b=E[K^2]$. Let $W$ be the number of children in a randomly chosen child's family. Express $E[W]$ in terms of $a$ and $b$.

I got the answer $b/a$ by purely numeric calculation, but I really want to know how this answer is derived theoretically. Any hint or explanation will be appreciated.

Best Answer

Not a separate answer, but a follow-on comment that's a bit too large to fit into the comment box.

This is an instance of the so-called inspection paradox. An interesting consequence of the paradox is that we could construct probability distributions where the expected number of children in a given family is finite, but if we select a child at random, the expected number of siblings of that child is infinite.
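To see where the second moment enters, here is a short sketch in the question's notation; $N$, the total number of families, is introduced only for this derivation and cancels out. Of the $N$ families, $N p_k$ have $k$ children and together contribute $k N p_k$ children, so a randomly chosen child belongs to a family of size $k$ with probability

$$ P(W = k) = \frac{k N p_k}{\sum_j j N p_j} = \frac{k p_k}{a} $$

and therefore

$$ E[W] = \sum_k k \, P(W = k) = \frac{1}{a} \sum_k k^2 p_k = \frac{b}{a} $$

With the numbers in the question, $a = 1.7$ and $b = 4.1$, which gives $b/a \doteq 2.41$. In particular, $E[W]$ is infinite whenever $b$ is infinite while $a$ is finite.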

For instance, suppose that the probability that a given family has $k$ children, for $k = 1, 2, 3, \ldots$, is given by

$$ p_k = \frac{1}{\zeta(3)} \left(\frac{1}{k^3}\right) $$

where $\zeta(3) = 1/1^3+1/2^3+1/3^3+\cdots \doteq 1.2021$ is Apéry's constant. Then the expected number of children in a family would be given by

$$ \sum_{k=1}^\infty kp_k = \frac{1}{\zeta(3)} \sum_{k=1}^\infty \frac{1}{k^2} = \frac{1}{\zeta(3)} \left(\frac{\pi^2}{6}\right) \doteq 1.3684 $$

which seems perfectly ordinary. However, if we go on to try to calculate the expected number of siblings for a randomly selected child, we run into an infinite second moment:

$$ \sum_{k=1}^\infty k^2p_k = \frac{1}{\zeta(3)} \sum_{k=1}^\infty \frac{1}{k} = \infty $$

Such are the travails of a heavy-tailed distribution.
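The contrast between the two sums can also be seen numerically. The sketch below truncates both series at a cutoff; the function name `partial_moments` and the chosen cutoffs are just for illustration. The first partial sum should settle near $1.3684$, while the second keeps growing, roughly like the logarithm of the cutoff.

```python
def partial_moments(n_max):
    """Partial sums of k*p_k and k^2*p_k over k = 1..n_max, where p_k = 1/(zeta(3) k^3)."""
    zeta3 = 1.2020569031595943  # Apéry's constant, hard-coded
    first = sum(1.0 / k**2 for k in range(1, n_max + 1)) / zeta3   # converges
    second = sum(1.0 / k for k in range(1, n_max + 1)) / zeta3     # harmonic series, diverges
    return first, second

for n_max in (10, 1_000, 100_000):
    first, second = partial_moments(n_max)
    print(n_max, round(first, 4), round(second, 4))
```

No finite cutoff proves divergence, of course; the code only illustrates that the second column never levels off.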
