Does an urn's probability distribution change on average as you draw from it without replacement?

discrete data, distributions, probability

Suppose I have an urn containing balls of N different colours, where each colour can appear a different number of times (if there are 10 red balls there need not also be 10 blue balls). If we know the exact contents of the urn before drawing, we can form a discrete probability distribution which tells us the probability of drawing each colour of ball. What I am wondering is how this distribution changes, on average, after drawing k balls from the urn without replacement. I understand that as we draw from the urn we can update the distribution with the knowledge of what has been taken out, but what I want to know is what we would expect the shape of the distribution to be after we have removed k balls. Does the distribution change on average or does it remain the same? If it does not remain the same, can we write down a formula for what we expect the new distribution to look like on average after making k draws?

Best Answer

  1. "Direct calculation": Let there be $n$ balls of $m$ colours in the urn. Let us focus on the probability of drawing one particular colour, say white, on the second draw. Let the number of white balls be $n_w$. Let $X_i$ be the colour of the ball obtained at the $i$-th draw.

    \begin{eqnarray} P(X_2=W)&=&P(X_2=W|X_1=W)P(X_1=W)+P(X_2=W|X_1=\overline{W})P(X_1=\overline{W})\\ &=&\frac{n_w-1}{n-1}\frac{n_w}{n}+\frac{n_w}{n-1}\frac{n-n_w}{n}\\ &=&\frac{n_w(n-n_w+n_w-1)}{n(n-1)}\\ &=&\frac{n_w}{n}\\ &=&P(X_1=W) \end{eqnarray}

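    Of course this same argument applies to any colour on the second draw. We can apply the same kind of argument inductively to later draws: if the distribution at draw $i$ matches that of the first draw, conditioning on the colour of the $i$-th ball gives the same cancellation for draw $i+1$.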

    [One could of course perform an even more direct calculation. Consider the first $k$ draws as consisting of $i$ white balls and $k-i$ non-white balls (with probability given by the hypergeometric distribution), and perform the corresponding calculation to the simple one above but for the draw at step $k+1$; one gets a similar simplification and cancellation, but it's not especially enlightening to carry out.]

  2. A shorter argument: consider labelling the balls randomly with the numbers $1,2,\ldots,n$, and then drawing them out in labelled order. The question now becomes "Is the probability that a given label, $k$, is placed on a white ball the same as the probability that label $1$ is placed on a white ball?"

    Now we see the answer must be "yes" by symmetry of the labels. Similarly, by symmetry of the ball-colours, it doesn't matter that we said "white", so the argument that label $k$ and label $1$ have the same probability applies to any colour. Hence the distribution at the $k$-th draw is the same as for the first draw, as long as we have no additional information from the earlier draws (i.e. as long as the earlier drawn balls are not seen).
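
As a quick check of the bracketed "even more direct calculation" in argument 1, here is a minimal Python sketch. The function name `prob_colour_at_draw` and the example urn are illustrative assumptions, not part of the original answer. It sums over the number $i$ of white balls among the first $k$ unseen draws, weighting the conditional probability $(n_w-i)/(n-k)$ by the hypergeometric probability of $i$, and uses exact fractions so the comparison with $n_w/n$ has no rounding error.

```python
from fractions import Fraction
from math import comb


def prob_colour_at_draw(n_w, n, k):
    """Exact P(X_{k+1} = W) after k unseen draws without replacement.

    n_w: number of white balls, n: total number of balls, k: balls already
    removed (0 <= k < n). Hypothetical helper for illustration only.
    """
    total = Fraction(0)
    for i in range(0, min(k, n_w) + 1):
        # Hypergeometric probability of i whites among the first k draws
        # (math.comb returns 0 when k - i exceeds the non-white count).
        p_i = Fraction(comb(n_w, i) * comb(n - n_w, k - i), comb(n, k))
        # Given i whites are gone, n_w - i whites remain among n - k balls.
        total += p_i * Fraction(n_w - i, n - k)
    return total


# Example urn (assumed for illustration): counts need not be equal.
urn = {"red": 10, "blue": 3, "green": 7}
n = sum(urn.values())

for colour, count in urn.items():
    # The marginal probability is the same at every draw position.
    assert all(prob_colour_at_draw(count, n, k) == Fraction(count, n)
               for k in range(n))
    print(colour, Fraction(count, n))
```

The assertions pass for every $k$ from $0$ to $n-1$, which agrees with the symmetry argument: without information about the earlier draws, the distribution at each draw is the same as at the first.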
