Suppose I have an urn containing $N$ different colours of balls, where each colour can appear a different number of times (if there are 10 red balls there need not also be 10 blue balls). If we know the exact contents of the urn before drawing, we can form a discrete probability distribution that tells us the probability of drawing each colour of ball. What I am wondering is how, on average, the distribution changes after drawing $k$ balls without replacement from the urn. I understand that as we draw from the urn we can update the distribution with the knowledge of what has been taken out, but what I want to know is what we would expect the shape of the distribution to be after we have removed $k$ balls. Does the distribution change on average, or does it remain the same? If it does not remain the same, can we write down a formula for what we expect the new distribution to look like on average after making $k$ draws?
Does an urn’s probability distribution change, on average, as you draw from it without replacement?
Tags: discrete-data, distributions, probability
Related Solutions
I am going to interpret the question as follows: Find the expected number of white balls drawn before a black one is seen among $j$ draws without replacement, where if no black ball is seen, then the observation takes the value $j$. If this is not what you intend, please advise and I will revise accordingly.
We can use a nice, handy trick to find the expected value.
First, let's set out some notation. Let $X$ be the number of white balls seen before the first black ball is drawn in a sample of size $n$ taken without replacement from $n = w + b$ balls. Obviously $X \in \{0,1,\ldots,w\}$ with probability 1. Now, define $\newcommand{\Zj}{Z^{(j)}}\Zj$ to be the number of white balls seen before the first black ball from the first $j$ draws, or $j$ otherwise. Hence $\Zj = \min(X,j)$. We seek $\newcommand{\E}{\mathbb E}\renewcommand{\Pr}{\mathbb P}\E \Zj$. We need only consider $j \leq w$, since $\E \Zj = \E X$ for $j \geq w$.
Fact: If $Y$ is a nonnegative integer-valued random variable, then $$\E Y = \sum_{k=1}^\infty \Pr(Y \geq k) \>.$$
The proof is relatively easy and is omitted. It is a special case of the more general result that if $Y \geq 0$ almost surely, then $\E Y = \int_0^\infty \Pr(Y > y) \,\mathrm d y$.
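The tail-sum fact is easy to verify numerically. As a minimal sketch (the die-based example variable is my own choice, not from the answer), take $Y$ to be one less than a fair die roll, so $Y \in \{0,\dots,5\}$, and compare $\E Y$ computed directly with $\sum_{k \geq 1} \Pr(Y \geq k)$:

```python
# Tail-sum check: E[Y] = sum over k >= 1 of P(Y >= k)
# for a small nonnegative integer-valued Y.
# Hypothetical example: Y = fair die roll minus 1, so Y in {0,...,5}.
pmf = {y: 1 / 6 for y in range(6)}

expectation = sum(y * p for y, p in pmf.items())            # direct E[Y]
tail_sum = sum(sum(p for y, p in pmf.items() if y >= k)     # sum_k P(Y >= k)
               for k in range(1, 6))

print(expectation, tail_sum)  # both 2.5
```

Both computations agree, as the fact guarantees for any nonnegative integer-valued variable.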
Now, back to business. The key is to recognize the following equivalence of events, valid for $k \in \{0,\ldots,j\}$: $$ \{ \Zj \geq k\} = \{ X \geq k\} \>. $$
Hence we have $$ \E \Zj = \sum_{k=1}^j \Pr(\Zj \geq k) = \sum_{k=1}^j \Pr(X \geq k) = \sum_{k=1}^j \frac{{w \choose k}}{{n \choose k}} \>, $$ where the last equality follows from the fact that $X \geq k$ if and only if the first $k$ balls drawn are white.
Some binomial-coefficient manipulations yield $$ \E \Zj = \sum_{k=1}^j \frac{{w \choose k}}{{n \choose k}} = \frac{w}{n-w+1}\left(1 - \frac{{{w-1} \choose j}}{{n \choose j}} \right) \>. $$
Post scriptum: In case the last equality looks at all mysterious, we can give a short sketch of the proof. Let $S_k = \Pr(X \geq k)$ and $p_k = \Pr(X = k) = S_k - S_{k+1}$. Then, $$ \mu = \E\Zj = \sum_{k=1}^j S_k = \sum_{k=1}^{j-1} k p_k + j S_j \>. $$ Now, note that $k p_k = w S_k - n S_{k+1}$. All that is left to do is to sum over $k$, rearrange, and solve for $\mu$.
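Both the sum form and the closed form for $\E \Zj$ are easy to sanity-check. Below is a small sketch with hypothetical counts ($w = 5$ white, $b = 3$ black, truncation at $j = 4$; these numbers are my own, not from the answer), comparing the two formulas against a Monte Carlo estimate of $\min(X, j)$:

```python
from math import comb
import random

w, b, j = 5, 3, 4          # hypothetical counts: white, black, truncation point
n = w + b

# Sum form: E[Z^(j)] = sum_{k=1}^{j} C(w,k)/C(n,k)
sum_form = sum(comb(w, k) / comb(n, k) for k in range(1, j + 1))

# Closed form: w/(n-w+1) * (1 - C(w-1,j)/C(n,j))
closed_form = w / (n - w + 1) * (1 - comb(w - 1, j) / comb(n, j))

# Monte Carlo: count whites before the first black among the first j draws;
# if no black appears in the first j draws, the count is j, i.e. min(X, j).
rng = random.Random(0)
balls = ["W"] * w + ["B"] * b

def z_j():
    rng.shuffle(balls)
    whites = 0
    for ball in balls[:j]:
        if ball == "B":
            break
        whites += 1
    return whites

trials = 200_000
mc = sum(z_j() for _ in range(trials)) / trials
print(sum_form, closed_form, mc)  # first two agree exactly; mc is close
```

For these counts both formulas give $\E Z^{(4)} = \tfrac{5}{4}\bigl(1 - \tfrac{1}{70}\bigr) \approx 1.2321$, and the simulation lands nearby.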
I would say that the closest match to your description is the multivariate hypergeometric distribution,
$$ f(k_1,\dots,k_c) = \frac{\prod_{i=1}^c {K_i \choose k_i}}{{N \choose n}} $$
where you sample $n$ marbles from an urn with $c$ colors and $K_i$ marbles of color $i$, so you can sample no more than $N = \sum_i K_i$ marbles in total.
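As a quick sketch of this pmf (the counts $K = (4, 3, 2)$ and $n = 4$ are hypothetical, chosen only for illustration), one can implement $f$ directly with binomial coefficients and check that it sums to 1 over all feasible color counts:

```python
from math import comb
from itertools import product

# Hypothetical urn: c = 3 colors with K = (4, 3, 2) marbles; draw n = 4.
K = (4, 3, 2)
n = 4

def mvhg_pmf(k, K, n):
    """Multivariate hypergeometric pmf f(k_1, ..., k_c)."""
    if sum(k) != n:
        return 0.0
    num = 1
    for ki, Ki in zip(k, K):
        num *= comb(Ki, ki)
    return num / comb(sum(K), n)

# The pmf sums to 1 over all (k_1, ..., k_c) with sum k_i = n
# (Vandermonde's identity in disguise).
total = sum(mvhg_pmf(k, K, n)
            for k in product(*(range(Ki + 1) for Ki in K)))
print(total)  # 1.0 up to rounding
```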
Since you seem to be asking about the total number of successes (as in the Poisson-binomial distribution in your example), I understand that you count the first $c-1$ colors as "successes" and the $c$-th color as "failure". To obtain the distribution of the total number of "successes", notice that the multivariate hypergeometric distribution has hypergeometric marginals, so $k_\text{tot} = \sum_{i=1}^{c-1} k_i$ follows a hypergeometric distribution where you sample $n$ marbles from an urn containing $K_\text{tot} = \sum_{i=1}^{c-1} K_i$ successes; this is the same as subtracting $k_c$ from $n$, where $k_c$ itself follows a univariate hypergeometric distribution.
Saying this in plain English: simply imagine that you painted all the marbles of colors $1,\dots,c-1$ black, leaving the $c$-th color untouched; now you are drawing from an urn containing only two types of marbles (black vs. the $c$-th color), and you do not need to keep track of all the colors you started with.
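The "paint them black" argument can be checked numerically. A sketch under the same hypothetical counts as before, $K = (4, 3, 2)$ with the first two colors as successes: marginalising the multivariate pmf over $(k_1, k_2)$ reproduces the univariate hypergeometric pmf for $k_\text{tot}$.

```python
from math import comb

# Hypothetical urn: colors 1..c-1 are "successes", color c is "failure".
K = (4, 3, 2)                  # K_c = 2 failures, K_tot = 7 successes
N, n = sum(K), 4
K_tot = sum(K[:-1])

def hypergeom_pmf(k, K_succ, N, n):
    """Univariate hypergeometric pmf for the total number of successes."""
    return comb(K_succ, k) * comb(N - K_succ, n - k) / comb(N, n)

# Marginalise the multivariate pmf over the success colors.
for k_tot in range(n + 1):
    marginal = sum(
        comb(K[0], k1) * comb(K[1], k2) * comb(K[2], n - k1 - k2) / comb(N, n)
        for k1 in range(K[0] + 1) for k2 in range(K[1] + 1)
        if k1 + k2 == k_tot and 0 <= n - k1 - k2 <= K[2]
    )
    assert abs(marginal - hypergeom_pmf(k_tot, K_tot, N, n)) < 1e-12
```

Every value of $k_\text{tot}$ gets the same probability either way, confirming the two-color reduction.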
So the case with a finite number of marbles and drawing without replacement is simple. On the other hand, if you are dealing with an infinite number of marbles, then sampling with replacement does not differ from sampling without replacement, and you are dealing with the Poisson-binomial distribution.
Notice that the initial probabilities of drawing the balls are $p_i = K_i/N$, so if you are dealing with a finite number of balls, then knowing the initial probabilities (together with the total $N$) is the same as knowing the counts.
Best Answer
"Direct calculation": Let there be $n$ balls of $m$ colours in the urn. Let us focus on the probability of drawing one particular colour, say white, on the second draw. Let the number of white balls be $n_w$. Let $X_i$ be the colour of the ball obtained at the $i$-th draw.
\begin{eqnarray} P(X_2=W)&=&P(X_2=W|X_1=W)P(X_1=W)+P(X_2=W|X_1=\overline{W})P(X_1=\overline{W})\\ &=&\frac{n_w-1}{n-1}\frac{n_w}{n}+\frac{n_w}{n-1}\frac{n-n_w}{n}\\ &=&\frac{n_w(n-n_w+n_w-1)}{n(n-1)}\\ &=&\frac{n_w}{n}\\ &=&P(X_1=W) \end{eqnarray}
Of course this same argument applies to any colour on the second draw. We can apply the same kind of argument recursively when considering later draws.
[One could of course perform an even more direct calculation. Consider the first $k$ draws as consisting of $i$ white balls and $k-i$ non-white balls (with probability given by the hypergeometric distribution), and perform the corresponding calculation to the simple one above but for the draw at step $k+1$; one gets a similar simplification and cancellation, but it's not especially enlightening to carry out.]
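The labelling argument below (and the recursive calculation above) can be illustrated by simulation: shuffle the urn once, read the balls off in order, and check that each draw position is white with the same probability $n_w/n$. The specific urn contents here (4 white, 3 red, 2 blue) are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter

# Hypothetical urn: 4 white, 3 red, 2 blue, so n = 9 and n_w/n = 4/9.
urn = ["W"] * 4 + ["R"] * 3 + ["B"] * 2
n = len(urn)
rng = random.Random(1)

trials = 100_000
white_at_draw = Counter()
for _ in range(trials):
    rng.shuffle(urn)                      # random labelling 1..n of the balls
    for k, ball in enumerate(urn, start=1):
        if ball == "W":
            white_at_draw[k] += 1

# P(X_k = W) should be n_w/n = 4/9 for every draw position k.
for k in range(1, n + 1):
    print(k, white_at_draw[k] / trials)   # each approx. 0.444
```

Each position's frequency hovers around $4/9 \approx 0.444$, with no drift as $k$ grows, which is exactly the symmetry claim.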
A shorter argument: consider labelling the balls randomly with the numbers $1,2,...,n$, and then drawing them out in labelled order. The question now becomes "Is the probability that a given label, $k$, is placed on a white ball the same as the probability the label $1$ gets placed on a white ball?"
Now we see the answer must be "yes" by symmetry of the labels. Similarly, by symmetry of the ball-colours, it doesn't matter that we said "white", so the argument that label $k$ and label $1$ have the same probability applies to any colour. Hence the distribution at the $k$-th draw is the same as for the first draw, as long as we have no additional information from the earlier draws (i.e. as long as the earlier drawn balls are not seen).