[Math] Expected number of red balls removed from an urn before the first black ball

polya-urn-modelprobability

Question: An urn contains n+m balls of which n are red and m are black. They are withdrawn from the urn one at a time and without replacement. Let $X$ be the number of red balls removed before the first black ball is chosen. We are interested in determining the $E[X]$. To obtain this quantity, number the red balls from 1 to n. Define the random variables $X_i$ $(i=1,2…,n)$ by

$$X_i =
\begin{cases}
1 \quad & : \text{if red ball labeled (i) is taken before any black ball is chosen}\\
0 \quad & : \text{otherwise} \end{cases}$$

Express $X$ in terms of $X_i$s and find $E[X]$.

My Attempt:

Now we can express $X$ as $X = X_1+X_2+…+X_n$

To find the $X_i's$, what I think is it not geometric, but it is more like this
$$
\begin{array}{c|c|c|c}
X & 1 & 2 & \ldots \\
\hline
P(X) & \frac{m}{n+m} & (\frac{n}{n+m})(\frac{m}{(n+m)-1}) & \ldots
\end{array}
$$
Now to determine the random variables, I tried to think about this step-by step. For $X_1$, it means that is the red ball labeled 1 is picked before any black ball is chosen, and so on for $ X_2, X_3,..,X_N$. What I think is to determine the random variables, it is like a permutation going on since the balls are numbered, but it's not. There is only 1 ball in the urn which is red and is labeled #1. There is only 1 ball which is red and is labeled #2, etc.

The question is that am I thinking about this the right way? Again, I really apologize about the organization. It is my first time on this site and I am trying to get familiar with the programming. Thank you for all of your help!

Best Answer

Think of all the balls as distinguishable, and imagine taking out the balls one at a time until they are all gone. Then all sequences of balls are equally likely. Temporarily, let us call Red Ball $i$ and all the black balls special. Each of the $m+1$ special balls is equally likely to be the first special ball drawn. It follows that the probability Red Ball $i$ is drawn before any black is $\frac{1}{m+1}$.

The Bernoulli random variable $X_i$ takes on value $1$ with probability $\frac{1}{m+1}$, and therefore $E(X_i)=\frac{1}{m+1}$.

By the linearity of expectation we then have $E(X)=E(X_1)+\cdots+E(X_n)=\frac{n}{m+1}$.

Remark: In the second part of the post, you were going after the distribution of the random variable $X$. So let us do that. There are $\binom{n+m}{m}$ equally likely ways to choose the positions of the $m$ black balls. Now we count the number of ways to choose positions of the black balls if the first $k$ positions are to be black and the next one red. That leaves $n+m-k-1$ positions, and $m-k$ black. Positions for them can be chosen in $\binom{n+m-k-1}{m-k}$ ways, or equivalently $\binom{n+m-k-1}{n-1}$ ways. It follows that $$\Pr(X=k)=\frac{\binom{n+m-k-1}{n-1}}{\binom{m+n}{m}}\tag{1}$$ for $k=0,1,\dots, m$.

Now we could use (1) to write down an expression for $E(X)$. With some fooling around with binomial coefficients, we could after a while simplify this to $\frac{n}{m+1}$. However, that is a painful way to find $E(X)$. The method of Indicator Random Variables described in the first part of the OP is far easier.

Related Question