The quickest approach is to count ways to get $6$ colors on $8$ balls. There are essentially two cases, $\langle 3,1,1,1,1,1\rangle$ and $\langle 2,2,1,1,1,1\rangle$. There are $\binom{6}{1}\binom{5}{3}\binom{5}{1}^5=187,500$ ways to get the first case. The $\binom{6}{1}$ counts the number of ways of choosing one color to get three balls, and the rest is the number of ways of choosing three balls from that one color and one for each of the other colors.
The second case has $\binom6 2\binom 5 2^2\binom 5 1^4=937,500$ different ways. There are $\binom 6 2$ ways to choose two colors to receive two balls, and the rest is the number of ways of choosing two balls from each of those colors and one ball from the others.
So the total cases are $1,125,000$, out of $\binom{30}{8}=5,852,925$. That gives a probability of about $0.1922$.
We use the notation from the following MSE
link with $m$ for
the number of trials and $n$ for the number of different types of
coupons. We treat the special case where there are $j$ instances of
each type and we are sampling without replacement.
We ask about the probability of obtaining the distribution
$$\prod_{q=1}^n C_q^{\alpha_q}$$
where $\alpha_q$ says we have that many instances of type $C_q.$
We obtain
$$\frac{(nj-\sum_{q=1}^n \alpha_q)!}{(nj)!}
\prod_{q=1}^n \frac{j!}{(j-\alpha_q)!}.$$
Therefore the number of sequences of length $m$ of $n$
types of coupons drawn without replacement, with a maximum of $j$ coupons
of each type, is given by
$$m! [z^m]
\left(\sum_{k=0}^j \frac{j!}{(j-k)!}\frac{z^k}{k!}\right)^{n}
= m! [z^m] (1+z)^{nj} = {nj\choose m} \times m!.$$
We then obtain from first principles the formula
$$P[T=m] = \frac{1}{m!} {nj\choose m}^{-1} \times
n \times j \times (m-1)! [z^{m-1}]
\left(\sum_{k=1}^j \frac{j!}{(j-k)!}\frac{z^k}{k!}\right)^{n-1}
\\ = nj \times \frac{1}{m} {nj\choose m}^{-1}
[z^{m-1}] \left(-1 + (1+z)^j\right)^{n-1}.$$
This becomes
$$nj \times \frac{1}{m} {nj\choose m}^{-1}
[z^{m-1}] \sum_{q=0}^{n-1} {n-1\choose q} (-1)^{n-1-q} (1+z)^{qj}
\\ = {nj-1\choose m-1}^{-1}
\sum_{q=0}^{n-1} {n-1\choose q} (-1)^{n-1-q} {qj\choose m-1}.$$
Observe that
$${qj\choose m-1} {nj-1\choose m-1}^{-1}
= \frac{(qj)! \times (nj-1-(m-1))!}{(qj-(m-1))! \times (nj-1)!}
\\ = {nj-1\choose qj}^{-1} {nj-1-(m-1)\choose qj-(m-1)}.$$
We record for the probabilities the formula
$$\bbox[5px,border:2px solid #00A000]{ P[T=m] =
\sum_{q=0}^{n-1} {n-1\choose q} (-1)^{n-1-q}
{nj-1\choose qj}^{-1} {nj-1-(m-1)\choose nj-1-qj}.}$$
Start by verifying that this is a probability distribution. We obtain
for the sum in $m$
$$\sum_{m=n}^{(n-1)j+1} {nj-1-(m-1)\choose nj-1-qj}
\\ = [z^{nj-1-qj}] \sum_{m=n}^{(n-1)j+1} (1+z)^{nj-1-(m-1)}
\\ = [z^{nj-1-qj}] \sum_{m=j-1}^{n(j-1)} (1+z)^m
= [z^{nj-qj}] ((1+z)^{n(j-1)+1} - (1+z)^{j-1}).$$
We have $nj-qj\ge j$ so only the first term contributes and we obtain
$$\sum_m P[T=m] =
\sum_{q=0}^{n-1} {n-1\choose q} (-1)^{n-1-q} {nj-1\choose qj}^{-1}
{n(j-1)+1\choose nj-qj}
\\ = \frac{n(j-1)+1}{j}
\sum_{q=0}^{n-1} {n-1\choose q} \frac{(-1)^{n-1-q}}{n-q}
{nj-1\choose nj-qj-1}^{-1} {n(j-1)\choose nj-qj-1}$$
We get for the rightmost pair of binomial coefficients
$$\frac{(n(j-1))! \times (qj)!}{(nj-1)! \times (qj+1-n)!}
= {nj-1\choose n-1}^{-1} {qj\choose n-1}$$
which yields for the sum
$$\frac{n(j-1)+1}{j} {nj-1\choose n-1}^{-1}
\sum_{q=0}^{n-1} {n-1\choose q} \frac{(-1)^{n-1-q}}{n-q} {qj\choose n-1}
\\ = \frac{n(j-1)+1}{nj} {nj-1\choose n-1}^{-1}
\sum_{q=0}^{n-1} {n\choose q} (-1)^{n-1-q} {qj\choose n-1}
\\ = \frac{n(j-1)+1}{nj} {nj-1\choose n-1}^{-1} {nj\choose n-1}
\\ + \frac{n(j-1)+1}{nj} {nj-1\choose n-1}^{-1}
\sum_{q=0}^{n} {n\choose q} (-1)^{n-1-q} {qj\choose n-1}
\\ = \frac{n(j-1)+1}{nj}\frac{nj}{nj+1-n}
\\ + \frac{n(j-1)+1}{nj} {nj-1\choose n-1}^{-1}
[z^{n-1}] \sum_{q=0}^{n} {n\choose q} (-1)^{n-1-q} (1+z)^{qj}
\\ = 1 - \frac{n(j-1)+1}{nj} {nj-1\choose n-1}^{-1}
[z^{n-1}] (1-(1+z)^j)^n$$
Now observe that $[z^{n-1}] (1-(1+z)^j)^n = 0$ hence everything
simplifies to $$1$$ and we have a probability distribution.
Continuing with the expectation we have the following closed form:
$$\bbox[5px,border:2px solid #00A000]{ \mathrm{E}[T] =
\sum_{q=0}^{n-1} {n-1\choose q} (-1)^{n-1-q}
{nj-1\choose qj}^{-1}
\sum_{m=n}^{(n-1)j+1} m {nj-1-(m-1)\choose nj-1-qj}.}$$
Let us examine some of these expectations numerically. Here are
the first few for eight types of coupons, starting at $j=1:$
$$8,{\frac {76627}{6435}},{\frac {76801}{5434}},{\frac {7473667}{480675}},
{\frac {1318429}{79794}},\ldots$$
and here is the initial segment for ten types of coupons:
$$10,{\frac {707825}{46189}},{\frac {7008811}{380380}},
{\frac {266299459}{13042315}},{\frac {182251913}{8360638}},
{\frac {748880445829}{32831263465}},\ldots$$
Careful inspection of these values suggests that we cannot hope for
additional simplification when $j\ge 2$; if a simpler closed form
existed, it would likely have appeared in these sample values. We do
see, however, that the case $j=1$ admits a simplification, the value
being $n$ (we always finish after $n$ draws when there is only one
instance of each coupon).
We now do this calculation, which is trivial, but nonetheless a useful
sanity check, starting with
$$\sum_{q=0}^{n-1} {n-1\choose q} (-1)^{n-1-q}
{n-1\choose q}^{-1}
\sum_{m=n}^{n} m {n-1-(m-1)\choose n-1-q}
\\ = \sum_{q=0}^{n-1} (-1)^{n-1-q}
\times n {n-1-(n-1)\choose n-1-q}
\\ = (-1)^{n-1-(n-1)} \times n \times {0\choose n-1-(n-1)} = n.$$
It certainly seems like a worthwhile challenge to prove that the
closed form for $\mathrm{E}[T]$ tends to $n H_n$ in the limit
$j\to\infty$ (the with-replacement regime), which is confirmed by the
numerical evidence.
We did verify the formula for the expectation in software, as follows.
It really is quite remarkable that the output from this program is in
excellent agreement with the closed form on all values that were
tested.
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <time.h>
#include <string.h>

int main(int argc, char **argv)
{
    int n = 6, j = 3, trials = 1000;

    if(argc >= 2){
        n = atoi(argv[1]);
    }
    if(argc >= 3){
        j = atoi(argv[2]);
    }
    if(argc >= 4){
        trials = atoi(argv[3]);
    }

    assert(1 <= n);
    assert(1 <= j);
    assert(1 <= trials);

    srand48(time(NULL));

    long long data = 0;
    for(int tind = 0; tind < trials; tind++){
        /* the pool: j instances of each of the n coupon types */
        int src[n*j];
        for(int cind = 0; cind < n*j; cind++)
            src[cind] = cind/j;

        int seen = 0; int steps = 0;
        int dist[n];
        for(int cind = 0; cind < n; cind++)
            dist[cind] = 0;

        /* draw without replacement until all n types have been seen */
        while(seen < n){
            int cpidx = drand48() * (double)(n*j-steps);
            int coupon = src[cpidx];
            /* remove the drawn coupon from the pool */
            for(int cind = cpidx; cind < n*j-steps-1; cind++)
                src[cind] = src[cind+1];
            steps++;
            if(dist[coupon] == 0)
                seen++;
            dist[coupon]++;
        }
        data += steps;
    }

    long double expt = (long double)data/(long double)trials;
    printf("[n = %d, j = %d, trials = %d]: %Le\n",
           n, j, trials, expt);
    exit(0);
}
Best Answer
The basic idea is to represent the process of drawing "colored balls with numbers" as an absorbing Markov chain. Then the fundamental matrix gives us the expected number of steps to reach an absorbing state:
$$ N = \sum_{k=0}^\infty Q^k = (I - Q)^{-1} $$
when the probability transition matrix $P$ is written in block form with all absorbing states grouped at the end:
$$ P = \begin{bmatrix} Q & R \\ 0 & I \end{bmatrix} $$
That is, the expected number of steps needed to reach an absorbing state starting from a specified transient (non-absorbing) state will be the corresponding entry of the column vector $N \mathbf{1}$, where $\mathbf{1}$ is the vector of all ones.
States that are not absorbing are called transient states, and one can think of the similarity of the general formula above to the limit of a geometric series as a result of moving among the transient states until an absorbing state is reached.
In any case, the case of $m = 2$ colors and $n = 2$ numbers described in the Question will illustrate the computation. The first step always gives us one color and one number, a state we label $(1,1)$. The absorbing state, both colors and both numbers, we label $(2,2)$. The transition matrix thus involves three transient states and one absorbing state:
$$ P = \begin{bmatrix} 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 1/2 & 1/2 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$
$$ I - Q = \begin{bmatrix} 3/4 & -1/4 & -1/4 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1/2 \end{bmatrix} $$
$$ (I - Q)^{-1} \mathbf 1 = [ 8/3 \;\; 2 \;\; 2 ]^T $$
After one step, giving one color and one number, the expected number of steps until hitting the absorbing state is $8/3$. In total we expect $1 + 8/3 = 11/3$ steps to reach the absorbing state with two colors and two numbers.
The uniform probabilities among colors as well as among numbers allow us to represent the transient states in a concise fashion, $(i,j)$ where $i$ is the count of distinct colors sampled so far and $j$ is the count of distinct numbers sampled. Immediately after the first sample is drawn we have $i = j = 1$, so only the states with $1 \le i \le m, 1 \le j \le n$ need to be represented. Of those $mn$ states, the state $(m,n)$ is absorbing and the other $mn-1$ states are transient. So $Q$ will be size $(mn-1)\times(mn-1)$, and the fact that sampled counts can never decrease means that $Q$ is upper triangular if we order the states naturally (lexicographically).
Evaluating $(I-Q)^{-1} \mathbf 1$ can therefore be done by backsolving the triangular system $(I-Q) \mathbf v = \mathbf 1$. This is very efficient: the matrix is never explicitly inverted, and the backsolve takes $O(m^2n^2)$ arithmetic operations.
Once that is done, adding one to the first entry of $\mathbf v$ (which accounts for taking the first step) results in the expected number of steps to finish the "two dimensional coupon collector" task.