Preliminary (TL;DR)
Background
In his 1991 publication, Norman C. Beaulieu answered your question w/ what he dubbed, the generalized multinomial distribution (GMD). My explanation will focus on the GMD's utility.
Notation
- # categories $= c$.
- # trials $= t$.
- Random vector $= X = \left[\begin{array}{cccc}X_1&X_2&\cdots&X_c\end{array}\right]^T$.
- Category responses after $t$ trials vector $= x = \left[\begin{array}{cccc}x_1&x_2&\cdots&x_c\end{array}\right]^T$.
- $\sum_{k = 1}^c x_k = t$.
- Probability of category response during trial matrix $= p = \left[\begin{array}{cccc} p_{1,1} & p_{1,2} & \cdots & p_{1,c} \\
p_{2,1} & p_{2,2} & \cdots & p_{2,c} \\
\vdots & \vdots & \ddots & \vdots \\
p_{t,1} & p_{t,2} & \cdots & p_{t,c}
\end{array}\right]$.
- Pmf of $X = P\left[X = x\right]$.
- $[c] = \left\{1, 2, \cdots, c\right\}$.
- Multiset of $[c] = ([c], m) = \left\{1^{m(1)}, 2^{m(2)}, \cdots, c^{m(c)}\right\}$.
- Permutations of $([c], m) = \mathfrak{S}_{([c], m)}$.
- $card\left(\mathfrak{S}_{([c], m)}\right) = \left(m(1), m(2), \cdots, m(c)\right)!$.
Pmf of GMD
$$P\left[X = x\right] = \sum_{\mathfrak{s} \in \mathfrak{S}_{([c], m)}} \left\{\prod_{k = 1}^t \left\{p_{k,\mathfrak{s}_k}\right\}\right\}$$
So far, I've identified it as being the superclass of 7 distributions! Namely...
- Bernoulli distribution.
- Uniform distribution.
- Categorical distribution.
- Binomial distribution.
- Multinomial distribution.
- Poisson's binomial distribution.
- Generalized multinomial distribution (if your definition of superclass allows self-inclusion).
Examples
Games
- g1: A 2 sided die is simulated using a fair standard die by assigning faces w/ pips 1 through 3 & 4 through 6 to sides 1 & 2, respectively. The die is biased by etching micro holes into faces w/ pips 1 through 3 s.t. $p_1 = 12/30$ & $p_2 = 18/30$. The 2 sided die is tossed 1 time & the category responses are recorded.
- g2: Same as g1, accept w/ ideal standard die, i.e., $p_1 = p_2 = \cdots = p_6 = 5/30$.
- g3: Same as g1, accept w/ standard die, i.e., $p_1 = p_2 = p_3 = 4/30$ & $p_4 = p_5 = p_6 = 6/30$.
- g4: Same as g1, accept die is tossed 7 times.
- g5: Same as g3, accept die is tossed 7 times.
- g6: Same as g4, accept the micro holes are filled w/ $0.07$ kg of a material, which evaporates @ $0.01$ kg/s upon being sprayed w/ an activator, s.t. $p_1 = p_2 = 15/30$ for the 1st toss. Immediately after being sprayed, category responses are recorded every second.
- g7: Same as g6, accept w/ standard die, i.e., $p_1 = p_2 = \cdots = p_6 = 5/30$ for the 1st toss.
Questions
- q1: Find pmf & evaluate when $x = \left[\begin{array}{cc}0&1\end{array}\right]^T$.
- q2: Find pmf & evaluate when $x = \left[\begin{array}{cccccc}0&1&0&0&0&0\end{array}\right]^T$.
- q3: q2.
- q4: Find pmf & evaluate when $x = \left[\begin{array}{cc}2&5\end{array}\right]^T$.
- q5: Find pmf & evaluate when $x = \left[\begin{array}{cccccc}0&2&1&1&0&3\end{array}\right]^T$.
- q6: q4.
- q7: q5.
Answers w/o knowledge of GMD
- a1: $X$ ~ Bernoulli distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^k}{k!} = 1!\prod_{k = 1}^2 \frac{p_k^k}{k!} = \frac{1!(12/30)^0(18/30)^1}{0!1!}$
$\Longrightarrow P\left[X = x\right] = 3/5$.
- a2: $X$ ~ Uniform distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^k}{k!} = 1!\prod_{k = 1}^6 \frac{p_k^k}{k!} = \frac{1!(5/30)^{0 + 1 + 0 + 0 + 0 + 0}}{0!1!0!0!0!0!}$
$\Longrightarrow P\left[X = x\right] = 1/6$.
- a3: $X$ ~ Categorical distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^k}{k!} = 1!\prod_{k = 1}^6 \frac{p_k^k}{k!} = \frac{1!(4/30)^{0 + 1 + 0}(6/30)^{0 + 0 + 0}}{0!1!0!0!0!0!}$
$\Longrightarrow P\left[X = x\right] = 2/15$.
- a4: $X$ ~ Binomial distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^k}{k!} = 7!\prod_{k = 1}^2 \frac{p_k^k}{k!} = \frac{7!(12/30)^2(18/30)^5}{2!5!}$
$\Longrightarrow P\left[X = x\right] = 20412/78125$.
- a5: $X$ ~ Multinomial distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^k}{k!} = 7!\prod_{k = 1}^6 \frac{p_k^k}{k!} =
\frac{7!(4/30)^{0 + 2 + 1}(6/30)^{1 + 0 + 3}}{0!2!1!1!0!3!}$
$\Longrightarrow P\left[X = x\right] = 224/140625$.
- a6: $X$ ~ Poisson's binomial distribution.
- $P\left[\left[\begin{array}{cc}X_1&X_2\end{array}\right]^T = \left[\begin{array}{cc}x_1&x_2\end{array}\right]^T\right] = P\left[X_1 = x_1, X_2 = x_2\right] = P\left[X_1 = x_1\right] = P\left[X_2 = x_2\right]$.
- $p_1$ & $p_2$ are vectors now: $p_1 = \left[\begin{array}{cccc}p_{1_1}&p_{1_2}&\cdots&p_{1_t}\end{array}\right]^T, p_2 = \left[\begin{array}{cccc}p_{2_1}&p_{2_2}&\cdots&p_{2_t}\end{array}\right]^T$.
- $P\left[X_2 = x_2\right] = \frac{1}{t + 1}\sum_{i = 0}^t \left\{\exp\left(\frac{-j2\pi i x_2}{t + 1}\right) \prod_{k = 1}^t \left\{p_{2_k}\left(\exp\left(\frac{j2\pi i}{t + 1}\right) - 1\right) + 1\right\}\right\}$
$= \frac{1}{8}\sum_{i = 0}^7 \left\{\exp\left(\frac{-j5\pi i}{4}\right) \prod_{k = 1}^7 \left\{\left(\frac{0.5k + 14.5}{30}\right)\left(\exp\left(\frac{j\pi i}{4}\right) - 1\right) + 1\right\}\right\}$
$\Longrightarrow P\left[X_2 = 5\right] = 308327/1440000$.
- a7: $X$ ~ Generalized multinomial distribution.
Answers w/ Knowledge of GMD
- a1: $X$ ~ Bernoulli distribution.
- $p = \left[\begin{array}{c}\frac{12}{30}&\frac{18}{30}\end{array}\right]$.
- $\mathfrak{S}_{([2], m)} = \left\{\left(2\right)\right\}$.
- a2: $X$ ~ Uniform distribution.
- $p = \left[\begin{array}{c}\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}\end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2\right)\right\}$.
- a3: $X$ ~ Categorical distribution.
- $p = \left[\begin{array}{c}\frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30}\end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2\right)\right\}$.
- a4: $X$ ~ Binomial distribution.
- $p = \left[\begin{array}{cc}
\frac{12}{30}&\frac{18}{30} \\
\frac{12}{30}&\frac{18}{30} \\
\frac{12}{30}&\frac{18}{30} \\
\frac{12}{30}&\frac{18}{30} \\
\frac{12}{30}&\frac{18}{30} \\
\frac{12}{30}&\frac{18}{30} \\
\frac{12}{30}&\frac{18}{30}
\end{array}\right]$.
- $\mathfrak{S}_{([2], m)} = \left\{\left(1,1,2,2,2,2,2\right), \ldots, \left(2,2,2,2,2,1,1\right)\right\}$.
- a5: $X$ ~ Multinomial distribution.
- $p = \left[\begin{array}{cccccc}
\frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\
\frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\
\frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\
\frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\
\frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\
\frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\
\frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30}
\end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2,2,3,4,6,6,6\right), \ldots, \left(6,6,6,4,3,2,2\right)\right\}$.
- a6: $X$ ~ Poisson's binomial distribution.
- $p = \left[\begin{array}{cc}
\frac{15}{30}&\frac{15}{30} \\
\frac{14.5}{30}&\frac{15.5}{30} \\
\frac{14}{30}&\frac{16}{30} \\
\frac{13.5}{30}&\frac{16.5}{30} \\
\frac{13}{30}&\frac{17}{30} \\
\frac{12.5}{30}&\frac{17.5}{30} \\
\frac{12}{30}&\frac{18}{30}
\end{array}\right]$.
- $\mathfrak{S}_{([2], m)} = \left\{\left(1,1,2,2,2,2,2\right), \ldots, \left(2,2,2,2,2,1,1\right)\right\}$.
- a7: $X$ ~ Generalized multinomial distribution.
- $p = \left[\begin{array}{cccccc}
\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30} \\
\frac{4.8\overline{3}}{30}&\frac{4.8\overline{3}}{30}&\frac{4.8\overline{3}}{30}
&\frac{5.1\overline{6}}{30}&\frac{5.1\overline{6}}{30}&\frac{5.1\overline{6}}{30} \\
\frac{4.\overline{6}}{30}&\frac{4.\overline{6}}{30}&\frac{4.\overline{6}}{30}
&\frac{5.\overline{3}}{30}&\frac{5.\overline{3}}{30}&\frac{5.\overline{3}}{30} \\
\frac{4.5}{30}&\frac{4.5}{30}&\frac{4.5}{30}
&\frac{5.5}{30}&\frac{5.5}{30}&\frac{5.5}{30} \\
\frac{4.\overline{3}}{30}&\frac{4.\overline{3}}{30}&\frac{4.\overline{3}}{30}
&\frac{5.\overline{6}}{30}&\frac{5.\overline{6}}{30}&\frac{5.\overline{6}}{30} \\
\frac{4.1\overline{6}}{30}&\frac{4.1\overline{6}}{30}&\frac{4.1\overline{6}}{30}
&\frac{5.8\overline{3}}{30}&\frac{5.8\overline{3}}{30}&\frac{5.8\overline{3}}{30} \\
\frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30}
\end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2,2,3,4,6,6,6\right), \ldots, \left(6,6,6,4,3,2,2\right)\right\}$.
- $P\left[X = x\right] = 59251/36905625$.
Final Words
I know my answer was very long (& went far beyond what OP asked for) but this had been flying around inside my head for quite some time & this q seemed like the most suitable landing strip.
I performed the last 6 calculations using the function gmdPmf (which I defined in Mathematica)...
(* GENERALIZED MULTINOMIAL DISTRIBUTION (GMD) *)
(* Note: mXn = # rows X # columns. *)
gmdPmf[
x_ (* Responses of category j, after t trials have taken place. *),
p_ (* Matrix (tXm) holds p_{trial i, category j} = P["Response of trial i is category j"]. *)
] := Module[{t, c, ⦋c⦌, allRPs, desiredRPs, count = 0, sum = 0, product = 1},
t = Total[x]; (* # trials. *)
c = Length[x]; (* # categories. *)
⦋c⦌ = Range[c]; (* Categories. *)
allRPs = Tuples[⦋c⦌,t]; (* Matrix (c^tXt) holds all the response patterns given that t trials have occurred. *)
desiredRPs = {}; (* Matrix ((x_1,x_2,...,x_c) !Xt) holds the desired response patterns; subset of allRPs wrt n. *)
For[i = 1, i <= Length[allRPs], i++,
For[j = 1, j <= c, j++, If[Count[allRPs[[i]],⦋c⦌[[j]]] == x[[j]], count++];];
If[count == c, AppendTo[desiredRPs, allRPs[[i]]]];
count = 0;
];
For[i = 1, i <= Length[desiredRPs], i++,
For[j = 1, j <= t, j++, product *= (p[[j]][[desiredRPs[[i]][[j]]]]);];
sum += product;
product = 1;
];
sum
];
(* ANSWERS *)
Print["a1: P[X = x] = ", gmdPmf[{0, 1}, {{12/30, 18/30}}], "."];
Print["a2: P[X = x] = ", gmdPmf[{0,1, 0, 0, 0, 0}, {{5/30, 5/30, 5/30, 5/30, 5/30, 5/30}}], "."];
Print["a3: P[X = x] = ", gmdPmf[{0,1, 0, 0, 0, 0}, {{4/30, 4/30, 4/30, 6/30, 6/30, 6/30}}], "."];
Print["a4: P[X = x] = ", gmdPmf[{2, 5}, ArrayFlatten[ConstantArray[{{12/30, 18/30}}, {7, 1}]]], "."];
Print["a5: P[X = x] = ", gmdPmf[{0, 2, 1, 1, 0, 3}, ArrayFlatten[ConstantArray[{{4/30, 4/30, 4/30, 6/30, 6/30, 6/30}}, {7, 1}]]], "."];
p = {}; For[i = 1, i <= 7, i++, l = ((31/2) - (1/2)*i)/30; r = ((29/2) + (1/2)*i)/30; AppendTo[p,{l,r}];]; Print["a6: P[X = x] = ", gmdPmf[{2, 5}, p], "."];
p = {}; For[i = 1, i <= 7, i++, l = ((31/6) - (1/6)*i)/30; r = ((29/6) + (1/6)*i)/30; AppendTo[p,{l,l,l,r,r,r}];]; Print["a7: P[X = x] = ", gmdPmf[{0, 2, 1, 1, 0, 3}, p], "."];
Clear[gmdPmf];
Please edit, if you know of any ways to make it shorter/faster. Congrats, if you made it to the end! (:
Best Answer
From the comments with some additions:
Your question boils down to the relationship between the binomial and hypergeometric distributions. Essentially the binomial distribution is the limit of the hypergeometric distribution when the number of successes and the number of failures in the whole population are both much larger than the sample size. In this case the probability that a draw will be a success or failure is only very weakly dependent on the previous outcomes. In this example this approximation would be quite accurate if, say, there are 10000 telephones out there and we are just examining 10 of them. In this case for example the probability that all 10 of them get replaced is $\frac{800}{10000} \cdot \frac{799}{9999} \cdot \dots \cdot \frac{791}{9991} \approx 1.01 \cdot 10^{-11}$ which is pretty close to $0.08^{10} \approx 1.07 \cdot 10^{-11}$.
Basically it is about intuition of the problem at hand: presumably there are way more than ~100 phones out there from this company.
That said, the caveat that I pointed out that both the number of successes and the number of failures need to be much larger than the sample size can be important, if you're considering an event which has very low or very high probability. Indeed in the example above, the relative error is about 5%, which happened because 800 is not that much larger than 10.