How is this a binomial distribution when the trials are not independent?

binomial distribution, probability, probability theory

I was trying to find solutions for this question:

Twenty percent of all telephones of a certain type are submitted for service
while under warranty. Of these, 60% can be repaired, whereas the other 40% must
be replaced with new units. If a company purchases 10 of these telephones, 
what is the probability that exactly two will end up being 
replaced under warranty?

After searching around I found this: solution. It uses the binomial formula to calculate P(X = 2).

A book on probability by Walpole states that the Bernoulli trials of a Bernoulli process must be independent. But in this scenario, we're choosing 10 telephones from a lot and checking whether each one is defective. In essence, our Bernoulli trial is to purchase a telephone from the lot, and the Bernoulli process is 10 such repeated trials.

But once we perform one trial, the total quantity changes, and so does the quantity of either the defective or the non-defective pieces. In essence, the trials are not independent.

How can we, then, apply the Binomial distribution here?
Let's assume you & I have no knowledge of the Binomial formula. What would be a more intuitive way of solving this problem through basic probability notions?

Best Answer

From the comments with some additions:

Your question boils down to the relationship between the binomial and hypergeometric distributions. Essentially, the binomial distribution is the limit of the hypergeometric distribution when the number of successes and the number of failures in the whole population are both much larger than the sample size. In that regime, the probability that a draw is a success or a failure depends only very weakly on the previous outcomes. Here each phone is replaced with probability $0.2 \cdot 0.4 = 0.08$, so the approximation would be quite accurate if, say, there are 10000 telephones out there (800 of which get replaced) and we are just examining 10 of them. For example, the probability that all 10 of them get replaced is then $\frac{800}{10000} \cdot \frac{799}{9999} \cdot \dots \cdot \frac{791}{9991} \approx 1.02 \cdot 10^{-11}$, which is pretty close to $0.08^{10} \approx 1.07 \cdot 10^{-11}$.
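To see how close the two models are, here is a minimal Mathematica sketch. It assumes the same hypothetical population as above (10000 phones, 800 of which get replaced); the variable names are mine and are not part of the original problem.

```mathematica
(* Assumed (hypothetical) population: 10000 phones, 800 of which get replaced. *)
nSample = 10; nTotal = 10000; nReplaced = 800;
pReplace = nReplaced/nTotal; (* 0.2 * 0.4 = 8/100 *)

(* All 10 sampled phones replaced: sampling without replacement vs. binomial. *)
exactAll = Product[(nReplaced - i)/(nTotal - i), {i, 0, nSample - 1}];
binomAll = pReplace^nSample;
N[{exactAll, binomAll, 1 - exactAll/binomAll}]

(* Exactly 2 of the 10 replaced: hypergeometric vs. binomial pmf. *)
N[{PDF[HypergeometricDistribution[nSample, nReplaced, nTotal], 2],
   PDF[BinomialDistribution[nSample, pReplace], 2]}]
```

The third entry of the first list is the relative error for the extreme event. For the event the question actually asks about (exactly two replacements), the two pmf values should agree much more closely.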

Basically it is about intuition of the problem at hand: presumably there are way more than ~100 phones out there from this company.
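Under that assumption the 10 purchases can be treated as independent, each ending in a replacement with probability $0.2 \cdot 0.4 = 0.08$, and the answer can be built from basic notions without quoting the binomial formula. Any one specific arrangement with exactly two replaced phones has probability $0.08^2 \cdot 0.92^8$, and there are $\binom{10}{2} = 45$ ways to choose which two phones are the replaced ones. As a worked sketch:

$$P(X = 2) = \binom{10}{2}(0.08)^2(0.92)^8 = 45 \cdot 0.0064 \cdot 0.92^8 \approx 0.1478.$$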

That said, the caveat above, that both the number of successes and the number of failures need to be much larger than the sample size, can matter if you're considering an event which has very low or very high probability. Indeed, in the example above the relative error is about 5%, which happens because 800 is not that much larger than 10.
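The 5% figure can be read off directly, again assuming the hypothetical population of 10000 phones with 800 replacements. Using $\ln(1 - x) \approx -x$ and $\sum_{i=0}^{9} i = 45$, the ratio of the exact product to the binomial value is roughly

$$\prod_{i=0}^{9}\frac{(800-i)/(10000-i)}{800/10000} = \prod_{i=0}^{9}\frac{1 - i/800}{1 - i/10000} \approx \exp\left(-\frac{45}{800} + \frac{45}{10000}\right) \approx 0.95.$$

The dominant term, $45/800$, is governed by the number of successes in the population rather than by the population size, which is why a large population alone is not enough.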

Preliminary (TL;DR)

Background

In his 1991 publication, Norman C. Beaulieu answered your question w/ what he dubbed the generalized multinomial distribution (GMD). My explanation will focus on the GMD's utility.

Notation

# categories $= c$.
# trials $= t$.
Random vector $= X = \left[\begin{array}{cccc}X_1&X_2&\cdots&X_c\end{array}\right]^T$.
Category responses after $t$ trials vector $= x = \left[\begin{array}{cccc}x_1&x_2&\cdots&x_c\end{array}\right]^T$.
- $\sum_{k = 1}^c x_k = t$.
Probability of category response during trial matrix $= p = \left[\begin{array}{cccc} p_{1,1} & p_{1,2} & \cdots & p_{1,c} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,c} \\ \vdots & \vdots & \ddots & \vdots \\ p_{t,1} & p_{t,2} & \cdots & p_{t,c} \end{array}\right]$.
Pmf of $X = P\left[X = x\right]$.
$[c] = \left\{1, 2, \cdots, c\right\}$.
Multiset of $[c] = ([c], m) = \left\{1^{m(1)}, 2^{m(2)}, \cdots, c^{m(c)}\right\}$.
- $m(i) = x_i$.
Permutations of $([c], m) = \mathfrak{S}_{([c], m)}$.
- $card\left(\mathfrak{S}_{([c], m)}\right) = \left(m(1), m(2), \cdots, m(c)\right)!$.

Pmf of GMD

$$P\left[X = x\right] = \sum_{\mathfrak{s} \in \mathfrak{S}_{([c], m)}} \left\{\prod_{k = 1}^t \left\{p_{k,\mathfrak{s}_k}\right\}\right\}$$
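As a small illustration of the sum (my own toy case, not one of the games below), take $t = 2$ trials, $c = 2$ categories & $x = \left[\begin{array}{cc}1&1\end{array}\right]^T$. Then $\mathfrak{S}_{([2], m)} = \left\{\left(1,2\right), \left(2,1\right)\right\}$ and the pmf has one term per permutation:

$$P\left[X = x\right] = p_{1,1}\,p_{2,2} + p_{1,2}\,p_{2,1}.$$

If both rows of $p$ equal $\left[\begin{array}{cc}p_1&p_2\end{array}\right]$, this collapses to $2p_1p_2$, which is the usual binomial value.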

So far, I've identified it as being the superclass of 7 distributions! Namely...

Bernoulli distribution.
Uniform distribution.
Categorical distribution.
Binomial distribution.
Multinomial distribution.
Poisson's binomial distribution.
Generalized multinomial distribution (if your definition of superclass allows self-inclusion).

Examples

Games

g1: A 2 sided die is simulated using a fair standard die by assigning faces w/ pips 1 through 3 & 4 through 6 to sides 1 & 2, respectively. The die is biased by etching micro holes into faces w/ pips 1 through 3 s.t. $p_1 = 12/30$ & $p_2 = 18/30$. The 2 sided die is tossed 1 time & the category responses are recorded.
g2: Same as g1, except w/ ideal standard die, i.e., $p_1 = p_2 = \cdots = p_6 = 5/30$.
g3: Same as g1, except w/ standard die, i.e., $p_1 = p_2 = p_3 = 4/30$ & $p_4 = p_5 = p_6 = 6/30$.
g4: Same as g1, except die is tossed 7 times.
g5: Same as g3, except die is tossed 7 times.
g6: Same as g4, except the micro holes are filled w/ $0.07$ kg of a material, which evaporates @ $0.01$ kg/s upon being sprayed w/ an activator, s.t. $p_1 = p_2 = 15/30$ for the 1st toss. Immediately after being sprayed, category responses are recorded every second.
g7: Same as g6, except w/ standard die, i.e., $p_1 = p_2 = \cdots = p_6 = 5/30$ for the 1st toss.

Questions

q1: Find pmf & evaluate when $x = \left[\begin{array}{cc}0&1\end{array}\right]^T$.
q2: Find pmf & evaluate when $x = \left[\begin{array}{cccccc}0&1&0&0&0&0\end{array}\right]^T$.
q3: q2.
q4: Find pmf & evaluate when $x = \left[\begin{array}{cc}2&5\end{array}\right]^T$.
q5: Find pmf & evaluate when $x = \left[\begin{array}{cccccc}0&2&1&1&0&3\end{array}\right]^T$.
q6: q4.
q7: q5.

Answers w/o knowledge of GMD

a1: $X$ ~ Bernoulli distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^{x_k}}{x_k!} = 1!\prod_{k = 1}^2 \frac{p_k^{x_k}}{x_k!} = \frac{1!(12/30)^0(18/30)^1}{0!1!}$
  $\Longrightarrow P\left[X = x\right] = 3/5$.
a2: $X$ ~ Uniform distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^{x_k}}{x_k!} = 1!\prod_{k = 1}^6 \frac{p_k^{x_k}}{x_k!} = \frac{1!(5/30)^{0 + 1 + 0 + 0 + 0 + 0}}{0!1!0!0!0!0!}$
  $\Longrightarrow P\left[X = x\right] = 1/6$.
a3: $X$ ~ Categorical distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^{x_k}}{x_k!} = 1!\prod_{k = 1}^6 \frac{p_k^{x_k}}{x_k!} = \frac{1!(4/30)^{0 + 1 + 0}(6/30)^{0 + 0 + 0}}{0!1!0!0!0!0!}$
  $\Longrightarrow P\left[X = x\right] = 2/15$.
a4: $X$ ~ Binomial distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^{x_k}}{x_k!} = 7!\prod_{k = 1}^2 \frac{p_k^{x_k}}{x_k!} = \frac{7!(12/30)^2(18/30)^5}{2!5!}$
  $\Longrightarrow P\left[X = x\right] = 20412/78125$.
a5: $X$ ~ Multinomial distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^{x_k}}{x_k!} = 7!\prod_{k = 1}^6 \frac{p_k^{x_k}}{x_k!} = \frac{7!(4/30)^{0 + 2 + 1}(6/30)^{1 + 0 + 3}}{0!2!1!1!0!3!}$
  $\Longrightarrow P\left[X = x\right] = 224/140625$.
a6: $X$ ~ Poisson's binomial distribution.
- $P\left[\left[\begin{array}{cc}X_1&X_2\end{array}\right]^T = \left[\begin{array}{cc}x_1&x_2\end{array}\right]^T\right] = P\left[X_1 = x_1, X_2 = x_2\right] = P\left[X_1 = x_1\right] = P\left[X_2 = x_2\right]$.
- $p_1$ & $p_2$ are vectors now: $p_1 = \left[\begin{array}{cccc}p_{1_1}&p_{1_2}&\cdots&p_{1_t}\end{array}\right]^T, p_2 = \left[\begin{array}{cccc}p_{2_1}&p_{2_2}&\cdots&p_{2_t}\end{array}\right]^T$.
- $P\left[X_2 = x_2\right] = \frac{1}{t + 1}\sum_{i = 0}^t \left\{\exp\left(\frac{-j2\pi i x_2}{t + 1}\right) \prod_{k = 1}^t \left\{p_{2_k}\left(\exp\left(\frac{j2\pi i}{t + 1}\right) - 1\right) + 1\right\}\right\}$
  $= \frac{1}{8}\sum_{i = 0}^7 \left\{\exp\left(\frac{-j5\pi i}{4}\right) \prod_{k = 1}^7 \left\{\left(\frac{0.5k + 14.5}{30}\right)\left(\exp\left(\frac{j\pi i}{4}\right) - 1\right) + 1\right\}\right\}$
  $\Longrightarrow P\left[X_2 = 5\right] = 308327/1440000$.
a7: $X$ ~ Generalized multinomial distribution.
- ???
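The closed-form values for a4, a5 & a6 above can be cross-checked w/ a short Mathematica sketch. It relies on the built-in BinomialDistribution & MultinomialDistribution; for a6 it uses the probability generating function as a swapped-in alternative to the Fourier-type formula, then evaluates that formula numerically. The names p2, w & z are my own choices, and z is assumed to have no value assigned.

```mathematica
(* a4: binomial, 7 tosses, x2 = 5 w/ p2 = 18/30. *)
PDF[BinomialDistribution[7, 18/30], 5]

(* a5: multinomial, 7 tosses of the 6-category die. *)
PDF[MultinomialDistribution[7, {4, 4, 4, 6, 6, 6}/30], {0, 2, 1, 1, 0, 3}]

(* a6: p2[[k]] = P[side 2 on toss k], matching (0.5k + 14.5)/30 in the formula above. *)
p2 = Table[(k/2 + 29/2)/30, {k, 1, 7}];

(* Exact value: coefficient of z^5 in the probability generating function. *)
Coefficient[Expand[Product[(1 - p2[[k]]) + p2[[k]] z, {k, 1, 7}]], z, 5]

(* Same quantity via the Fourier-type sum, evaluated numerically. *)
w = Exp[2 Pi I/8];
Chop[N[1/8 Sum[w^(-5 i) Product[p2[[k]] (w^i - 1) + 1, {k, 1, 7}], {i, 0, 7}]]]
```

The first three results should reproduce the fractions listed for a4, a5 & a6, and the last line should agree w/ the a6 value numerically.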

Answers w/ Knowledge of GMD

a1: $X$ ~ Bernoulli distribution.
- $p = \left[\begin{array}{cc}\frac{12}{30}&\frac{18}{30}\end{array}\right]$.
- $\mathfrak{S}_{([2], m)} = \left\{\left(2\right)\right\}$.
a2: $X$ ~ Uniform distribution.
- $p = \left[\begin{array}{cccccc}\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}\end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2\right)\right\}$.
a3: $X$ ~ Categorical distribution.
- $p = \left[\begin{array}{cccccc}\frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30}\end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2\right)\right\}$.
a4: $X$ ~ Binomial distribution.
- $p = \left[\begin{array}{cc} \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \end{array}\right]$.
- $\mathfrak{S}_{([2], m)} = \left\{\left(1,1,2,2,2,2,2\right), \ldots, \left(2,2,2,2,2,1,1\right)\right\}$.
a5: $X$ ~ Multinomial distribution.
- $p = \left[\begin{array}{cccccc} \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2,2,3,4,6,6,6\right), \ldots, \left(6,6,6,4,3,2,2\right)\right\}$.
a6: $X$ ~ Poisson's binomial distribution.
- $p = \left[\begin{array}{cc} \frac{15}{30}&\frac{15}{30} \\ \frac{14.5}{30}&\frac{15.5}{30} \\ \frac{14}{30}&\frac{16}{30} \\ \frac{13.5}{30}&\frac{16.5}{30} \\ \frac{13}{30}&\frac{17}{30} \\ \frac{12.5}{30}&\frac{17.5}{30} \\ \frac{12}{30}&\frac{18}{30} \end{array}\right]$.
- $\mathfrak{S}_{([2], m)} = \left\{\left(1,1,2,2,2,2,2\right), \ldots, \left(2,2,2,2,2,1,1\right)\right\}$.
a7: $X$ ~ Generalized multinomial distribution.
- $p = \left[\begin{array}{cccccc} \frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30} \\ \frac{4.8\overline{3}}{30}&\frac{4.8\overline{3}}{30}&\frac{4.8\overline{3}}{30} &\frac{5.1\overline{6}}{30}&\frac{5.1\overline{6}}{30}&\frac{5.1\overline{6}}{30} \\ \frac{4.\overline{6}}{30}&\frac{4.\overline{6}}{30}&\frac{4.\overline{6}}{30} &\frac{5.\overline{3}}{30}&\frac{5.\overline{3}}{30}&\frac{5.\overline{3}}{30} \\ \frac{4.5}{30}&\frac{4.5}{30}&\frac{4.5}{30} &\frac{5.5}{30}&\frac{5.5}{30}&\frac{5.5}{30} \\ \frac{4.\overline{3}}{30}&\frac{4.\overline{3}}{30}&\frac{4.\overline{3}}{30} &\frac{5.\overline{6}}{30}&\frac{5.\overline{6}}{30}&\frac{5.\overline{6}}{30} \\ \frac{4.1\overline{6}}{30}&\frac{4.1\overline{6}}{30}&\frac{4.1\overline{6}}{30} &\frac{5.8\overline{3}}{30}&\frac{5.8\overline{3}}{30}&\frac{5.8\overline{3}}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2,2,3,4,6,6,6\right), \ldots, \left(6,6,6,4,3,2,2\right)\right\}$.
- $P\left[X = x\right] = 59251/36905625$.

Final Words

I know my answer was very long (& went far beyond what OP asked for) but this had been flying around inside my head for quite some time & this q seemed like the most suitable landing strip.

I performed the last 6 calculations using the function gmdPmf (which I defined in Mathematica)...

(* GENERALIZED MULTINOMIAL DISTRIBUTION (GMD) *)
(* Note: mXn = # rows X # columns. *)
gmdPmf[
    x_ (* Responses of category j, after t trials have taken place. *),
    p_ (* Matrix (tXc) holds p_{trial i, category j} = P["Response of trial i is category j"]. *)
] := Module[{t, c, cats, allRPs, desiredRPs, i, j, count = 0, sum = 0, product = 1},
    t = Total[x]; (* # trials. *)
    c = Length[x]; (* # categories. *)
    cats = Range[c]; (* Categories, i.e., [c]. *)
    allRPs = Tuples[cats, t]; (* Matrix (c^tXt) holds all the response patterns given that t trials have occurred. *)
    desiredRPs = {}; (* Matrix ((x_1,x_2,...,x_c)!Xt) holds the desired response patterns; subset of allRPs wrt x. *)

    (* Keep only the response patterns in which every category j occurs exactly x_j times. *)
    For[i = 1, i <= Length[allRPs], i++,
        For[j = 1, j <= c, j++, If[Count[allRPs[[i]], cats[[j]]] == x[[j]], count++];];
        If[count == c, AppendTo[desiredRPs, allRPs[[i]]]];
        count = 0;
    ];

    (* Sum, over the desired patterns, the product of the per-trial probabilities. *)
    For[i = 1, i <= Length[desiredRPs], i++,
        For[j = 1, j <= t, j++, product *= (p[[j]][[desiredRPs[[i]][[j]]]]);];
        sum += product;
        product = 1;
    ];

    sum
];

(* ANSWERS *)
Print["a1: P[X = x] = ", gmdPmf[{0, 1}, {{12/30, 18/30}}], "."];
Print["a2: P[X = x] = ", gmdPmf[{0,1, 0, 0, 0, 0}, {{5/30, 5/30, 5/30, 5/30, 5/30, 5/30}}], "."];
Print["a3: P[X = x] = ", gmdPmf[{0,1, 0, 0, 0, 0}, {{4/30, 4/30, 4/30, 6/30, 6/30, 6/30}}], "."];
Print["a4: P[X = x] = ", gmdPmf[{2, 5}, ArrayFlatten[ConstantArray[{{12/30, 18/30}}, {7, 1}]]], "."];
Print["a5: P[X = x] = ", gmdPmf[{0, 2, 1, 1, 0, 3}, ArrayFlatten[ConstantArray[{{4/30, 4/30, 4/30, 6/30, 6/30, 6/30}}, {7, 1}]]], "."];
p = {}; For[i = 1, i <= 7, i++, l = ((31/2) - (1/2)*i)/30; r = ((29/2) + (1/2)*i)/30;  AppendTo[p,{l,r}];]; Print["a6: P[X = x] = ", gmdPmf[{2, 5}, p], "."];
p = {}; For[i = 1, i <= 7, i++, l = ((31/6) - (1/6)*i)/30; r = ((29/6) + (1/6)*i)/30;  AppendTo[p,{l,l,l,r,r,r}];]; Print["a7: P[X = x] = ", gmdPmf[{0, 2, 1, 1, 0, 3}, p], "."];

Clear[gmdPmf];

Please edit, if you know of any ways to make it shorter/faster. Congrats, if you made it to the end! (:
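One possible way to shorten it, offered as a sketch rather than a replacement: build a single response pattern w/ the right category counts & let Permutations enumerate its distinct rearrangements, so only $\left(x_1, x_2, \cdots, x_c\right)!$ patterns are visited instead of all $c^t$ tuples. The name gmdPmfShort is hypothetical, and it assumes p has one row per trial.

```mathematica
(* Hypothetical shorter variant of gmdPmf; the name gmdPmfShort is mine. *)
gmdPmfShort[x_List, p_List] := Module[{base, patterns},
    (* Sorted response pattern in which category k appears x[[k]] times. *)
    base = Flatten[Table[Table[k, {x[[k]]}], {k, Length[x]}]];
    (* Permutations returns each distinct rearrangement of a multiset once. *)
    patterns = Permutations[base];
    (* For each pattern s, multiply p[[trial, s[[trial]]]] over the trials, then sum. *)
    Total[(Times @@ MapThread[Part, {p, #}] &) /@ patterns]
];

(* Binomial case a4: 7 identical rows {12/30, 18/30}. *)
gmdPmfShort[{2, 5}, ConstantArray[{12/30, 18/30}, 7]]
```

For the 6-category games this visits 420 patterns rather than $6^7 = 279936$ tuples, so it should also be noticeably faster.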

The cumulative binomial distribution, on the probability of “at least one”

Hint

$$P(X\geq k) = 1-P(X \lt k)$$

Solution

$$P(X\geq 1) = 1-P(X\lt 1) = 1- \binom{1000}{0}\left(\frac{1}{6}\right)^0\left(1-\frac{1}{6}\right)^{1000} = 1-\left(\frac{5}{6}\right)^{1000}\approx 1$$ This "trick" is fairly obvious as it stands, and it is commonly used to compute the cumulative probability of at least $m$ successes in a large number $n$ of trials when $m\ll n$.
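The complement rule from the hint can be sketched in Mathematica as follows. The helper name atLeast and the illustrative values (20 rolls of a fair die) are assumptions of mine, chosen only to keep the numbers small.

```mathematica
(* Hypothetical helper: P(X >= m) for n trials w/ success probability prob, *)
(* computed as 1 - P(X < m), so only the m terms below the threshold are summed. *)
atLeast[n_, prob_, m_] :=
    1 - Sum[Binomial[n, k] prob^k (1 - prob)^(n - k), {k, 0, m - 1}];

(* Illustrative values only: 20 rolls of a fair die, at least one six. *)
atLeast[20, 1/6, 1] == 1 - (5/6)^20

(* Cross-check against the built-in CDF for "at least 3". *)
atLeast[20, 1/6, 3] == 1 - CDF[BinomialDistribution[20, 1/6], 2]
```

Both comparisons should evaluate to True. The sum has only $m$ terms, which is what makes the complement cheaper than summing from $m$ up to $n$ when $m \ll n$.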