[Math] Poisson limit theorem for multinomial distribution

Tags: approximation, probability distributions, probability theory, probability-limit-theorems

It is well known that the Poisson distribution may be used as an approximation to the binomial distribution under certain conditions. Now let $X = (X_{1},…,X_{k})$ be multinomially distributed with index $n$ and parameter $\pi = (\pi_{1},…,\pi_{k})$ with $\pi_{1}+…+\pi_{k} = 1$, i.e. the $X_{i}$ are independent and binomially distributed with parameters $n$ and $\pi_{i}$, and $X_{1} + … + X_{k} = n$. Is it also true that
\begin{equation}
\mathbb{P}(X_{1}=n_{1},…,X_{k}=n_{k}) \xrightarrow{ n \to \infty } \prod_{j=1}^{k}e^{-\sigma_{j}}\frac{\sigma_{j}^{n_{j}}}{n_{j}!}
\end{equation}
where $n\cdot \pi_{j} \rightarrow \sigma_{j}$ as $n \to \infty$?

Now we know that each $X_{i}$ is asymptotically Poisson distributed with parameter $\sigma_{i}$. Since they are independent, can I conclude that the limit random variables are also independent? If so, then I can conclude that

\begin{equation}
\mathbb{P}(X_{1}=n_{1},…,X_{k}=n_{k}) \xrightarrow{ n \to \infty } \mathbb{P}(Y_{1}=n_{1},…,Y_{k}=n_{k}) = \prod_{j=1}^{k}e^{-\sigma_{j}}\frac{\sigma_{j}^{n_{j}}}{n_{j}!},
\end{equation}
where each $Y_{i}$ is Poisson distributed with parameter $\sigma_{i}$.

Best Answer

An example of something along these lines that would make sense is the following. Fix a function $f : \mathbb{N} \to \mathbb{R}_{++}$ and a positive integer $q$. For each $n=kq$ with $k \in \mathbb{N}$, consider the multinomial distribution on $n$ objects distributed into $qn$ bins, where the probability associated to the $i$th bin is proportional to $f(i)$, i.e. it is $\frac{f(i)}{\sum_{j=1}^{qn} f(j)}$. Then as $k \to \infty$ you might hope to see a "multivariate Poisson distribution" under suitable assumptions on $f$ (e.g. its range is contained in some $[a,b]$ with $0<a<b<\infty$). Moreover you might hope to see asymptotic independence in the sense that you describe, because any one bin asymptotically cannot affect the other bins very much.

But for a fixed number of bins this is a total non-starter: the dependence between the bins will remain significant no matter how big $n$ is. Moreover the occupation number of at least one bin must be diverging in probability; it cannot be that all of the occupation probabilities are simultaneously $O(1/n)$ unless the number of bins goes to infinity.
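
Both claims in the preceding paragraph can be checked from the standard multinomial moments. This is only a short sketch; the symbol $\rho$ for the correlation coefficient is notation introduced here, and $k$ denotes the fixed number of bins as in the question. For $i \neq j$,
\begin{equation}
\operatorname{Cov}(X_{i},X_{j}) = -n\pi_{i}\pi_{j}, \qquad \operatorname{Var}(X_{i}) = n\pi_{i}(1-\pi_{i}) \quad\Longrightarrow\quad \rho(X_{i},X_{j}) = -\sqrt{\frac{\pi_{i}\pi_{j}}{(1-\pi_{i})(1-\pi_{j})}},
\end{equation}
which does not depend on $n$ when the $\pi_{j}$ are held fixed, so the dependence does not fade as $n$ grows. Likewise,
\begin{equation}
\sum_{j=1}^{k} n\pi_{j} = n \to \infty \qquad\text{whereas}\qquad \sum_{j=1}^{k} \sigma_{j} < \infty,
\end{equation}
so $n\pi_{j} \to \sigma_{j}$ cannot hold for every $j$ when $k$ is fixed.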

What I described in the first paragraph is a kind of "thermodynamic limit" in which you simultaneously send the number of objects and the "space" that they are allowed to occupy to become large, at asymptotically the same "rate". You might look into statistical mechanics if you are interested in this type of problem.

A different way to make sense of this is to embrace the idea that at least one occupation number is diverging in probability. One way to do that is to say that a Bin($n,\lambda/n$) random variable describes the sum of the occupation numbers of the other bins. Then the components contributing to the low occupation number bins can become asymptotically independent Poissons with parameters summing up to $\lambda$.
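
A minimal numerical sketch of this last construction, under the assumption (made here for illustration, not stated in the answer) that the low-occupancy bins have probabilities $\lambda_j/n$ and one remaining bin absorbs all other probability mass and objects. The function names and the sample values are hypothetical.

```python
from math import exp, factorial, lgamma, log


def multinomial_pmf_small_bins(n, lams, counts):
    """Exact P(X_1 = c_1, ..., X_m = c_m) for n objects when small bin j
    has probability lams[j] / n and one extra bin takes everything else."""
    rest_prob = 1.0 - sum(lams) / n      # probability of the large bin
    rest_count = n - sum(counts)         # objects left for the large bin
    # log of n! / (c_1! ... c_m! rest_count!) times the probability powers
    log_p = lgamma(n + 1) - lgamma(rest_count + 1) + rest_count * log(rest_prob)
    for lam, c in zip(lams, counts):
        log_p += c * log(lam / n) - lgamma(c + 1)
    return exp(log_p)


def poisson_product(lams, counts):
    """Product of independent Poisson(lams[j]) probabilities at counts[j]."""
    value = 1.0
    for lam, c in zip(lams, counts):
        value *= exp(-lam) * lam ** c / factorial(c)
    return value


if __name__ == "__main__":
    lams, counts = [1.0, 2.5], [2, 1]
    for n in (10, 1000, 10 ** 6):
        print(n, multinomial_pmf_small_bins(n, lams, counts), poisson_product(lams, counts))
```

Running it for increasing $n$ lets you watch the exact multinomial probability of the small bins move toward the product of Poisson probabilities.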

Preliminary (TL;DR)

Background

In his 1991 publication, Norman C. Beaulieu answered your question w/ what he dubbed the generalized multinomial distribution (GMD). My explanation will focus on the GMD's utility.

Notation

# categories $= c$.
# trials $= t$.
Random vector $= X = \left[\begin{array}{cccc}X_1&X_2&\cdots&X_c\end{array}\right]^T$.
Category responses after $t$ trials vector $= x = \left[\begin{array}{cccc}x_1&x_2&\cdots&x_c\end{array}\right]^T$.
- $\sum_{k = 1}^c x_k = t$.
Probability of category response during trial matrix $= p = \left[\begin{array}{cccc} p_{1,1} & p_{1,2} & \cdots & p_{1,c} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,c} \\ \vdots & \vdots & \ddots & \vdots \\ p_{t,1} & p_{t,2} & \cdots & p_{t,c} \end{array}\right]$.
Pmf of $X = P\left[X = x\right]$.
$[c] = \left\{1, 2, \cdots, c\right\}$.
Multiset of $[c] = ([c], m) = \left\{1^{m(1)}, 2^{m(2)}, \cdots, c^{m(c)}\right\}$.
- $m(i) = x_i$.
Permutations of $([c], m) = \mathfrak{S}_{([c], m)}$.
- $\operatorname{card}\left(\mathfrak{S}_{([c], m)}\right) = \binom{t}{m(1), m(2), \cdots, m(c)} = \frac{t!}{m(1)!\,m(2)!\cdots m(c)!}$.

Pmf of GMD

$$P\left[X = x\right] = \sum_{\mathfrak{s} \in \mathfrak{S}_{([c], m)}} \left\{\prod_{k = 1}^t \left\{p_{k,\mathfrak{s}_k}\right\}\right\}$$
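
The sum above can be evaluated literally by enumerating the distinct orderings of the multiset. The following is a small Python sketch of that idea, not the author's own code; the name `gmd_pmf` is illustrative, categories are 0-indexed, and `p` is assumed to be a list of $t$ rows with $c$ entries each.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def gmd_pmf(x, p):
    """Sum over distinct orderings s of the multiset {0^x[0], 1^x[1], ...}
    of prod_k p[k][s[k]], where row k of p belongs to trial k."""
    t = sum(x)
    if len(p) != t:
        raise ValueError("p needs one row per trial")
    # The multiset of category labels, e.g. x = [2, 5] -> [0, 0, 1, 1, 1, 1, 1]
    multiset = [cat for cat, count in enumerate(x) for _ in range(count)]
    # set() removes the duplicate orderings that permutations() produces
    patterns = set(permutations(multiset))
    return sum(prod(p[k][s[k]] for k in range(t)) for s in patterns)


if __name__ == "__main__":
    row = [Fraction(12, 30), Fraction(18, 30)]
    print(gmd_pmf([2, 5], [row] * 7))  # same probabilities on every trial
```

With identical rows the result reduces to the ordinary binomial or multinomial pmf; with rows that change from trial to trial it covers the Poisson binomial and general cases.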

So far, I've identified it as being the superclass of 7 distributions! Namely...

Bernoulli distribution.
Uniform distribution.
Categorical distribution.
Binomial distribution.
Multinomial distribution.
Poisson's binomial distribution.
Generalized multinomial distribution (if your definition of superclass allows self-inclusion).

Examples

Games

g1: A 2 sided die is simulated using a fair standard die by assigning faces w/ pips 1 through 3 & 4 through 6 to sides 1 & 2, respectively. The die is biased by etching micro holes into faces w/ pips 1 through 3 s.t. $p_1 = 12/30$ & $p_2 = 18/30$. The 2 sided die is tossed 1 time & the category responses are recorded.
g2: Same as g1, except w/ ideal standard die, i.e., $p_1 = p_2 = \cdots = p_6 = 5/30$.
g3: Same as g1, except w/ standard die, i.e., $p_1 = p_2 = p_3 = 4/30$ & $p_4 = p_5 = p_6 = 6/30$.
g4: Same as g1, except die is tossed 7 times.
g5: Same as g3, except die is tossed 7 times.
g6: Same as g4, except the micro holes are filled w/ $0.07$ kg of a material, which evaporates @ $0.01$ kg/s upon being sprayed w/ an activator, s.t. $p_1 = p_2 = 15/30$ for the 1st toss. Immediately after being sprayed, category responses are recorded every second.
g7: Same as g6, except w/ standard die, i.e., $p_1 = p_2 = \cdots = p_6 = 5/30$ for the 1st toss.

Questions

q1: Find pmf & evaluate when $x = \left[\begin{array}{cc}0&1\end{array}\right]^T$.
q2: Find pmf & evaluate when $x = \left[\begin{array}{cccccc}0&1&0&0&0&0\end{array}\right]^T$.
q3: q2.
q4: Find pmf & evaluate when $x = \left[\begin{array}{cc}2&5\end{array}\right]^T$.
q5: Find pmf & evaluate when $x = \left[\begin{array}{cccccc}0&2&1&1&0&3\end{array}\right]^T$.
q6: q4.
q7: q5.

Answers w/o knowledge of GMD

a1: $X$ ~ Bernoulli distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^{x_k}}{x_k!} = 1!\prod_{k = 1}^2 \frac{p_k^{x_k}}{x_k!} = \frac{1!(12/30)^0(18/30)^1}{0!1!}$
  $\Longrightarrow P\left[X = x\right] = 3/5$.
a2: $X$ ~ Uniform distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^{x_k}}{x_k!} = 1!\prod_{k = 1}^6 \frac{p_k^{x_k}}{x_k!} = \frac{1!(5/30)^{0 + 1 + 0 + 0 + 0 + 0}}{0!1!0!0!0!0!}$
  $\Longrightarrow P\left[X = x\right] = 1/6$.
a3: $X$ ~ Categorical distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^{x_k}}{x_k!} = 1!\prod_{k = 1}^6 \frac{p_k^{x_k}}{x_k!} = \frac{1!(4/30)^{0 + 1 + 0}(6/30)^{0 + 0 + 0}}{0!1!0!0!0!0!}$
  $\Longrightarrow P\left[X = x\right] = 2/15$.
a4: $X$ ~ Binomial distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^{x_k}}{x_k!} = 7!\prod_{k = 1}^2 \frac{p_k^{x_k}}{x_k!} = \frac{7!(12/30)^2(18/30)^5}{2!5!}$
  $\Longrightarrow P\left[X = x\right] = 20412/78125$.
a5: $X$ ~ Multinomial distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^{x_k}}{x_k!} = 7!\prod_{k = 1}^6 \frac{p_k^{x_k}}{x_k!} = \frac{7!(4/30)^{0 + 2 + 1}(6/30)^{1 + 0 + 3}}{0!2!1!1!0!3!}$
  $\Longrightarrow P\left[X = x\right] = 224/140625$.
a6: $X$ ~ Poisson's binomial distribution.
- $P\left[\left[\begin{array}{cc}X_1&X_2\end{array}\right]^T = \left[\begin{array}{cc}x_1&x_2\end{array}\right]^T\right] = P\left[X_1 = x_1, X_2 = x_2\right] = P\left[X_1 = x_1\right] = P\left[X_2 = x_2\right]$.
- $p_1$ & $p_2$ are vectors now: $p_1 = \left[\begin{array}{cccc}p_{1_1}&p_{1_2}&\cdots&p_{1_t}\end{array}\right]^T, p_2 = \left[\begin{array}{cccc}p_{2_1}&p_{2_2}&\cdots&p_{2_t}\end{array}\right]^T$.
- $P\left[X_2 = x_2\right] = \frac{1}{t + 1}\sum_{i = 0}^t \left\{\exp\left(\frac{-j2\pi i x_2}{t + 1}\right) \prod_{k = 1}^t \left\{p_{2_k}\left(\exp\left(\frac{j2\pi i}{t + 1}\right) - 1\right) + 1\right\}\right\}$
  $= \frac{1}{8}\sum_{i = 0}^7 \left\{\exp\left(\frac{-j5\pi i}{4}\right) \prod_{k = 1}^7 \left\{\left(\frac{0.5k + 14.5}{30}\right)\left(\exp\left(\frac{j\pi i}{4}\right) - 1\right) + 1\right\}\right\}$
  $\Longrightarrow P\left[X_2 = 5\right] = 308327/1440000$.
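
The discrete-Fourier-transform expression used for a6 can be evaluated numerically as follows. This is a sketch only; the function name is made up here, and the probabilities are the $p_{2_k} = (0.5k + 14.5)/30$ from g6.

```python
import cmath
from math import pi


def poisson_binomial_pmf_dft(probs, successes):
    """P(number of successes == successes) for independent trials with
    success probabilities probs, via the DFT formula in a6."""
    t = len(probs)
    total = 0j
    for i in range(t + 1):
        root = cmath.exp(2j * pi * i / (t + 1))                      # exp(j 2 pi i / (t + 1))
        term = cmath.exp(-2j * pi * i * successes / (t + 1))         # exp(-j 2 pi i x_2 / (t + 1))
        for p in probs:
            term *= p * (root - 1) + 1
        total += term
    # The imaginary part cancels up to rounding error, so keep the real part.
    return (total / (t + 1)).real


if __name__ == "__main__":
    g6_probs = [(0.5 * k + 14.5) / 30 for k in range(1, 8)]
    print(poisson_binomial_pmf_dft(g6_probs, 5), 308327 / 1440000)
```

The two printed numbers should agree up to floating-point rounding.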
a7: $X$ ~ Generalized multinomial distribution.
- ???

Answers w/ Knowledge of GMD

a1: $X$ ~ Bernoulli distribution.
- $p = \left[\begin{array}{cc}\frac{12}{30}&\frac{18}{30}\end{array}\right]$.
- $\mathfrak{S}_{([2], m)} = \left\{\left(2\right)\right\}$.
a2: $X$ ~ Uniform distribution.
- $p = \left[\begin{array}{cccccc}\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}\end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2\right)\right\}$.
a3: $X$ ~ Categorical distribution.
- $p = \left[\begin{array}{cccccc}\frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30}\end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2\right)\right\}$.
a4: $X$ ~ Binomial distribution.
- $p = \left[\begin{array}{cc} \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \end{array}\right]$.
- $\mathfrak{S}_{([2], m)} = \left\{\left(1,1,2,2,2,2,2\right), \ldots, \left(2,2,2,2,2,1,1\right)\right\}$.
a5: $X$ ~ Multinomial distribution.
- $p = \left[\begin{array}{cccccc} \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2,2,3,4,6,6,6\right), \ldots, \left(6,6,6,4,3,2,2\right)\right\}$.
a6: $X$ ~ Poisson's binomial distribution.
- $p = \left[\begin{array}{cc} \frac{15}{30}&\frac{15}{30} \\ \frac{14.5}{30}&\frac{15.5}{30} \\ \frac{14}{30}&\frac{16}{30} \\ \frac{13.5}{30}&\frac{16.5}{30} \\ \frac{13}{30}&\frac{17}{30} \\ \frac{12.5}{30}&\frac{17.5}{30} \\ \frac{12}{30}&\frac{18}{30} \end{array}\right]$.
- $\mathfrak{S}_{([2], m)} = \left\{\left(1,1,2,2,2,2,2\right), \ldots, \left(2,2,2,2,2,1,1\right)\right\}$.
a7: $X$ ~ Generalized multinomial distribution.
- $p = \left[\begin{array}{cccccc} \frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30} \\ \frac{4.8\overline{3}}{30}&\frac{4.8\overline{3}}{30}&\frac{4.8\overline{3}}{30} &\frac{5.1\overline{6}}{30}&\frac{5.1\overline{6}}{30}&\frac{5.1\overline{6}}{30} \\ \frac{4.\overline{6}}{30}&\frac{4.\overline{6}}{30}&\frac{4.\overline{6}}{30} &\frac{5.\overline{3}}{30}&\frac{5.\overline{3}}{30}&\frac{5.\overline{3}}{30} \\ \frac{4.5}{30}&\frac{4.5}{30}&\frac{4.5}{30} &\frac{5.5}{30}&\frac{5.5}{30}&\frac{5.5}{30} \\ \frac{4.\overline{3}}{30}&\frac{4.\overline{3}}{30}&\frac{4.\overline{3}}{30} &\frac{5.\overline{6}}{30}&\frac{5.\overline{6}}{30}&\frac{5.\overline{6}}{30} \\ \frac{4.1\overline{6}}{30}&\frac{4.1\overline{6}}{30}&\frac{4.1\overline{6}}{30} &\frac{5.8\overline{3}}{30}&\frac{5.8\overline{3}}{30}&\frac{5.8\overline{3}}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2,2,3,4,6,6,6\right), \ldots, \left(6,6,6,4,3,2,2\right)\right\}$.
- $P\left[X = x\right] = 59251/36905625$.

Final Words

I know my answer was very long (& went far beyond what OP asked for) but this had been flying around inside my head for quite some time & this q seemed like the most suitable landing strip.

I performed the last 6 calculations using the function gmdPmf (which I defined in Mathematica)...

(* GENERALIZED MULTINOMIAL DISTRIBUTION (GMD) *)
(* Note: mXn = # rows X # columns. *)
gmdPmf[
    x_ (* Vector of category counts x_j after t trials have taken place. *),
    p_ (* Matrix (tXc) holds p_{trial i, category j} = P["Response of trial i is category j"]. *)
] := Module[{t, c, ⦋c⦌, allRPs, desiredRPs, count = 0, sum = 0, product = 1, i, j},
    t = Total[x]; (* # trials. *)
    c = Length[x]; (* # categories. *)
    ⦋c⦌ = Range[c]; (* Categories. *)
    allRPs = Tuples[⦋c⦌,t]; (* Matrix (c^tXt) holds all the response patterns given that t trials have occurred. *)
    desiredRPs = {}; (* Matrix ((x_1,x_2,...,x_c)!Xt) holds the desired response patterns; subset of allRPs wrt x. *)

    For[i = 1, i <= Length[allRPs], i++,
        For[j = 1, j <= c, j++, If[Count[allRPs[[i]],⦋c⦌[[j]]] == x[[j]], count++];];
        If[count == c, AppendTo[desiredRPs, allRPs[[i]]]];
        count = 0;
    ];

    For[i = 1, i <= Length[desiredRPs], i++, 
        For[j = 1, j <= t, j++, product *= (p[[j]][[desiredRPs[[i]][[j]]]]);];
        sum += product;
        product = 1;
    ];

    sum
];

(* ANSWERS *)
Print["a1: P[X = x] = ", gmdPmf[{0, 1}, {{12/30, 18/30}}], "."];
Print["a2: P[X = x] = ", gmdPmf[{0,1, 0, 0, 0, 0}, {{5/30, 5/30, 5/30, 5/30, 5/30, 5/30}}], "."];
Print["a3: P[X = x] = ", gmdPmf[{0,1, 0, 0, 0, 0}, {{4/30, 4/30, 4/30, 6/30, 6/30, 6/30}}], "."];
Print["a4: P[X = x] = ", gmdPmf[{2, 5}, ArrayFlatten[ConstantArray[{{12/30, 18/30}}, {7, 1}]]], "."];
Print["a5: P[X = x] = ", gmdPmf[{0, 2, 1, 1, 0, 3}, ArrayFlatten[ConstantArray[{{4/30, 4/30, 4/30, 6/30, 6/30, 6/30}}, {7, 1}]]], "."];
p = {}; For[i = 1, i <= 7, i++, l = ((31/2) - (1/2)*i)/30; r = ((29/2) + (1/2)*i)/30;  AppendTo[p,{l,r}];]; Print["a6: P[X = x] = ", gmdPmf[{2, 5}, p], "."];
p = {}; For[i = 1, i <= 7, i++, l = ((31/6) - (1/6)*i)/30; r = ((29/6) + (1/6)*i)/30;  AppendTo[p,{l,l,l,r,r,r}];]; Print["a7: P[X = x] = ", gmdPmf[{0, 2, 1, 1, 0, 3}, p], "."];

Clear[gmdPmf];

Please edit, if you know of any ways to make it shorter/faster. Congrats, if you made it to the end! (:
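
One possible way to make it faster, sketched in Python rather than Mathematica (so this is a swapped-in technique, not a translation of gmdPmf): instead of generating all $c^t$ response patterns, process the trials one at a time and keep only the probability of each partial count vector that can still reach $x$. The name `gmd_pmf_dp` and the 0-indexed categories are assumptions of this sketch.

```python
from fractions import Fraction


def gmd_pmf_dp(x, p):
    """GMD pmf by dynamic programming over partial count vectors.
    x: target counts per category; p: one row of category probabilities per trial."""
    x = tuple(x)
    if len(p) != sum(x):
        raise ValueError("p needs one row per trial")
    # Before any trial, the all-zero count vector has probability 1.
    states = {tuple(0 for _ in x): 1}
    for row in p:
        next_states = {}
        for counts, prob in states.items():
            for cat, p_cat in enumerate(row):
                # Skip transitions that would overshoot the target count.
                if counts[cat] < x[cat]:
                    new = counts[:cat] + (counts[cat] + 1,) + counts[cat + 1:]
                    next_states[new] = next_states.get(new, 0) + prob * p_cat
        states = next_states
    # After sum(x) trials the only surviving state, if any, is x itself.
    return states.get(x, 0)


if __name__ == "__main__":
    rows = [[(Fraction(31, 2) - Fraction(i, 2)) / 30,
             (Fraction(29, 2) + Fraction(i, 2)) / 30] for i in range(1, 8)]
    print(gmd_pmf_dp([2, 5], rows))
```

The number of states is at most $\prod_j (x_j + 1)$, which is typically much smaller than $c^t$. Using `Fraction` inputs keeps the arithmetic exact, as in the Mathematica version.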

[Math] Multinomial distribution to Binomial distribution

If $X=(X_1,\ldots,X_r)$ has a multinomial distribution, then each of the components $X_1,\ldots,X_r$ has a binomial distribution.

You're distributing $n$ objects into $r$ bins. For each object, the probability that it falls into the $k$th bin is $p_k,$ for $k=1,\ldots,r.$ The number of objects that fall into the $k$th bin is $X_k,$ for $k=1,\ldots,r.$ So $X_1+X_2$ is the number of objects falling into either of the first two bins. The probability that an object falls into either of the first two bins is the sum of the probabilities of its falling into those two bins, i.e. it is $p_1+p_2.$ In effect, you've simply joined those two bins together, so now you have $r-1$ bins, with probabilities $p_1+p_2,p_3,p_4,\ldots,p_r$ of an object falling into them. Therefore the distribution of $(X_1+X_2,X_3,X_4,\ldots,X_r)$ is multinomial with parameters $(n,p_1+p_2, p_3, p_4, \ldots, p_r).$ And as before, each component separately has a binomial distribution.
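
The bin-merging argument can be checked exactly on a small case. The sketch below uses illustrative helper names and arbitrarily chosen probabilities: it sums the multinomial pmf over all ways of splitting a combined count between the first two bins and compares the result with the multinomial pmf of the merged bins.

```python
from fractions import Fraction
from math import factorial


def multinomial_pmf(counts, probs):
    """Exact multinomial pmf n! * prod(p_k^{x_k} / x_k!) with rational arithmetic."""
    coef = factorial(sum(counts))
    value = Fraction(1)
    for c, p in zip(counts, probs):
        coef //= factorial(c)            # stays an integer at every step
        value *= Fraction(p) ** c
    return coef * value


def merged_pmf_by_summing(s, rest_counts, probs):
    """P(X_1 + X_2 = s, X_3 = rest_counts[0], ...) by summing over X_1 = a."""
    return sum(multinomial_pmf([a, s - a, *rest_counts], probs) for a in range(s + 1))


if __name__ == "__main__":
    probs = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]
    merged = [probs[0] + probs[1], probs[2]]
    print(merged_pmf_by_summing(3, [2], probs), multinomial_pmf([3, 2], merged))
```

Both printed values are the same fraction, as the argument predicts.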