[Math] Conditional probability of multinomial distribution

probability, probability distributions, statistics

I find it difficult to derive the conditional probability $P(X_i = x_i\mid X_i + X_j = t)$. The conditional distribution is supposed to be $\operatorname{Bin}\left(t,\frac{p_i}{p_i+p_j}\right)$.

I first use the definition of conditional probability.

$$P(X_i = x_i \mid X_i + X_j = t)= \frac{P(X_i=x_i \cap X_i + X_j=t)}{P(X_i + X_j=t)}$$

Now, for the numerator, I use the multinomial distribution, which gives

$$ \frac{n!}{x_i!(x_i+x_j)! (n-x_i-x_i-x_j)!} p_i^{x_i} (p_i+p_j)^{x_i+x_j} (1-p_i-p_i-p_j)^{n-x_i-x_i-x_j}$$

For the denominator, I write

$$ \frac{n!}{(x_i+x_j)! (n-x_i-x_j)!} (p_i+p_j)^{x_i+x_j} (1-p_i-p_j)^{n-x_i-x_j}$$

Dividing one by the other doesn't cancel enough terms, and the result looks far too complicated. Did I make a mistake somewhere?

Best Answer

Your mistake was to assume that $(X_i,X_i+X_j,\sum_{k\notin\{i,j\}} X_k)$ has a multinomial distribution. It is possible to combine components of a multinomial distribution to get another multinomial distribution, but for this to work we must use each component exactly once (i.e., we can't include $X_i$ as a term in both components).
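The merging rule described above (each original component used exactly once) can be checked by brute force for a small case. This is a minimal sketch, assuming exact rational arithmetic via Python's `fractions.Fraction`; the helper names `multinomial_pmf`, `count_vectors` and `merged_pmf`, and the example with four categories and six trials, are hypothetical choices for illustration, not part of the original answer. It enumerates every outcome of the full multinomial, adds up the probabilities for each value of $(X_i, X_j, \sum_{k\notin\{i,j\}}X_k)$, and compares the result with a three-category multinomial whose probabilities are $(p_i, p_j, 1-p_i-p_j)$.

```python
from fractions import Fraction
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Exact multinomial probability of observing `counts` in sum(counts) trials."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # exact: partial products of factorials divide n!
    return coef * prod(p ** c for p, c in zip(probs, counts))


def count_vectors(n, m):
    """Yield every length-m tuple of non-negative integers summing to n."""
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in count_vectors(n - first, m - 1):
            yield (first,) + rest


def merged_pmf(n, probs, i, j):
    """Distribution of (X_i, X_j, sum of the other X_k), obtained by summing
    the full multinomial pmf over all outcomes with the same merged counts."""
    dist = {}
    for counts in count_vectors(n, len(probs)):
        key = (counts[i], counts[j], n - counts[i] - counts[j])
        dist[key] = dist.get(key, 0) + multinomial_pmf(counts, probs)
    return dist


# Hypothetical example: 4 categories, 6 trials, keep cells i=0 and j=2 (0-based).
probs = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
n, i, j = 6, 0, 2
merged_probs = [probs[i], probs[j], 1 - probs[i] - probs[j]]

dist = merged_pmf(n, probs, i, j)
# Each merged outcome should match the 3-category multinomial exactly.
print(all(dist[key] == multinomial_pmf(key, merged_probs) for key in dist))
```

Exact fractions are used instead of floats so that the comparison is an equality test rather than a tolerance check.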

Suppose $(X_1,X_2,\dots,X_m)$ has a multinomial distribution with $n$ trials and cell probabilities $(p_1,p_2,\dots,p_m)$. It follows that for distinct $i,j$, the vector $(X_i, X_j, \sum_{k\notin\{i,j\}}X_k)$ has a multinomial distribution with $n$ trials and cell probabilities $(p_i,p_j,\sum_{k\notin\{i,j\}} p_k)=(p_i,p_j,1-p_i-p_j)$. For $0 \le x_i \le t \le n$, the events $X_i=x_i$ and $X_i+X_j=t$ occur together exactly when $X_j=t-x_i$ and the remaining $n-t$ trials fall in the other cells, so we may calculate

\begin{align*}
P(X_i=x_i \cap X_i+X_j=t) &= P\left(X_i=x_i \cap X_j=t-x_i \cap \sum_{k\notin\{i,j\}}X_k=n-t\right) \\
&= \frac{n!}{x_i!\,(t-x_i)!\,(n-t)!}\,p_i^{x_i}p_j^{t-x_i}\left(1-p_i-p_j\right)^{n-t}
\end{align*}

On the other hand, $X_i+X_j$ counts the trials that land in cell $i$ or cell $j$, so it has a binomial distribution with parameters $n$ and $p_i+p_j$:

$$P(X_i+X_j=t) = \frac{n!}{t!\,(n-t)!}(p_i+p_j)^t(1-p_i-p_j)^{n-t}$$

Therefore, cancelling the common factors $n!$, $(n-t)!$ and $(1-p_i-p_j)^{n-t}$, and writing $(p_i+p_j)^t = (p_i+p_j)^{x_i}(p_i+p_j)^{t-x_i}$, we get

\begin{align*}
P(X_i=x_i \mid X_i+X_j=t) &= \frac{P(X_i=x_i \cap X_i+X_j=t)}{P(X_i+X_j=t)} \\
&= \frac{\frac{n!}{x_i!\,(t-x_i)!\,(n-t)!}p_i^{x_i}p_j^{t-x_i}\left(1-p_i-p_j\right)^{n-t}}{\frac{n!}{t!\,(n-t)!}(p_i+p_j)^t(1-p_i-p_j)^{n-t}} \\
&= \frac{t!}{x_i!\,(t-x_i)!}\left(\frac{p_i}{p_i+p_j}\right)^{x_i}\left(\frac{p_j}{p_i+p_j}\right)^{t-x_i}
\end{align*}

Since $\frac{p_j}{p_i+p_j} = 1-\frac{p_i}{p_i+p_j}$, the conditional distribution of $X_i$, given $X_i+X_j=t$, is binomial with parameters $t$ and $\frac{p_i}{p_i+p_j}$, as claimed.
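Both facts used in the derivation, that $X_i+X_j \sim \operatorname{Bin}(n, p_i+p_j)$ and that $X_i$ given $X_i+X_j=t$ is $\operatorname{Bin}\left(t, \frac{p_i}{p_i+p_j}\right)$, can be confirmed exactly for small cases by enumeration. This is a minimal sketch, assuming Python's `fractions.Fraction` for exact arithmetic; the helper names (`multinomial_pmf`, `count_vectors`, `binom_pmf`, `check_conditional`) and the example parameters are hypothetical, chosen only for illustration. Indices in the code are 0-based.

```python
from fractions import Fraction
from math import comb, factorial, prod


def multinomial_pmf(counts, probs):
    """Exact multinomial probability of observing `counts` in sum(counts) trials."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # exact: partial products of factorials divide n!
    return coef * prod(p ** c for p, c in zip(probs, counts))


def count_vectors(n, m):
    """Yield every length-m tuple of non-negative integers summing to n."""
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in count_vectors(n - first, m - 1):
            yield (first,) + rest


def binom_pmf(k, t, q):
    """Exact Bin(t, q) probability of k successes."""
    return comb(t, k) * q ** k * (1 - q) ** (t - k)


def check_conditional(n, probs, i, j):
    """Brute-force check that X_i + X_j ~ Bin(n, p_i + p_j) and that
    X_i given X_i + X_j = t is Bin(t, p_i / (p_i + p_j)) for every t."""
    joint = {}  # (t, x) -> P(X_i + X_j = t and X_i = x)
    for counts in count_vectors(n, len(probs)):
        key = (counts[i] + counts[j], counts[i])
        joint[key] = joint.get(key, 0) + multinomial_pmf(counts, probs)
    q = probs[i] / (probs[i] + probs[j])
    for t in range(n + 1):
        marginal = sum(joint.get((t, x), 0) for x in range(t + 1))
        assert marginal == binom_pmf(t, n, probs[i] + probs[j])
        if marginal == 0:
            continue  # conditioning event is impossible, nothing to check
        for x in range(t + 1):
            assert joint.get((t, x), 0) / marginal == binom_pmf(x, t, q)
    return True


# Hypothetical example: 4 categories, 6 trials, condition on X_1 + X_3 (0-based).
probs = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
print(check_conditional(6, probs, i=1, j=3))
```

Enumeration grows quickly with $n$ and the number of categories, so this is only a sanity check of the algebra for small inputs, not a practical way to compute these probabilities.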