Maximum a posteriori on Multinomial distribution with a Dirichlet prior can result in negative probabilities

dirichlet-distribution, maximum-likelihood, multinomial-distribution, optimization, posterior

I am doing a maximum a posteriori (MAP) estimation of a Multinomial distribution $M(c_1,\dots,c_n|p_1,\dots,p_n)$ with a Dirichlet prior $D(p_1,\dots,p_n|\alpha_1,\dots,\alpha_n)$. The experimental counts for the MAP estimate are $(c_1,\dots,c_n)$.

My understanding is that MAP is equivalent to $\text{argmax}(M(\vec{c}|\vec{p})D(\vec{p}|\vec{\alpha}))$ over $\vec{p}$ for fixed experimental data $\vec{c}$ and a fixed prior $\vec{\alpha}$. The solution seems to be

$p_i = \frac{c_i+\alpha_i-1}{\sum_{k=1}^{n}(c_k+\alpha_k-1)}$.

However this can be negative (because the naive solution using just a Lagrange multiplier does not impose the $p_i>0$ constraints). For instance, for a category $i$ with zero counts $c_i=0$ and a prior $\alpha_i=0.5$ we get $p_i<0$.
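To make the problem concrete, here is a minimal sketch that evaluates the formula above for a small made-up example. The function name `naive_map` and the counts are hypothetical, chosen only to illustrate the zero-count case with $\alpha_i=0.5$.

```python
def naive_map(counts, alphas):
    """Evaluate (c_i + alpha_i - 1) / sum_k (c_k + alpha_k - 1) without any positivity check."""
    numer = [c + a - 1 for c, a in zip(counts, alphas)]
    total = sum(numer)
    return [x / total for x in numer]

# Hypothetical data: the first category has zero counts and alpha = 0.5.
counts = [0, 3, 7]
alphas = [0.5, 0.5, 0.5]

p = naive_map(counts, alphas)
print(p)       # the first entry is negative
print(sum(p))  # the entries still sum to 1 (up to rounding)
```

The normalization constraint is satisfied, but the first component falls outside $[0,1]$, which is exactly the issue described above.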

Is there a known analytic solution for MAP that ensures the multinomial probabilities are never negative? Do I need to do it numerically instead?

Or maybe I am completely misunderstanding how the MAP is to be performed? Any suggestions or appropriate literature would be welcome.

Best Answer

Just to reiterate, you have $n$-outcome count data $\vec{c} = (c_1,\dots,c_n)$, and I will assume this is from a total of $N$ shots. The hierarchical model you have described is then the following:

$$\vec{c}|\vec{p}\sim \text{Multi}(N,\vec{p}),\\ \vec{p}|\vec{\alpha} \sim \text{Dir}(\vec{\alpha}). $$

Now, because the Dirichlet distribution is the conjugate prior for a multinomial likelihood, the posterior is also a Dirichlet distribution. In particular, $$\vec{p}|\vec{c},\vec{\alpha}\sim \text{Dir}(\vec{c}+\vec{\alpha}).$$ The mode of this posterior distribution is, as you correctly pointed out, $$\vec{p}_{\text{MAP}}=\frac{\vec{c}+\vec{\alpha}-1}{\sum_{k=1}^{n}(c_k+\alpha_k-1) }=\frac{\vec{c}+\vec{\alpha}-1}{N-n+\sum_{k=1}^{n}\alpha_k}, $$ but this only holds when every posterior parameter satisfies $c_i+\alpha_i>1$, which is guaranteed if all $\alpha_i>1$. If some $c_i+\alpha_i<1$, the posterior density is unbounded as $p_i\to 0$, so the stationary point above is not the maximum and the closed form does not apply. In that case you will need to resort to numerics, yes.
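As a rough sketch of the closed-form mode, the helper below (the name `map_estimate` is hypothetical) computes the mode of $\text{Dir}(\vec{c}+\vec{\alpha})$ and refuses to return a value unless every $c_i+\alpha_i>1$. That check is an assumption of this sketch: it is implied by $\alpha_i>1$ and is what keeps every component of the formula strictly positive.

```python
def map_estimate(counts, alphas):
    """Mode of Dir(counts + alphas), valid only if every c_i + alpha_i > 1."""
    post = [c + a for c, a in zip(counts, alphas)]
    if any(x <= 1 for x in post):
        # The closed-form mode does not apply here.
        raise ValueError("closed-form mode requires c_i + alpha_i > 1 for all i")
    numer = [x - 1 for x in post]
    total = sum(numer)  # equals N - n + sum(alphas)
    return [x / total for x in numer]

# Hypothetical example: N = 10 shots over n = 3 outcomes, alpha_i = 2.
p_map = map_estimate([2, 3, 5], [2, 2, 2])
print(p_map)
```

With the question's zero-count example (`counts = [0, 3, 7]`, `alphas = [0.5, 0.5, 0.5]`) this function raises `ValueError` instead of returning a negative probability.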

However, the MAP estimate is only one particular choice of point estimate and possibly not the best for this problem. Another is the posterior mean which, for this posterior, is given by $$ \vec{p}_{\text{CM}} = \frac{\vec{c}+\vec{\alpha}}{N+\sum_{k=1}^{n}\alpha_k}$$ and holds for all $\alpha_i>0$. This gives you a closed-form point estimate for your unknown $\vec{p}$ for any valid choice of prior, and every component is strictly positive, although it is not the MAP.
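The posterior mean is simple enough to sketch directly. The function name `posterior_mean` and the example counts are hypothetical, and the code only assumes $\alpha_i>0$ and non-negative counts.

```python
def posterior_mean(counts, alphas):
    """Mean of Dir(counts + alphas): (c_i + alpha_i) / (N + sum(alphas))."""
    if any(a <= 0 for a in alphas):
        raise ValueError("Dirichlet parameters must be positive")
    post = [c + a for c, a in zip(counts, alphas)]
    total = sum(post)  # equals N + sum(alphas)
    return [x / total for x in post]

# Hypothetical data with a zero-count category and alpha_i = 0.5.
p_cm = posterior_mean([0, 3, 7], [0.5, 0.5, 0.5])
print(p_cm)
```

For this input the zero-count category gets a small positive probability, $0.5/11.5$, rather than a negative one.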