Statistics – Conjugate Prior of the Dirichlet Distribution

pr.probabilityst.statistics

What is the conjugate prior distribution of the Dirichlet distribution?

Best Answer

Neil sent me an email asking:

===

I read your post at http://www.stat.columbia.edu/~cook/movabletype/archives/2009/04/conjugate_prior.html and I was wondering if you could expand on how to update the Dirichlet conjugate prior that you provided in your paper:

S. Lefkimmiatis, P. Maragos, and G. Papandreou, Bayesian Inference on Multiscale Models for Poisson Intensity Estimation: Applications to Photon-Limited Image Denoising, IEEE Transactions on Image Processing, vol. 18, no. 8, pp. 1724-1741, Aug. 2009

In other words, given in your paper's notation the prior hyper-parameters (vector $\mathbf{v}$, and scalar $\eta$), and $N$ Dirichlet observations (vectors $\mathbf{\theta}_n, n=1,\dots,N$), how do you update $\mathbf{v}$ and $\eta$?

===

Here is my response:

Conjugate pairs are so convenient because there is a standard and simple way to incorporate new data by just modifying the parameters of the prior density. One just multiplies the likelihood with its conjugate prior; the result has the same parametric form as the prior, and the new parameters can be readily "read-off" by comparing the likelihood-prior product with the prior parametric form. This is described in detail in all standard texts in Bayesian statistics such as Gelman et al. (2003) or Bernardo and Smith (2000).

In the case of the Dirichlet and its conjugate prior described in our paper and using its notation, after observing $N$ Dirichlet vectors $\mathbf{\theta}_n$, $n=1,\dots,N$, where each vector $\mathbf{\theta}_n$ is $D$ dimensional with elements $\theta_n[t]$, $t=1,\dots,D$, the $D+1$ hyper-parameters should be updated as follows:

$\eta_N = \eta_0 + N$
$v_N[t] = v_0[t] - \sum_{n=1}^N \ln \theta_n[t], \quad t=1,\dots,D$, where $\eta_0$, $\mathbf{v}_0$ and $\eta_N$, $\mathbf{v}_N$ are the initial and updated model parameters, respectively.

You can verify this in a few lines of equations by following the previously described general rule.

Hope this helps!

Best Answer

Related Solutions

[Math] what can be said about the choice of a prior in Bayesian statistics

[Math] What can be said about an infinite linear chain of conjugate prior distributions

Related Question