Neil sent me an email asking:
===
I read your post at http://www.stat.columbia.edu/~cook/movabletype/archives/2009/04/conjugate_prior.html and I was wondering if you could expand on how to update the Dirichlet conjugate prior that you provided in your paper:
S. Lefkimmiatis, P. Maragos, and G. Papandreou,
Bayesian Inference on Multiscale Models for Poisson Intensity Estimation: Applications to Photon-Limited Image Denoising,
IEEE Transactions on Image Processing, vol. 18, no. 8, pp. 1724-1741, Aug. 2009
In other words, given in your paper's notation the prior hyper-parameters (vector $\mathbf{v}$, and scalar $\eta$), and $N$ Dirichlet observations (vectors $\mathbf{\theta}_n, n=1,\dots,N$), how do you update $\mathbf{v}$ and $\eta$?
===
Here is my response:
Conjugate pairs are so convenient because there is a standard and simple way to incorporate new data: one just modifies the parameters of the prior density. Multiply the likelihood by its conjugate prior; the result has the same parametric form as the prior, and the new parameters can be read off by comparing the likelihood-prior product with the prior's parametric form. This is described in detail in all standard texts on Bayesian statistics, such as Gelman et al. (2003) or Bernardo and Smith (2000).
In the case of the Dirichlet and its conjugate prior described in our paper and using its notation, after observing $N$ Dirichlet vectors $\mathbf{\theta}_n$, $n=1,\dots,N$, where each vector $\mathbf{\theta}_n$ is $D$ dimensional with elements $\theta_n[t]$, $t=1,\dots,D$, the $D+1$ hyper-parameters should be updated as follows:
- $\eta_N = \eta_0 + N$
- $v_N[t] = v_0[t] - \sum_{n=1}^N \ln \theta_n[t], \quad t=1,\dots,D$,

where $\eta_0$, $\mathbf{v}_0$ and $\eta_N$, $\mathbf{v}_N$ denote the initial and updated model parameters, respectively.
You can verify this in a few lines of equations by following the previously described general rule.
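To spell that out (a sketch, writing $\boldsymbol{\alpha}$ for the $D$-dimensional Dirichlet parameter vector, and assuming the prior has the standard conjugate form that these updates imply, given here up to its normalizing constant):

$$p(\boldsymbol{\alpha} \mid \mathbf{v}, \eta) \propto \left( \frac{\Gamma\!\left(\sum_{t=1}^{D} \alpha[t]\right)}{\prod_{t=1}^{D} \Gamma(\alpha[t])} \right)^{\!\eta} \exp\!\left( -\sum_{t=1}^{D} v[t]\,\alpha[t] \right).$$

Each observation contributes a Dirichlet likelihood factor, which as a function of $\boldsymbol{\alpha}$ satisfies

$$p(\mathbf{\theta}_n \mid \boldsymbol{\alpha}) \propto \frac{\Gamma\!\left(\sum_t \alpha[t]\right)}{\prod_t \Gamma(\alpha[t])} \exp\!\left( \sum_{t=1}^{D} \alpha[t] \ln \theta_n[t] \right).$$

Multiplying the prior by all $N$ such factors raises the Gamma-ratio term to the power $\eta + N$ and replaces each $v[t]$ with $v[t] - \sum_{n=1}^{N} \ln \theta_n[t]$, which is exactly the update above.

In code, the update is one line each for $\eta$ and $\mathbf{v}$; here is a minimal NumPy sketch (the function name and array layout are mine, not the paper's):

```python
import numpy as np

def update_dirichlet_hyperparams(eta0, v0, thetas):
    """Conjugate update after observing N Dirichlet vectors.

    eta0:   scalar hyper-parameter (prior)
    v0:     length-D array of hyper-parameters (prior)
    thetas: N x D array; each row is one observed Dirichlet vector
            (positive entries summing to 1)
    """
    thetas = np.asarray(thetas, dtype=float)
    eta_N = eta0 + thetas.shape[0]  # eta_N = eta_0 + N
    # v_N[t] = v_0[t] - sum_n ln(theta_n[t])
    v_N = np.asarray(v0, dtype=float) - np.log(thetas).sum(axis=0)
    return eta_N, v_N
```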
Hope this helps!
Many hold that Bayesian statistics, "from a purely mathematical point of view," is entirely coextensive with probability theory (however you choose to define its boundaries as a mathematical discipline). Nonetheless, if I interpret your request as being for a mathematically sophisticated and rigorous exposition of why the Bayesian approach is a worthy one, three books spring to mind.
- Theory of Statistics by Mark Schervish
- Bayes Theory by John Hartigan
- The Bayesian Choice by Christian Robert
The first of these is a general graduate text in statistics, but the author gives uncommonly complete coverage of both Bayesian and frequentist methods.
The second is a smaller volume and, as I recall, is devoted to some of the more delicate issues surrounding finite versus countable additivity as they relate to using probability distributions as priors in a Bayesian approach.
The final book is more general, but the style is more formal than the Bernardo and Smith book mentioned by PaPiro. (This is, in my experience, true of the style of French Bayesians :)
As I said, the distinctive elements of the Bayesian perspective are more philosophical than technical, but there are some technical areas that have received attention in the Bayesian community that may be of independent mathematical interest. One would be the role of so-called "improper" priors as mentioned above.
Another is the role of conditional distributions as a primitive rather than derived notion, leading to the idea of disintegration, as in this manuscript of Pollard.
Also, because of their keen interest in Monte Carlo methods, Bayesian statisticians have done a lot of work on computational techniques for sampling from various distributions. Christian Robert is a prominent researcher in this area, and he has a blog; its current post happens to be about Bayesian foundations.
Finally, at the heart of many arguments in favor of a Bayesian approach (the early chapters of both Bernardo and Smith and Robert are devoted to them) are de Finetti-type representation theorems, which sanction prior distributions via appeals to exchangeability. You can start with the wiki entry on de Finetti's theorem and then look at the work of Persi Diaconis on the topic. In this vein see also Lauritzen's monograph, which (for me anyway) is the last word on the matter.
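For reference, the simplest (binary) case of de Finetti's theorem says: if $X_1, X_2, \dots$ is an infinite exchangeable sequence of $\{0,1\}$-valued random variables, then there is a unique probability measure $\mu$ on $[0,1]$ such that

$$P(X_1 = x_1, \dots, X_n = x_n) = \int_0^1 \theta^{\sum_{i=1}^n x_i} (1-\theta)^{\,n - \sum_{i=1}^n x_i} \, d\mu(\theta)$$

for every $n$ and every binary $x_1,\dots,x_n$; the mixing measure $\mu$ is precisely what a Bayesian would call the prior.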
Best Answer
There are many approaches to this problem. Here are three.
The subjective Bayes approach says the prior should simply quantify what is known or believed before the experiment takes place. Period. End of discussion.
The empirical Bayes approach says you can estimate your prior from the data itself. (In that case your "prior" isn't prior at all.) A small sketch of this idea appears after the three approaches.
The objective Bayes approach says to pick priors based on mathematical properties, such as "reference" priors that in some sense maximize information gain. Jim Berger gives a good defense of objective Bayes here.
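As a concrete illustration of the empirical Bayes idea, here is a minimal sketch of the normal-normal case; the numbers and setup are invented for illustration:

```python
import numpy as np

# Empirical Bayes in the normal-normal model: many units, one noisy
# observation each, with known noise sd = 1. Fit the Gaussian "prior"
# over unit means from the pooled data by the method of moments.
rng = np.random.default_rng(1)
true_means = rng.normal(5.0, 2.0, size=200)        # unknown unit-level means
obs = true_means + rng.normal(0.0, 1.0, size=200)  # one observation per unit

prior_mean = obs.mean()
prior_var = max(obs.var(ddof=1) - 1.0, 0.0)  # total variance minus noise variance

# Posterior mean for each unit under the fitted prior: shrink toward prior_mean
shrink = prior_var / (prior_var + 1.0)
post_means = prior_mean + shrink * (obs - prior_mean)
print(f"fitted prior: mean={prior_mean:.2f}, sd={prior_var**0.5:.2f}")
print("first 3 shrunken estimates:", np.round(post_means[:3], 2))
```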
In practice an analyst may use any or all of these approaches, even within the same model. For example, they might use a subjective prior on parameters about which there is considerable prior knowledge and a reference prior on other parameters that are less important or less well understood.
Often it simply doesn't matter much what prior you use. For example, you might show that a variety of priors, say an optimistic prior and a pessimistic prior, lead to essentially the same conclusion. This is particularly the case when there's a lot of data: the impact of the prior fades as data accrue. But for other applications, such as hypothesis testing, priors matter more.
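Here is a small beta-binomial sketch of that fading effect; the priors and data are invented for illustration:

```python
import numpy as np

# Two analysts put different Beta priors on a coin's heads probability.
# As flips accumulate, their posterior means converge.
rng = np.random.default_rng(0)
flips = rng.random(1000) < 0.3  # simulated flips of a coin with P(heads) = 0.3

priors = {"optimistic": (8.0, 2.0), "pessimistic": (2.0, 8.0)}  # Beta(a, b)
for n in (0, 10, 100, 1000):
    heads = int(flips[:n].sum())
    # Beta-binomial posterior mean: (a + heads) / (a + b + n)
    means = {name: (a + heads) / (a + b + n) for name, (a, b) in priors.items()}
    print(f"n={n:4d}  " + "  ".join(f"{k}={v:.3f}" for k, v in means.items()))
```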