Solved – Laplace’s law of succession using different priors

Tags: bayesian, prior, uninformative-prior

Laplace's law of succession is a well-known rule, relying on Bayes' theorem.

A possible proof of the rule of succession can be found on Wikipedia.
Note that for this proof we use a uniform distribution for the parameter $p$.
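As a quick numerical check of this first derivation (a sketch of my own, not from the original post), the posterior predictive under the uniform Beta(1, 1) prior can be computed exactly as a ratio of Beta integrals, and it simplifies to $(s+1)/(n+2)$:

```python
from fractions import Fraction
from math import factorial

def beta_int(a, b):
    """Beta function B(a, b) for positive integer arguments:
    B(a, b) = (a-1)! (b-1)! / (a+b-1)!."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def rule_of_succession(s, n):
    """P(next trial is a success | s successes in n trials) under a uniform
    Beta(1, 1) prior on p, computed exactly as
    B(s+2, n-s+1) / B(s+1, n-s+1); it simplifies to (s+1)/(n+2)."""
    return beta_int(s + 2, n - s + 1) / beta_int(s + 1, n - s + 1)

# Laplace's sunrise example: n successes in n trials -> (n+1)/(n+2)
assert rule_of_succession(5, 5) == Fraction(6, 7)
# The closed form (s+1)/(n+2) matches the integral for a range of (s, n):
assert all(rule_of_succession(s, n) == Fraction(s + 1, n + 2)
           for n in range(1, 10) for s in range(n + 1))
```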

Another proof of the rule is given in The Bayesian Choice as reproduced below:

[Image: Laplace succession rule, excerpt from The Bayesian Choice]

The problem is completely summarized in the image. This time, the prior we use is a uniform discrete probability distribution.

Both derivations yield the same final probability.
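To check the discrete formulation numerically (a sketch of my own; the grid prior on $p \in \{0, 1/N, \dots, 1\}$ is my reading of the proof reproduced above), one can verify that the discrete uniform prior reproduces the same $(n+1)/(n+2)$ answer as the grid gets fine:

```python
from fractions import Fraction

def succession_discrete(n, N):
    """P(next success | n successes in n trials) when p is uniform on the
    discrete grid {0, 1/N, ..., N/N}.  The posterior on grid point k/N is
    proportional to (k/N)^n, so the predictive is a ratio of power sums."""
    num = sum(Fraction(k, N) ** (n + 1) for k in range(N + 1))
    den = sum(Fraction(k, N) ** n for k in range(N + 1))
    return num / den

# For n = 5 successes in 5 trials, the predictive approaches 6/7 as N grows:
print(float(succession_discrete(5, 2000)))
```

For $N = 2000$ the result already agrees with $6/7 \approx 0.8571$ to three decimal places, matching the continuous uniform-prior answer in the limit.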

However, we did not use the uninformative prior in both cases. The uninformative prior for a discrete and finite set of possibilities is the uniform distribution, but shouldn't the uninformative prior for the parameter $p$ in the first proof be $1/[p(1-p)]$?

My problem is that if we use the uninformative prior in both cases (these should just be two different formulations of the same problem, shouldn't they?), we get two different answers.
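To make the discrepancy concrete (a sketch of my own, not from the post): under a conjugate Beta$(a, b)$ prior the posterior predictive is $(s+a)/(n+a+b)$, and the three standard "uninformative" choices, including the improper $1/[p(1-p)]$ (Haldane) prior as the limit $a = b = 0$, give three different answers for the same data:

```python
def predictive(s, n, a, b):
    """P(next success | s successes in n trials) under a Beta(a, b) prior
    on p: (s + a) / (n + a + b).
    a = b = 1   : uniform prior (Laplace)
    a = b = 1/2 : Jeffreys prior
    a = b = 0   : improper 1/[p(1-p)] (Haldane) prior, taken as a limit."""
    return (s + a) / (n + a + b)

s, n = 5, 5  # five successes in five trials
print(predictive(s, n, 1, 1))      # uniform  -> 6/7  ~ 0.857
print(predictive(s, n, 0.5, 0.5))  # Jeffreys -> 11/12 ~ 0.917
print(predictive(s, n, 0, 0))      # Haldane  -> 1.0
```

The three predictive probabilities disagree, which is exactly the puzzle: "the" uninformative prior is not a single object.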

I am surely mistaken about the meaning of one of the two approaches; could you please give me a clue?

Best Answer

My opinion on this issue is that you are comparing the answers to two different problems, namely the Bayesian inference on the probability of "yet another sunrise" in the hypergeometric distribution and the Bayesian inference on the probability of "yet another sunrise" in the Bernoulli distribution.

There is no reason for the two answers to be equal for the same observed data.

First, given that the models are not equivalent (Bernoulli sampling cannot be turned into hypergeometric sampling), there is no principle that states that the answers should be the same. For instance, the likelihood principle does not apply here.

Second, there is no such thing as "the" non-informative or uninformative or objective prior. I discussed this in an earlier X validated answer. (Which turned out to be my most popular answer to date!) There are several coherent principles that lead to the generic construction of a reference prior, such as Jeffreys' rule, the invariance principle, the maximum entropy principle, and Berger and Bernardo's reference priors.

Third, there is a fundamental ambiguity in the definition of the maximum entropy priors in continuous settings, namely that they depend on the choice of the dominating measure. Changing the measure does change the value of the maximum entropy prior and choosing the dominating measure requires the call to yet another principle. I believe this is discussed in the Bayesian Choice to some extent.
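To sketch why the dominating measure matters (my own illustration, not part of the original answer): with no moment constraints, maximizing the entropy of $\pi$ relative to a dominating measure $\lambda$,

$$\pi^\star = \arg\max_{\pi} \left\{ -\int_0^1 \pi(p)\,\log\frac{\pi(p)}{\lambda(p)}\,\mathrm{d}p \right\},$$

returns $\pi^\star \propto \lambda$. So taking $\lambda$ to be Lebesgue measure yields the uniform prior of the first proof, while taking $\mathrm{d}\lambda = \mathrm{d}p/[p(1-p)]$ yields the improper $1/[p(1-p)]$ prior raised in the question: the "maximum entropy prior" is only defined once the dominating measure has been chosen.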
