Bayesian Statistics – Example of a Prior That Leads to a Non-Invariant Posterior Unlike Jeffreys Prior

bayesian, fisher-information, invariance, jeffreys-prior, mathematical-statistics

I am reposting an "answer" I gave some two weeks ago to this question: Why is the Jeffreys prior useful? It really was a question in its own right (and I did not have the rights to post comments at the time, either), so I hope it is OK to do this:

In the link above it is discussed that the interesting feature of Jeffreys prior is that, when reparameterizing the model, the resulting posterior distribution gives posterior probabilities that obey the restrictions imposed by the transformation. Say, as discussed there, when moving from the success probability $\theta$ in the Beta-Bernoulli example to the odds $\psi=\theta/(1-\theta)$, the posterior should satisfy $P(1/3\leq\theta\leq 2/3\mid X=x)=P(1/2\leq\psi\leq 2\mid X=x)$.

I wanted to create a numerical example of the invariance of Jeffreys prior under the transformation from $\theta$ to the odds $\psi$ and, more interestingly, of the lack thereof for other priors (say, Haldane, uniform, or arbitrary ones).

Now, if the posterior for the success probability is Beta (for any Beta prior, not only Jeffreys), the posterior of the odds follows a Beta distribution of the second kind (see Wikipedia) with the same parameters. Then, as the numerical example below highlights, it is not too surprising (to me, at least) that there is invariance for any choice of Beta prior (play around with alpha0_U and beta0_U), not only Jeffreys; cf. the output of the program.
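
To spell out the change of variables behind this: if $\theta\sim\mathrm{Beta}(\alpha,\beta)$ and $\psi=\theta/(1-\theta)$, then $\theta=\psi/(1+\psi)$ and $d\theta/d\psi=(1+\psi)^{-2}$, so
$$p_\psi(\psi)=\frac{1}{B(\alpha,\beta)}\left(\frac{\psi}{1+\psi}\right)^{\alpha-1}\left(\frac{1}{1+\psi}\right)^{\beta-1}\frac{1}{(1+\psi)^{2}}=\frac{\psi^{\alpha-1}(1+\psi)^{-(\alpha+\beta)}}{B(\alpha,\beta)},$$
which is the Beta distribution of the second kind with the same parameters $\alpha$ and $\beta$.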

library(GB2) 
# has the Beta density of the 2nd kind, the distribution of theta/(1-theta) if theta~Beta(alpha,beta)

theta_1 = 2/3 # a numerical example as in the above post
theta_2 = 1/3

odds_1 = theta_1/(1-theta_1) # the corresponding odds
odds_2 = theta_2/(1-theta_2)

n = 10 # some data
k = 4

alpha0_J = 1/2 # Jeffreys prior for the Beta-Bernoulli case
beta0_J = 1/2
alpha1_J = alpha0_J + k # the corresponding parameters of the posterior
beta1_J = beta0_J + n - k

alpha0_U = 0 # some other prior (alpha0 = beta0 = 0 is the improper Haldane prior)
beta0_U = 0
alpha1_U = alpha0_U + k # resulting posterior parameters for the other prior
beta1_U = beta0_U + n - k

# posterior probability that theta is between theta_1 and theta_2:
pbeta(theta_1, alpha1_J, beta1_J) - pbeta(theta_2, alpha1_J, beta1_J)
# the same for the corresponding odds, based on the Beta distribution of the second kind
pgb2(odds_1, 1, 1, alpha1_J, beta1_J) - pgb2(odds_2, 1, 1, alpha1_J, beta1_J)

# same for the other prior and resulting posterior
pbeta(theta_1, alpha1_U, beta1_U) - pbeta(theta_2, alpha1_U, beta1_U)
pgb2(odds_1, 1, 1, alpha1_U, beta1_U) - pgb2(odds_2, 1, 1, alpha1_U, beta1_U)

This brings me to the following questions:

  1. Am I making a mistake?
  2. If not, is there a result saying that there is no lack of invariance within conjugate families, or something like that? (A quick inspection suggests that I could not produce a lack of invariance in the normal-normal case either, for instance.)
  3. Do you know a (preferably simple) example in which we do get a lack of invariance?

Best Answer

Your computation seems to be verifying that, when we have a particular prior distribution $p_\theta(\theta)$, the following two procedures

  1. Compute the posterior $p_{\theta \mid D}(\theta \mid D)$
  2. Transform the aforementioned posterior into the other parametrization to obtain $p_{\psi \mid D}(\psi \mid D)$

and

  1. Transform the prior $p_\theta(\theta)$ into the other parametrization to obtain $p_\psi(\psi)$
  2. Using the prior $p_\psi(\psi)$, compute the posterior $p_{\psi \mid D}(\psi \mid D)$

lead to the same posterior for $\psi$. This will indeed always occur (caveat: as long as the transformation is such that a distribution over $\psi$ is determined by a distribution over $\theta$).
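
A one-line check of this, for a one-to-one differentiable transformation $\theta=h(\psi)$: the change of variables commutes with Bayes' rule, since
$$p_{\psi\mid D}(\psi\mid D)\propto p(D\mid\psi)\,p_\psi(\psi)=p\bigl(D\mid h(\psi)\bigr)\,p_\theta\bigl(h(\psi)\bigr)\,\bigl|h'(\psi)\bigr|\propto p_{\theta\mid D}\bigl(h(\psi)\mid D\bigr)\,\bigl|h'(\psi)\bigr|,$$
which is exactly the density obtained by computing the $\theta$-posterior first and then transforming it to the $\psi$-parametrization.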

However, this is not the point of the invariance in question. Instead, the question is whether, when we have a particular Method For Deciding The Prior, the following two procedures:

  1. Use the Method For Deciding The Prior to decide $p_\theta(\theta)$
  2. Convert that distribution into $p_\psi(\psi)$

and

  1. Use the Method For Deciding The Prior to decide $p_\psi(\psi)$

result in the same prior distribution for $\psi$. If they result in the same prior, they will indeed result in the same posterior, too (as you have verified for a couple of cases).

As mentioned in @NeilG's answer, if your Method For Deciding The Prior is 'set a uniform prior on the parameter', you will not get the same prior in the probability/odds case: the uniform prior for $\theta$ over $[0,1]$ transforms into the density $p_\psi(\psi)=(1+\psi)^{-2}$ on $[0,\infty)$, which is not uniform.
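
For a concrete example of the missing invariance (question 3 above), here is a minimal R sketch continuing the numbers from the question; the variable names are new and purely illustrative. A uniform (improper) prior chosen directly for $\psi$ corresponds to the prior $\propto(1-\theta)^{-2}$ on $\theta$ and hence to the posterior Beta$(k+1,\,n-k-1)$, whereas a uniform prior chosen for $\theta$ gives the posterior Beta$(k+1,\,n-k+1)$, so the two posterior probabilities below differ:

n = 10 # same data as in the question
k = 4
theta_1 = 2/3
theta_2 = 1/3

# rule "uniform prior", applied in the theta-parametrization: posterior Beta(k+1, n-k+1)
pbeta(theta_1, k + 1, n - k + 1) - pbeta(theta_2, k + 1, n - k + 1)

# rule "uniform prior", applied in the psi-parametrization: translated back to theta,
# the prior is proportional to (1-theta)^(-2), so the posterior is Beta(k+1, n-k-1)
pbeta(theta_1, k + 1, n - k - 1) - pbeta(theta_2, k + 1, n - k - 1)

# the two numbers disagree: the rule "use a uniform prior" is not invariant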

Instead, if your Method For Deciding The Prior is 'use Jeffreys prior for the parameter', it will not matter whether you use it for $\theta$ and then convert to the $\psi$-parametrization, or apply it to $\psi$ directly. This is the claimed invariance.
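
For the Bernoulli example this can be checked directly. The Fisher information for $\theta$ is $I(\theta)=1/(\theta(1-\theta))$, so applying Jeffreys' rule in the $\psi$-parametrization gives
$$p_\psi(\psi)\propto\sqrt{I\bigl(\theta(\psi)\bigr)\left(\frac{d\theta}{d\psi}\right)^{2}}=\sqrt{\frac{(1+\psi)^{2}}{\psi}\cdot\frac{1}{(1+\psi)^{4}}}=\psi^{-1/2}(1+\psi)^{-1},$$
which is exactly what one obtains by transforming the $\mathrm{Beta}(1/2,1/2)$ prior $p_\theta(\theta)\propto\theta^{-1/2}(1-\theta)^{-1/2}$ from $\theta$ to $\psi$.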