Solved – Jeffreys prior for binomial likelihood

bayesian, jeffreys-prior

If I use a Jeffreys prior for a binomial probability parameter $\theta$, then this implies using a $\theta \sim \text{Beta}(1/2,1/2)$ distribution.

If I transform to a new frame of reference $\phi = \theta^2$, then clearly $\phi$ does not also follow a $\text{Beta}(1/2,1/2)$ distribution.
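
(For concreteness, this is just the standard change-of-variables check: with $\theta = \sqrt{\phi}$ the induced density is
$$ p(\phi) = p_{\theta}\big(\sqrt{\phi}\big) \left|\frac{d\theta}{d\phi}\right| \propto \phi^{-1/4}\big(1-\sqrt{\phi}\big)^{-1/2} \cdot \tfrac{1}{2}\phi^{-1/2} \propto \phi^{-3/4}\big(1-\sqrt{\phi}\big)^{-1/2}, $$
which is not of the form $\phi^{a-1}(1-\phi)^{b-1}$.)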

My question is in what sense is Jeffreys prior invariant to reparameterisations? I think I am misunderstanding the topic to be honest …

Best Answer

Let $\phi = g(\theta)$, where $g$ is a monotone function of $\theta$, and let $h$ be the inverse of $g$, so that $\theta = h(\phi)$. We can obtain the Jeffreys prior distribution $p_{J}(\phi)$ in two ways:

  1. Start with the binomial model $$ p(y | \theta) = \binom{n}{y} \theta^{y} (1-\theta)^{n-y}, \tag{1} $$ reparameterize the model with $\phi = g(\theta)$ to get $$ p(y | \phi) = \binom{n}{y} h(\phi)^{y} (1-h(\phi))^{n-y}, $$ and obtain the Jeffreys prior distribution $p_{J}(\phi)$ for this model directly.
  2. Obtain the Jeffreys prior distribution $p_{J}(\theta)$ from the original binomial model (1) and apply the change-of-variables formula to obtain the induced prior density on $\phi$, $$ p_{J}(\phi) = p_{J}(h(\phi)) \left|\frac{dh}{d\phi}\right|. $$ (See the sketch just below for why these two routes coincide.)
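
In outline, the reason the two routes agree is the chain rule for the score (this sketch is my own addition, using the identity $I(\theta) = E\big[(\partial l / \partial \theta)^{2} \,|\, \theta\big]$). Since $\frac{\partial l}{\partial \phi} = \frac{\partial l}{\partial \theta} \frac{dh}{d\phi}$ with $\theta = h(\phi)$,
\begin{align*}
I(\phi) &= E\left[ \left( \frac{\partial l}{\partial \phi} \right)^{2} \Big| \phi \right] = E\left[ \left( \frac{\partial l}{\partial \theta} \right)^{2} \Big| \theta = h(\phi) \right] \left( \frac{dh}{d\phi} \right)^{2} = I(h(\phi)) \left( \frac{dh}{d\phi} \right)^{2},
\end{align*}
and taking square roots gives $\sqrt{I(\phi)} = \sqrt{I(h(\phi))}\, \left| \frac{dh}{d\phi} \right|$, which is exactly the change-of-variables formula in item 2.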

Being invariant to reparameterisations means that the densities $p_{J}(\phi)$ derived in both ways are the same. The Jeffreys prior has this property. [Reference: A First Course in Bayesian Statistical Methods by P. Hoff.]
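
As a concrete check (my own addition, not part of the quoted argument), take the question's reparameterisation $\phi = \theta^{2}$, so that $h(\phi) = \sqrt{\phi}$ and $\frac{dh}{d\phi} = \tfrac{1}{2}\phi^{-1/2}$. Applying route 2 to the $\text{Beta}(1/2,1/2)$ density derived below gives
$$ p_{J}(\phi) = p_{J}\big(\sqrt{\phi}\big) \left|\frac{dh}{d\phi}\right| \propto \phi^{-1/4}\big(1-\sqrt{\phi}\big)^{-1/2} \cdot \tfrac{1}{2}\phi^{-1/2} \propto \phi^{-3/4}\big(1-\sqrt{\phi}\big)^{-1/2}, $$
and route 1 yields the same density, because $I(\phi) = I\big(\sqrt{\phi}\big)\,\tfrac{1}{4}\phi^{-1}$ by the chain-rule identity above. Note that this is not a $\text{Beta}(1/2,1/2)$ density in $\phi$: invariance means the two routes agree, not that the prior keeps the same functional form under the transformation.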

To answer your comment: to obtain the Jeffreys prior distribution $p_{J}(\theta)$ from the likelihood of the binomial model $$ p(y | \theta) = \binom{n}{y} \theta^{y} (1-\theta)^{n-y}, $$ we must calculate the Fisher information. Take the logarithm of the likelihood, $l$, and compute its second derivative:
\begin{align*}
l := \log(p(y | \theta)) &\propto y \log(\theta) + (n-y) \log(1-\theta), \\
\frac{\partial l }{\partial \theta} &= \frac{y}{\theta} - \frac{n-y}{1-\theta}, \\
\frac{\partial^{2} l }{\partial \theta^{2}} &= -\frac{y}{\theta^{2}} - \frac{n-y}{ (1-\theta)^{2} }.
\end{align*}
The Fisher information is then (using $E(y | \theta) = n\theta$)
\begin{align*}
I(\theta) &= -E\left(\frac{\partial^{2} l }{\partial \theta^{2}} \Big| \theta\right) \\
&= \frac{n\theta}{\theta^{2}} + \frac{n - n \theta}{(1-\theta)^{2}} \\
&= \frac{n}{\theta ( 1- \theta)} \\
&\propto \theta^{-1} (1-\theta)^{-1}.
\end{align*}
The Jeffreys prior for this model is
\begin{align*}
p_{J}(\theta) &= \sqrt{I(\theta)} \\
&\propto \theta^{-1/2} (1-\theta)^{-1/2},
\end{align*}
which is the $\text{Beta}(1/2, 1/2)$ density.
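
If you want to verify the algebra, here is a minimal SymPy sketch (my own addition, not from Hoff's book). It assumes only the binomial likelihood above and checks symbolically that the two routes agree for the question's $\phi = \theta^{2}$:

```python
# A small SymPy sketch that reproduces the Fisher-information calculation
# above and checks the phi = theta^2 case both ways.
import sympy as sp

y, n, theta, phi = sp.symbols('y n theta phi', positive=True)

# Binomial log-likelihood, dropping the binomial coefficient (constant in theta).
loglik = y * sp.log(theta) + (n - y) * sp.log(1 - theta)

# Fisher information: I(theta) = -E[d^2 l / d theta^2 | theta], using E[y | theta] = n*theta.
fisher_theta = sp.simplify(-sp.diff(loglik, theta, 2).subs(y, n * theta))
print(fisher_theta)  # equals n/(theta*(1 - theta))

# Jeffreys prior up to a constant: sqrt(I(theta)) ~ theta^(-1/2) * (1 - theta)^(-1/2).
jeffreys_theta = sp.sqrt(fisher_theta / n)

# Route 2: change of variables with theta = h(phi) = sqrt(phi).
h = sp.sqrt(phi)
route2 = jeffreys_theta.subs(theta, h) * sp.Abs(sp.diff(h, phi))

# Route 1: Fisher information computed directly in the phi parameterisation.
loglik_phi = loglik.subs(theta, h)
fisher_phi = sp.simplify(-sp.diff(loglik_phi, phi, 2).subs(y, n * h))  # E[y] = n*sqrt(phi)
route1 = sp.sqrt(fisher_phi / n)

# The two routes give the same (unnormalised) density.
print(sp.simplify(route1**2 - route2**2))  # 0
```

The final line printing $0$ is exactly the invariance property: the prior obtained by reparameterising the model first coincides with the prior obtained by transforming $p_{J}(\theta)$.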