Bayesian Statistics – Why Uniform Prior on log(x) is Equal to 1/x Prior on x

Tags: bayesian, jeffreys-prior, prior, python, uninformative-prior

I'm trying to understand Jeffreys prior. One application is for 'scale' variables like the standard deviation $\sigma$ (or its square, the variance $\sigma^2$) of Gaussian distributions. It is often said that using a uniform prior over $\sigma$ is not really non-informative and instead one should either:

  1. Use $\ln \sigma$ as the free parameter instead, with a uniform prior on it (often called a log-uniform prior),

  2. Or keep $\sigma$ as the free parameter but place a prior proportional to $1/\sigma$ on it (which is not uniform).

Why are these two methods/priors equivalent? I suspect it has something to do with the fact that the derivative of $\ln \sigma$ is $1/\sigma$, but I can't take the next step.

Also, why does this even matter, in simple language with minimal jargon? I see all these complicated explanations online involving the Fisher information matrix, but in the end all I can see is that the log-uniform or $1/\sigma$ priors weight lower values of $\sigma$ more heavily. Why? If possible, a simple analytic example or Python snippet would be very helpful.

Best Answer

When transforming a uniform distribution on $\log(\sigma)$ into a distribution on $\sigma$, you need to take into account the Jacobian of the transformation. That Jacobian is, as you correctly intuited, exactly $1/\sigma$.

Writing this a little more explicitly: let $X=\log(\sigma)$, and consider the transformation $Y=T(X)=e^{X}=\sigma$, which has inverse $T^{-1}(Y)=\log(Y)$. The Jacobian of the inverse is $\left|\frac{\partial X}{\partial Y}\right|=1/Y$. So, since $p_X(X)\propto 1$, the induced density for $\sigma$ is $p_Y(Y)=\left|\frac{\partial X}{\partial Y}\right|\,p_X(\log(Y))\propto 1/Y$, i.e. a $1/\sigma$ prior on $\sigma$.
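
To make this concrete, here is a small numerical check (a sketch added for illustration, not part of the original answer; NumPy and the bounds $a=0.1$, $b=10$ are arbitrary choices made so the prior is proper and can be sampled). It draws $X=\log(\sigma)$ uniformly on $[\log a, \log b]$, transforms to $\sigma=e^{X}$, and compares the empirical CDF of $\sigma$ with the analytic CDF of the normalised $1/\sigma$ density, $P(\sigma \le s)=\log(s/a)/\log(b/a)$:

```python
import numpy as np

# Draw X = log(sigma) uniformly on [log(a), log(b)] and back-transform.
# The bounds a, b are arbitrary; they just make the prior proper so it can be sampled.
rng = np.random.default_rng(0)
a, b = 0.1, 10.0
x = rng.uniform(np.log(a), np.log(b), size=1_000_000)  # uniform in log(sigma)
sigma = np.exp(x)                                       # induced samples of sigma

# Compare the empirical CDF of sigma with the CDF of the normalised 1/sigma
# density on [a, b]:  P(sigma <= s) = log(s/a) / log(b/a).
for s in [0.2, 0.5, 1.0, 2.0, 5.0]:
    empirical = np.mean(sigma <= s)
    analytic = np.log(s / a) / np.log(b / a)
    print(f"s = {s:>3}:  empirical = {empirical:.4f}   analytic = {analytic:.4f}")

# Half of the prior mass sits below sqrt(a*b) = 1, i.e. in the decade [0.1, 1].
print("median of sigma:", np.median(sigma))
```

The two columns agree, confirming that a uniform prior on $\log\sigma$ is the same thing as a $1/\sigma$ prior on $\sigma$. It also shows why this prior "favours" small $\sigma$: it puts equal mass on each decade, so half the samples land in $[0.1, 1]$ and half in $[1, 10]$, whereas a uniform prior on $\sigma$ itself would put almost all of its mass above 1.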