Bayesian Statistics – Why Uniform Prior on log(x) is Equal to 1/x Prior on x

Tags: bayesian, jeffreys-prior, prior, python, uninformative-prior

I'm trying to understand Jeffreys prior. One application is for 'scale' variables like the standard deviation $\sigma$ (or its square, the variance $\sigma^2$) of Gaussian distributions. It is often said that using a uniform prior over $\sigma$ is not really non-informative and instead one should either:

  1. Use $\ln \sigma$ as the free parameter instead, with a uniform prior on it (often called a log-uniform prior),

  2. Or keep $\sigma$ as the free parameter but place a prior proportional to $1/\sigma$ on it (which is not uniform).

Why are these two methods/priors equivalent? I suspect it has something to do with the fact that the derivative of $\ln \sigma$ is $1/\sigma$, but I can't take the next step.

Also, why does this even matter, in simple language with minimal jargon? I see all these complicated explanations online involving the Fisher information matrix, but in the end all I can see is that the log-uniform or $1/\sigma$ priors weight lower values of $\sigma$ more heavily. Why? If possible, a simple analytic example or Python snippet would be very helpful.

Best Answer

When transforming a uniform distribution on $\log(\sigma)$ into a distribution on $\sigma$, you need to take into account the Jacobian of the transformation. That Jacobian is, as you correctly intuited, exactly $1/\sigma$.

Writing this a little more explicitly: let $X=\log(\sigma)$, and consider the transformation $Y=T(X)=e^{X}=\sigma$, which has inverse $T^{-1}(Y)=\log(Y)$. The Jacobian of the inverse is $\left|\frac{\partial X}{\partial Y}\right|=1/Y$. So, since $p_X(X)\propto 1$, the induced density for $\sigma$ is $p_Y(Y)=\left|\frac{\partial X}{\partial Y}\right|\,p_X(\log(Y))\propto 1/Y$, i.e. a $1/\sigma$ prior on $\sigma$.
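
To make this concrete, here is a small numerical check (a sketch added for illustration, not part of the original answer; NumPy and the bounds $a=0.1$, $b=10$ are arbitrary choices made so the prior is proper and can be sampled). It draws $X=\log(\sigma)$ uniformly on $[\log a, \log b]$, transforms to $\sigma=e^{X}$, and compares the empirical CDF of $\sigma$ with the analytic CDF of the normalised $1/\sigma$ density, $P(\sigma \le s)=\log(s/a)/\log(b/a)$:

```python
import numpy as np

# Draw X = log(sigma) uniformly on [log(a), log(b)] and back-transform.
# The bounds a, b are arbitrary; they just make the prior proper so it can be sampled.
rng = np.random.default_rng(0)
a, b = 0.1, 10.0
x = rng.uniform(np.log(a), np.log(b), size=1_000_000)  # uniform in log(sigma)
sigma = np.exp(x)                                       # induced samples of sigma

# Compare the empirical CDF of sigma with the CDF of the normalised 1/sigma
# density on [a, b]:  P(sigma <= s) = log(s/a) / log(b/a).
for s in [0.2, 0.5, 1.0, 2.0, 5.0]:
    empirical = np.mean(sigma <= s)
    analytic = np.log(s / a) / np.log(b / a)
    print(f"s = {s:>3}:  empirical = {empirical:.4f}   analytic = {analytic:.4f}")

# Half of the prior mass sits below sqrt(a*b) = 1, i.e. in the decade [0.1, 1].
print("median of sigma:", np.median(sigma))
```

The two columns agree, confirming that a uniform prior on $\log\sigma$ is the same thing as a $1/\sigma$ prior on $\sigma$. It also shows why this prior "favours" small $\sigma$: it puts equal mass on each decade, so half the samples land in $[0.1, 1]$ and half in $[1, 10]$, whereas a uniform prior on $\sigma$ itself would put almost all of its mass above 1.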