Bayesian Prior – Prior on Precision or Variance in Bayesian Analysis

bayesianprior

I've been reading this tutorial on variational Bayes, which discusses sparse Bayesian learning (Relevance Vector Machines, if you prefer). In the paper they put a Gamma prior on the precision parameter, in this instance $Ga(\theta|\delta,\delta)$, where $\delta$ is small, so that $Ga(\theta|\delta,\delta)\approx\frac{1}{\theta}$.

My question is why, or when, you would put a Gamma prior on the precision rather than on the variance. I understand that in the case of sparse Bayesian learning it was useful to put this prior on the coefficient/weight precision, since it effectively gave you a sparse prior on the weights.

However, for general regression problems it would seem natural to prefer smaller variances, so we should put a $Ga(\theta|\delta,\delta)$ prior on the variance. Putting this prior on the precision would instead imply that smaller precisions (i.e. larger variances) are preferred. Or am I interpreting the prior incorrectly?
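To make the approximation in the question concrete, here is a small pure-Python check (not from the paper; `gamma_pdf` is my own helper) that for small $\delta$ the $Ga(\theta|\delta,\delta)$ density is nearly proportional to $1/\theta$: the ratio of densities at two points should be close to the inverse ratio of the points.

```python
import math

def gamma_pdf(x, shape, rate):
    """Density of Gamma(shape, rate) at x, in the shape-rate parametrisation."""
    log_pdf = (shape * math.log(rate) + (shape - 1.0) * math.log(x)
               - rate * x - math.lgamma(shape))
    return math.exp(log_pdf)

delta = 1e-3  # small shape = rate, as in the question

# If the density were exactly proportional to 1/theta, the ratio
# pdf(0.5) / pdf(2.0) would be exactly 2.0 / 0.5 = 4.
ratio = gamma_pdf(0.5, delta, delta) / gamma_pdf(2.0, delta, delta)
print(ratio)  # close to 4
```

The approximation is only proportional: the limiting "density" $1/\theta$ is improper, so only ratios of density values are meaningful here.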

Best Answer

I do not know your reference, but what I can tell you is that the Gamma distribution is the conjugate prior for the precision of a normal distribution. Moreover, when $\delta_1 \rightarrow 0$ and $\delta_2 \rightarrow 0$ (using the shape-rate parametrisation that seems to be used in your paper, though it is difficult to be sure), it approximates the Jeffreys prior (which in one dimension is also the reference prior) $p(\sigma) \propto \frac{1}{\sigma} 1_{[0,\infty)}(\sigma)$ (after reparametrisation). This makes it widely used as a non-informative prior. Nevertheless, this choice has been criticized for a potentially strong dependence on the chosen value of $\delta$, and other choices have been proposed more recently, e.g. a half-Cauchy distribution on $\sigma$ (http://www.stat.columbia.edu/~gelman/research/published/taumain.pdf).
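The conjugacy can be sketched in a few lines. Assuming a normal likelihood with known mean $\mu$ and a $Ga(a, b)$ (shape, rate) prior on the precision $\tau$, the posterior is $Ga(a + n/2,\; b + \tfrac{1}{2}\sum_i (x_i - \mu)^2)$. This is a minimal illustration of the conjugate update, not the variational scheme of the paper; the data and function name are mine.

```python
def posterior_precision(data, mu, a, b):
    """Conjugate update for the precision tau of a Normal with known mean mu:
    prior Gamma(a, b) (shape, rate) -> posterior Gamma(a + n/2, b + SS/2)."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return a + n / 2.0, b + ss / 2.0

delta = 1e-3                      # vague Gamma(delta, delta) prior
data = [1.0, -1.0, 2.0]           # toy data with known mean 0
shape, rate = posterior_precision(data, 0.0, delta, delta)
print(shape, rate)                # 1.501, 3.001
print(shape / rate)               # posterior mean of tau, ~ n / sum(x^2) = 0.5
```

With a nearly flat $Ga(\delta,\delta)$ prior the posterior mean of the precision is close to the maximum-likelihood estimate $n/\sum_i x_i^2$, which is one way to see why this prior is treated as non-informative.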
