Bayesian – Circumstances for Using Improper Prior in Bayesian Analysis for Posterior Insights

bayesian, posterior, prior

I am attempting to gain some intuition about the use of priors in Bayesian analysis.
I have read that an improper prior can be used when no prior information is available. Here is my first confusion: how exactly is an improper prior defined? Some definitions say it is a distribution that does not integrate to 1, while others say it is a distribution whose integral is infinite — which is it? Further, how does the prior relate to the posterior when the prior is improper, in terms of the properties the posterior must have? Must the posterior be integrable when we use an improper prior? Must it be proper, or can it be improper? Finally, can an improper prior be used for valid Bayesian inference?

Best Answer

The textbook definition is that an improper prior is a prior probability density that is not integrable, i.e. its integral is not finite. The name reflects the fact that a proper distribution's CDF cannot exceed 1. Note that the two definitions you quote coincide: if a prior density integrated to any finite number, one could always divide the density by that number and obtain a density integrating to 1. Thus the only way for a prior to be genuinely non-normalizable is for its integral to be infinite.
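A quick numerical illustration of this (a sketch, not from the answer itself): the flat "density" $\pi(\theta)=1$ has an integral that grows without bound as the interval widens, yet multiplying it by a likelihood can still yield something with finite mass. The functions and values below are made up for illustration.

```python
import math

def flat_prior(theta):
    # Improper flat prior pi(theta) = 1 on the real line.
    return 1.0

def normal_likelihood(theta, y=0.0, sigma=1.0):
    # N(y | theta, sigma^2), viewed as a function of theta.
    return math.exp(-0.5 * ((y - theta) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def riemann(f, a, b, n=100_000):
    # Midpoint-rule numerical integration of f over [a, b].
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# The prior's mass grows linearly with the interval width -> diverges.
print(riemann(flat_prior, -10, 10))    # ~20
print(riemann(flat_prior, -100, 100))  # ~200

# Likelihood times prior still integrates to a finite constant (~1 here).
post_kernel = lambda t: normal_likelihood(t) * flat_prior(t)
print(riemann(post_kernel, -10, 10))   # ~1
```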

Using an improper prior may lead to an improper posterior, but it does not have to. For example, choosing the uniform prior over $\mathbb{R}$, $\pi(\theta)=1 \ \forall \ \theta\in\mathbb{R}$, makes the prior drop out of the posterior (multiplication by 1), so computing the norming constant $f(y)=\int_{-\infty}^{+\infty}f(y|\theta)\pi(\theta)\,d\theta$ reduces to integrating the likelihood. Even when the norming constant is unavailable, Bayesian simulation only requires the kernel of the posterior, i.e. the part that is proportional to the posterior: $\pi(\theta|y) \propto f(y|\theta)\pi(\theta)$. Integration to 1 is therefore not needed to work with the posterior; the kernel is enough to simulate from it, which is how most Bayesian problems are handled in practice. Of course, if we 'know' the posterior, all inference regarding it is valid*, but some issues arise (see below).
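To make the "kernel is enough" point concrete, here is a minimal sketch (not from the original answer) of a Metropolis sampler for a normal mean under the improper flat prior $\pi(\theta)=1$. The data, step size, and seed are made-up assumptions; note that the norming constant $f(y)$ never appears.

```python
import math
import random

random.seed(0)
y = [1.2, 0.7, 1.9, 1.1, 1.4]  # hypothetical observations, N(theta, 1)

def log_kernel(theta):
    # log of likelihood * flat prior; the norming constant f(y) is never needed.
    return sum(-0.5 * (yi - theta) ** 2 for yi in y)

def metropolis(n_samples=20_000, step=0.5, theta0=0.0):
    # Random-walk Metropolis using only the unnormalized posterior kernel.
    samples, theta = [], theta0
    for _ in range(n_samples):
        prop = theta + random.gauss(0.0, step)
        # Accept with probability min(1, kernel(prop) / kernel(theta)).
        if math.log(random.random()) < log_kernel(prop) - log_kernel(theta):
            theta = prop
        samples.append(theta)
    return samples

draws = metropolis()
post_mean = sum(draws) / len(draws)
print(round(post_mean, 2))  # should be close to the sample mean of y, 1.26
```

With this flat prior the posterior is a proper normal distribution centred at the sample mean, so the draws behave exactly as ordinary posterior samples.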

*[EDIT: I've given this some thought, and clearly you would not be able to calculate expectations, variances, or credibility intervals in some cases of improper posteriors. I found this as an example of a simulation from such a posterior. There are obvious problems with them.]
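As a concrete illustration of an improper posterior (a made-up example, not the one linked above): the Haldane prior $\mathrm{Beta}(0,0)$ with $x=0$ successes in $n=10$ Bernoulli trials gives the posterior kernel $\theta^{-1}(1-\theta)^{9}$, which is not integrable near $\theta=0$. Numerically, the mass keeps growing as we integrate closer to 0:

```python
def kernel(theta, n=10, x=0):
    # Posterior kernel under the Haldane prior Beta(0, 0) with x successes
    # in n trials: theta**(x-1) * (1-theta)**(n-x-1). With x = 0 this
    # behaves like 1/theta near 0 and cannot be normalized.
    return theta ** (x - 1) * (1 - theta) ** (n - x - 1)

def riemann(f, a, b, n=200_000):
    # Midpoint-rule numerical integration of f over [a, b].
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# The partial integrals grow without bound (roughly like -log(eps)),
# so no norming constant exists: the posterior is improper.
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, riemann(kernel, eps, 1.0))
```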


[Extra content: The interesting question is not whether improper priors can be used, but what issues arise when they are, such as 1) the possibility of manipulating Bayes factors and 2) marginalization paradoxes. For these reasons improper priors are best avoided. Improper priors are used to model uninformativeness, i.e. the fact that we have no prior knowledge about the parameter. There are many proposals for uninformative (diffuse) priors, and they need not be improper. To name a few: the Jeffreys prior, the Haldane prior, the maximum entropy prior, the maximal data information prior, and empirical Bayes.]
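As a quick worked illustration (mine, not from the list above) that an uninformative prior need not be improper, consider the Jeffreys prior in two standard models:

```latex
\begin{aligned}
\text{Jeffreys prior: } & \pi(\theta) \propto \sqrt{I(\theta)} \\
\text{Bernoulli}(\theta):\ & I(\theta) = \frac{1}{\theta(1-\theta)}
  \;\Rightarrow\; \pi(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2}
  \quad \bigl(\text{Beta}(\tfrac12,\tfrac12),\ \text{proper}\bigr) \\
\text{Normal mean, known } \sigma^2:\ & I(\mu) = \frac{1}{\sigma^2}
  \;\Rightarrow\; \pi(\mu) \propto 1 \ \text{on } \mathbb{R}
  \quad (\text{improper})
\end{aligned}
```

So whether a "default" prior is proper depends on the model: the same recipe yields a proper prior for the Bernoulli parameter but an improper flat prior for a normal mean.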