Improper Prior – How Can an Improper Prior Lead to a Proper Posterior Distribution?


We know that in the case of a proper prior distribution,

$$P(\theta \mid X) = \dfrac{P(X \mid \theta)P(\theta)}{P(X)} \propto P(X \mid \theta)P(\theta).$$

The usual justification for this step is that the marginal distribution of $X$, $P(X)$, is constant with respect to $\theta$ and can thus be ignored when deriving the posterior distribution.
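As a minimal sketch of why $P(X)$ can be dropped in the proper case, the Python below (standard library only) assumes a proper Beta(2, 2) prior and a binomial likelihood with hypothetical data of 3 successes in 10 trials. It computes $P(X)$ numerically as a single number that does not depend on $\theta$, divides the unnormalized product by it, and compares the result with the known conjugate Beta posterior.

```python
import math

def beta_pdf(t, a, b):
    # Beta(a, b) density, computed via log-gamma for numerical stability
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

def likelihood(t, k, n):
    # Binomial likelihood P(X = k | theta = t)
    return math.comb(n, k) * t**k * (1 - t) ** (n - k)

def midpoint_integral(f, lo, hi, m=20_000):
    # Composite midpoint rule; never evaluates f exactly at lo or hi
    h = (hi - lo) / m
    return h * sum(f(lo + (i + 0.5) * h) for i in range(m))

k, n = 3, 10          # hypothetical data: 3 successes in 10 trials
a, b = 2.0, 2.0       # assumed proper Beta(2, 2) prior

unnormalized = lambda t: likelihood(t, k, n) * beta_pdf(t, a, b)
marginal = midpoint_integral(unnormalized, 0.0, 1.0)  # P(X): one number, free of theta

for t in (0.1, 0.3, 0.6):
    numeric = unnormalized(t) / marginal
    conjugate = beta_pdf(t, a + k, b + n - k)  # known result: Beta(a + k, b + n - k)
    print(f"theta={t}: numeric={numeric:.6f}, conjugate={conjugate:.6f}")
```

Because $P(X)$ is the same constant for every $\theta$, dividing by it only rescales the curve so that it integrates to 1.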

However, in the case of an improper prior, how do you know that the posterior distribution actually exists? Something seems to be missing from this apparently circular argument. In other words, if I assume the posterior exists, I understand the mechanics of deriving it, but I am missing the theoretical justification for why it exists at all.

P.S. I also recognize that there are cases in which an improper prior leads to an improper posterior.
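To make that caveat concrete, here is a small sketch (Python, standard library only) under an assumed setup: a Haldane prior $\pi(\theta) \propto \theta^{-1}(1-\theta)^{-1}$ on a binomial success probability and hypothetical data of 0 successes in $n = 5$ trials. The posterior kernel is then $\theta^{-1}(1-\theta)^{n-1}$, and its mass on $[\varepsilon, 1]$ keeps growing as $\varepsilon \to 0$ (roughly like $-\log\varepsilon$), so no normalizing constant exists.

```python
import math

def truncated_mass(eps, n, m=100_000):
    # Integral of theta**(-1) * (1 - theta)**(n - 1) over [eps, 1]: the posterior kernel
    # from a Haldane prior theta**(-1) * (1 - theta)**(-1) after 0 successes in n trials.
    # Substituting theta = exp(u) gives the bounded integrand (1 - exp(u))**(n - 1)
    # on [log(eps), 0], which the midpoint rule handles well.
    lo, hi = math.log(eps), 0.0
    h = (hi - lo) / m
    return h * sum((1.0 - math.exp(lo + (i + 0.5) * h)) ** (n - 1) for i in range(m))

n = 5  # hypothetical number of trials, all of them failures
for eps in (1e-2, 1e-4, 1e-8):
    # The mass grows without bound as eps shrinks, so the kernel cannot be normalized
    print(f"eps={eps:g}: mass on [eps, 1] = {truncated_mass(eps, n):.4f}")
```

With at least one success and at least one failure, the same prior yields a proper Beta($k$, $n-k$) posterior, so whether the posterior is proper depends on the data as well as on the prior.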

Best Answer

We generally accept posteriors from improper priors $\pi(\theta)$ if $$ \frac{\pi(X \mid \theta) \pi(\theta)}{\pi(X)} $$ exists and is a valid probability distribution (i.e., it integrates exactly to 1 over the support). Essentially, this boils down to $\pi(X) = \int \pi(X \mid \theta) \pi(\theta) \,d\theta$ being finite (and nonzero). If this is the case, we call this quantity $\pi(\theta \mid X)$ and accept it as the posterior distribution we want. However, it is important to note that this is NOT a posterior distribution, nor is it a conditional probability distribution (these two terms are synonymous in this context): an improper $\pi(\theta)$ does not define a joint probability distribution of $(X, \theta)$ for us to condition on.
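A standard case where the condition holds, sketched here with assumed values: a flat prior $\pi(\theta) = 1$ on the mean of normal data with known $\sigma$. The prior integrates to infinity, but $\pi(X)$ is finite because the likelihood decays like a Gaussian in $\theta$, and the resulting posterior is $N(\bar{x}, \sigma^2/n)$. The Python below uses hypothetical data and $\sigma = 1$, computes $\pi(X)$ numerically on a truncated range, and checks the normalized result against that closed form.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def midpoint(f, lo, hi, m=20_000):
    # Composite midpoint rule for the integral of f over [lo, hi]
    h = (hi - lo) / m
    return h * sum(f(lo + (i + 0.5) * h) for i in range(m))

data = [1.2, 0.7, 2.1, 1.5]   # hypothetical observations
sigma = 1.0                    # assumed known sampling SD
n = len(data)
xbar = sum(data) / n

def likelihood(theta):
    # pi(X | theta) for i.i.d. N(theta, sigma^2) data
    return math.prod(normal_pdf(x, theta, sigma) for x in data)

def flat_prior(theta):
    return 1.0  # improper: its integral over the real line is infinite

# pi(X) is finite even though the prior is not; truncating at +/- 20 posterior SDs
# around xbar drops only a negligible part of the Gaussian-shaped integrand.
half_width = 20 * sigma / math.sqrt(n)
marginal = midpoint(lambda t: likelihood(t) * flat_prior(t), xbar - half_width, xbar + half_width)

for theta in (0.5, xbar, 2.0):
    numeric = likelihood(theta) * flat_prior(theta) / marginal
    closed_form = normal_pdf(theta, xbar, sigma / math.sqrt(n))
    print(f"theta={theta:.3f}: numeric={numeric:.6f}, N(xbar, sigma^2/n)={closed_form:.6f}")
```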

Now, I said we accept 'posterior' distributions from improper priors given the above. They are accepted because the prior $\pi(\theta)$ still gives us relative 'scores' on the parameter space; i.e., the ratio $\frac{\pi(\theta_1)}{\pi(\theta_2)}$ carries meaning in our analysis. In some cases, the information an improper prior conveys this way may not be expressible with any proper prior, which is a potential justification for using them. See Sergio's answer for a more thorough examination of the practical motivation for improper priors.
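One way to see that only such ratios matter: an improper prior is defined only up to a positive constant $c$, and $c$ cancels between the numerator and $\pi(X)$. A minimal sketch, assuming hypothetical Poisson counts and the prior $\pi(\lambda) = c/\lambda$ on the rate, checks that $c = 1$ and $c = 10^6$ give the same posterior, namely the Gamma density with shape $\sum x_i$ and rate $n$.

```python
import math

counts = [2, 4, 3]               # hypothetical Poisson observations
n, s = len(counts), sum(counts)

def likelihood(lam):
    # Joint Poisson probability of the observed counts at rate lam
    return math.prod(math.exp(-lam) * lam**x / math.factorial(x) for x in counts)

def posterior_density(lam, c, hi=40.0, m=20_000):
    # Assumed improper prior pi(lam) = c / lam on (0, inf); c > 0 is arbitrary.
    prior = lambda t: c / t
    h = hi / m
    # pi(X) by the midpoint rule on (0, hi]; the integrand ~ lam**(s-1) * exp(-n*lam)
    # is negligible beyond hi for these counts.
    marginal = h * sum(likelihood((i + 0.5) * h) * prior((i + 0.5) * h) for i in range(m))
    return likelihood(lam) * prior(lam) / marginal

def gamma_pdf(lam, shape, rate):
    return math.exp(shape * math.log(rate) - math.lgamma(shape)
                    + (shape - 1) * math.log(lam) - rate * lam)

for lam in (1.0, 3.0, 5.0):
    p1 = posterior_density(lam, c=1.0)
    p2 = posterior_density(lam, c=1e6)
    print(f"lam={lam}: c=1 -> {p1:.6f}, c=1e6 -> {p2:.6f}, Gamma(s, n) -> {gamma_pdf(lam, s, n):.6f}")
```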

It's worth noting that this quantity $\pi(\theta \mid X)$ also has desirable theoretical properties. As DeGroot & Schervish put it:

Improper priors are not true probability distributions, but if we pretend that they are, we will compute posterior distributions that approximate the posteriors that we would have obtained using proper conjugate priors with extreme values of the prior hyperparameters.
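The quoted approximation can be checked directly in the normal-mean case. A sketch, assuming known $\sigma$, hypothetical data, and a proper conjugate prior $N(\mu_0, \tau^2)$: the posterior mean and variance have closed forms, and as the hyperparameter $\tau$ grows they approach $\bar{x}$ and $\sigma^2/n$, the posterior obtained from the improper flat prior.

```python
def normal_conjugate_posterior(data, sigma, mu0, tau):
    # Posterior mean and variance of a normal mean theta with known sigma
    # under the proper conjugate prior N(mu0, tau^2)
    n = len(data)
    xbar = sum(data) / n
    precision = 1.0 / tau**2 + n / sigma**2          # posterior precision
    mean = (mu0 / tau**2 + n * xbar / sigma**2) / precision
    return mean, 1.0 / precision

data = [1.2, 0.7, 2.1, 1.5]   # hypothetical observations
sigma, mu0 = 1.0, 0.0         # assumed known SD and prior mean
xbar = sum(data) / len(data)
flat_mean, flat_var = xbar, sigma**2 / len(data)   # posterior under the flat prior

for tau in (1.0, 10.0, 100.0, 1e4):
    # Larger tau = more extreme (vaguer) hyperparameter
    mean, var = normal_conjugate_posterior(data, sigma, mu0, tau)
    print(f"tau={tau:g}: mean={mean:.6f} (flat: {flat_mean:.6f}), var={var:.6f} (flat: {flat_var:.6f})")
```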
