I am trying to learn the basics of Bayesian decision theory and I came across the phrase "proper prior", but I don't really understand what it means. Does anyone know?
Solved – Meaning of proper prior
prior
Related Solutions
In my understanding, an explicit prior is one whose probability density function is given explicitly. When that is not possible, the prior can still be defined implicitly, and I would then call it an implicit prior. Consider estimation of a parameter $\theta$. Assume that, for whatever reason, the prior distribution is not specified for $\theta$ directly but for a functional transform of $\theta$, i.e. $f(\theta)$ follows distribution $X$. To get back to the distribution of $\theta$ it would be necessary to find the inverse transform $f^{-1}$. Since this is sometimes computationally infeasible, it is not done. Yet one can still use this implicitly defined distribution for $\theta$ to do, e.g., parameter estimation.
We generally accept posteriors from improper priors $\pi(\theta)$ if $$ \frac{\pi(X \mid \theta) \pi(\theta)}{\pi(X)} $$ exists and is a valid probability distribution (i.e., it integrates exactly to 1 over the support). Essentially this boils down to $\pi(X) = \int \pi(X \mid \theta) \pi(\theta) \,d\theta$ being finite. If this is the case, then we call this quantity $\pi(\theta \mid X)$ and accept it as the posterior distribution that we want. However, it is important to note that, strictly speaking, this is NOT a posterior distribution, nor is it a conditional probability distribution (these two terms are synonymous in the context here): because $\pi(\theta)$ is not a probability distribution, there is no joint probability distribution of $(X, \theta)$ to condition on.
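The finiteness condition on $\pi(X)$ can be checked numerically in a simple case. The following is only a minimal sketch, assuming a normal likelihood with known $\sigma = 1$, a flat improper prior $\pi(\theta) = 1$, and made-up observations; the names `normal_likelihood`, `trapezoid` and `data` are hypothetical and not from any library.

```python
import math

def normal_likelihood(data, theta, sigma=1.0):
    """Likelihood pi(X | theta) of i.i.d. N(theta, sigma^2) observations."""
    return math.prod(
        math.exp(-0.5 * ((x - theta) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        for x in data
    )

def trapezoid(f, lo, hi, n=20_000):
    """Plain trapezoid rule on [lo, hi] with n panels."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

data = [1.2, 0.7, 1.9, 1.4]       # hypothetical observations (assumption)

def flat_prior(theta):
    """Improper prior: constant on the whole real line, so it has no finite integral."""
    return 1.0

def unnorm(theta):
    """Unnormalised 'posterior' pi(X | theta) * pi(theta)."""
    return normal_likelihood(data, theta) * flat_prior(theta)

# The likelihood is negligible far from the data, so a wide finite range
# stands in for the integral over the real line.
lo, hi = -20.0, 20.0
marginal = trapezoid(unnorm, lo, hi)          # pi(X): finite despite the improper prior

def posterior(theta):
    """Normalised pi(theta | X)."""
    return unnorm(theta) / marginal

posterior_mass = trapezoid(posterior, lo, hi)
posterior_mean = trapezoid(lambda t: t * posterior(t), lo, hi)

print("pi(X) =", marginal)
print("posterior mass =", posterior_mass)
print("posterior mean =", posterior_mean, "sample mean =", sum(data) / len(data))
```

In this setting the normalised curve integrates to 1 and is centred on the sample mean, which is what one would expect from a flat prior on a location parameter.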
Now, I said we accept 'posterior' distributions from improper priors given the above. They are accepted because the prior $\pi(\theta)$ still gives us relative 'scores' on the parameter space; i.e., the ratio $\frac{\pi(\theta_1)}{\pi(\theta_2)}$ brings meaning to our analysis. The meaning we get from improper priors may, in some cases, not be available from proper priors. This is a potential justification for using them. See Sergio's answer for a more thorough examination of the practical motivation for improper priors.
It's worth noting that this quantity $\pi(\theta \mid X)$ has desirable theoretical properties as well. As DeGroot & Schervish put it:
Improper priors are not true probability distributions, but if we pretend that they are, we will compute posterior distributions that approximate the posteriors that we would have obtained using proper conjugate priors with extreme values of the prior hyperparameters.
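The limiting behaviour described in the quote can be illustrated with the normal-mean model, where both posteriors have closed forms. This is a sketch under stated assumptions: known $\sigma$, a conjugate prior $\mu \sim N(\mu_0, \tau^2)$, and invented summary statistics. The function names are hypothetical.

```python
def conjugate_posterior(xbar, n, sigma, mu0, tau):
    """Posterior mean and variance of mu for N(mu, sigma^2) data (sigma known)
    under the proper conjugate prior mu ~ N(mu0, tau^2)."""
    precision = 1.0 / tau**2 + n / sigma**2
    mean = (mu0 / tau**2 + n * xbar / sigma**2) / precision
    return mean, 1.0 / precision

def flat_prior_posterior(xbar, n, sigma):
    """Posterior mean and variance of mu under the improper flat prior."""
    return xbar, sigma**2 / n

# Hypothetical summary statistics (assumption): sample mean, sample size, known sd, prior mean.
xbar, n, sigma, mu0 = 1.3, 4, 1.0, 0.0
flat_mean, flat_var = flat_prior_posterior(xbar, n, sigma)

gaps = []
for tau in (1.0, 10.0, 100.0, 1000.0):
    mean, var = conjugate_posterior(xbar, n, sigma, mu0, tau)
    gaps.append((tau, abs(mean - flat_mean), abs(var - flat_var)))
    print(f"tau={tau:>7}: mean={mean:.6f}, var={var:.6f}")

print(f"flat prior : mean={flat_mean:.6f}, var={flat_var:.6f}")
```

As the prior standard deviation $\tau$ grows, the proper-prior posterior moves toward the flat-prior one, which is the sense in which the improper prior acts as an "extreme hyperparameter" limit here.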
Best Answer
A prior distribution that integrates to 1 is a proper prior, by contrast with an improper prior which doesn't.
For example, consider estimating the mean $\mu$ of a normal distribution, with the following two prior distributions:
$\qquad f(\mu) = N(\mu_0,\tau^2)\,,\: -\infty<\mu<\infty$
$\qquad f(\mu) \propto c\,,\qquad\qquad -\infty<\mu<\infty.$
The first is a proper density. The second is not: for any $c > 0$ its integral over the real line diverges, so no choice of $c$ can yield a density that integrates to $1$. Nevertheless, both lead to proper posterior distributions.
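To make the contrast between the two priors concrete, one can compare how much mass each assigns to the interval $[-L, L]$ as $L$ grows. The sketch below assumes $\mu_0 = 0$, $\tau = 1$ and a small constant $c$ purely for illustration; the helper names are hypothetical.

```python
import math

def normal_prior_mass(L, mu0=0.0, tau=1.0):
    """Integral of the N(mu0, tau^2) density over [-L, L], via the error function."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu0) / (tau * math.sqrt(2.0))))
    return cdf(L) - cdf(-L)

def flat_prior_mass(L, c=1.0):
    """Integral of the constant function c over [-L, L], i.e. 2 * c * L."""
    return 2.0 * c * L

c = 1e-3   # illustrative constant (assumption); any c > 0 behaves the same way
for L in (1, 10, 100, 1000, 10_000):
    print(f"L={L:>6}: normal mass={normal_prior_mass(L):.6f}, "
          f"flat mass={flat_prior_mass(L, c):.3f}")
```

The normal prior's mass settles at 1, while the flat prior's mass grows linearly in $L$ and eventually exceeds 1 however small $c$ is chosen.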
See the following posts, which shed additional light on the use of improper priors and on some closely related issues:
Flat, conjugate, and hyper- priors. What are they?
What is an "uninformative prior"? Can we ever have one with truly no information?