I am reading a paper where they talk about keeping a prior explicit as opposed to an implicit prior. To be honest, I have never come across the terms explicit/implicit in the context of priors, and I was wondering if these are technical terms associated with prior distributions. Does an explicit prior simply mean that the prior distribution is specified? How does one get an implicit prior from the problem specification?
Solved – Explicit prior versus implicit prior
bayesian, distributions, prior, probability
Related Solutions
In the context of computational problems, including numerical methods for Bayesian inference, the phrase "too expensive" generally refers to one of two issues:
- a particular problem is too "large" to compute within a particular "budget"
- a general approach scales badly, i.e. it has high computational complexity
In either case, the computational resources comprising the "budget" may consist of things like CPU cycles (time complexity), memory (space complexity), or communication bandwidth (within or between compute nodes). In the second case, "too expensive" effectively means intractable.
In the context of Bayesian computation, the quote is likely referring to issues with marginalization over a large number of variables.
For example, the abstract of this recent paper begins
Integration is affected by the curse of dimensionality and quickly becomes intractable as the dimensionality of the problem grows.
and goes on to say
We propose a randomized algorithm that ... can in turn be used, for instance, for marginal computation or model selection.
(For comparison, this recent book chapter discusses methods considered "not too expensive".)
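To make the curse of dimensionality concrete, here is a minimal sketch (a toy example of my own, not the paper's algorithm). It compares a tensor-product grid, whose cost grows as $n^d$ with the number of variables $d$, against plain Monte Carlo, which keeps a fixed evaluation budget for any $d$:

```python
import numpy as np

# Toy illustration: marginalizing a standard normal density over d
# dimensions with a grid of n points per axis needs n**d evaluations,
# while plain Monte Carlo needs only n_samples evaluations for any d.

def log_density(x):
    # Standard multivariate normal N(0, I_d); x has shape (N, d).
    d = x.shape[-1]
    return -0.5 * (x ** 2).sum(-1) - 0.5 * d * np.log(2 * np.pi)

def grid_marginal(d, n=20, lo=-5.0, hi=5.0):
    """Integrate the density over [lo, hi]^d on a grid: n**d evaluations."""
    axes = [np.linspace(lo, hi, n)] * d
    x = np.stack(np.meshgrid(*axes), axis=-1).reshape(-1, d)
    h = (hi - lo) / (n - 1)
    return np.exp(log_density(x)).sum() * h ** d    # should be close to 1

def mc_marginal(d, n_samples=100_000, lo=-5.0, hi=5.0, seed=0):
    """Same integral by Monte Carlo: n_samples evaluations for any d."""
    x = np.random.default_rng(seed).uniform(lo, hi, size=(n_samples, d))
    return np.exp(log_density(x)).mean() * (hi - lo) ** d

for d in (1, 2, 3, 4):
    print(f"d={d}: grid uses {20 ** d:>7} evaluations ->", grid_marginal(d))
print("d=6 grid would need", 20 ** 6, "evaluations; Monte Carlo uses 100,000:")
print(mc_marginal(6))
```

Already at $d = 6$ the grid needs 64 million evaluations, and the count multiplies by 20 with every additional dimension, which is exactly the blow-up that randomized methods like the one quoted above try to sidestep.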
It is often easier to reason about one random quantity at a time than to work with all random quantities simultaneously. In Bayesian statistics, where everything is a random quantity, this is especially true. You often have to fix one random quantity to work with another. In more technical terms, it is often easier to work with conditional distributions than joint distributions.
You can think of conditional probability as a tool for setting a random quantity to a particular, fixed value. So rather than thinking about $f(r | t, \theta)$ as prior knowledge of $r$ depending on $t$ and $\theta$, imagine this conditional distribution as a way to reason about $r$ without the interference of random fluctuations in $t$ and $\theta$. You set $t$ and $\theta$ to specific values $t^*$ and $\theta^*$ while allowing $r$ to vary: $f(r | t = t^*, \theta = \theta^*)$.
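A minimal sketch of this "pinning" idea, using a toy bivariate Gaussian of my own (not a model from the post): fixing $t = t^*$ collapses the two-dimensional joint into a one-dimensional conditional we can reason about directly.

```python
import numpy as np

# Toy joint prior: (t, r) standard bivariate normal with correlation rho.
# Conditioning pins t to a fixed value t_star, leaving a one-dimensional
# distribution f(r | t = t_star) = Normal(rho * t_star, 1 - rho**2).

rho, t_star = 0.8, 1.5
cond_mean, cond_var = rho * t_star, 1 - rho ** 2

# Simulation check: sample the joint, then keep draws where t is near t_star.
rng = np.random.default_rng(0)
t = rng.standard_normal(1_000_000)
r = rho * t + np.sqrt(1 - rho ** 2) * rng.standard_normal(1_000_000)
near = np.abs(t - t_star) < 0.05

print("analytic :", cond_mean, cond_var)
print("simulated:", r[near].mean(), r[near].var())
```

The conditional draws match the closed-form conditional: once $t$ is pinned, the randomness in $r$ is all that remains.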
As @Peter Pang notes, you can factor the joint prior distribution differently than in your original post: $$ \begin{aligned} p(t, r, \theta) &= p(r \vert \theta, t)p(\theta \vert t)p(t)\\ &= p(\theta \vert r, t)p(r \vert t)p(t)\\ &= p(\theta \vert r, t)p(t \vert r)p(r)\\ &=\vdots \end{aligned} $$ Depending on the specific problem you're working on, it may be simpler (conceptually, mathematically, or numerically) to work with the distribution of, say, $f(t | r, \theta)$ instead of $f(r | t, \theta)$. Since the joint prior can be factored differently, it is always your option to choose what quantities, if any, are fixed in place (i.e. conditioned on) at each step of the factorization.
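As a quick numerical check (on a hypothetical discrete toy, not anything from the post), every chain-rule factorization multiplies back to the same joint:

```python
import numpy as np

# Random 3-way probability table p(t, r, theta); axes: t, r, theta.
rng = np.random.default_rng(1)
p = rng.random((2, 3, 4))
p /= p.sum()                                      # normalize into a joint pmf

p_t = p.sum((1, 2))                               # p(t)
p_r = p.sum((0, 2))                               # p(r)
p_r_given_t = p.sum(2) / p_t[:, None]             # p(r | t), shape (t, r)
p_t_given_r = p.sum(2) / p_r[None, :]             # p(t | r), shape (t, r)
p_th_given_rt = p / p.sum(2, keepdims=True)       # p(theta | r, t)

# Factorization 2: p(theta | r, t) p(r | t) p(t)
f2 = p_th_given_rt * p_r_given_t[:, :, None] * p_t[:, None, None]
# Factorization 3: p(theta | r, t) p(t | r) p(r)
f3 = p_th_given_rt * p_t_given_r[:, :, None] * p_r[None, :, None]

print(np.allclose(f2, p), np.allclose(f3, p))     # True True
```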
Best Answer
In my understanding, the probability density function is explicitly given for an explicit prior. If this is not possible, the prior can still be implicitly defined, which I would then call an implicit prior. Consider estimation of a parameter $\theta$. Assume that, because of some circumstances, the prior distribution is not specified for $\theta$ directly but for a functional transform of $\theta$, i.e. $f(\theta)$ follows distribution $X$. To get back the distribution of $\theta$, it would be necessary to find the inverse transform $f^{-1}$. Since this is sometimes computationally infeasible, it is not done. Yet one can still use this implicitly defined distribution for $\theta$ to do, e.g., parameter estimation.
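A minimal sketch of that idea (my own construction, assuming a monotone transform $f$): the prior is specified on $\phi = f(\theta) \sim \mathrm{Normal}(0, 1)$, and $f(x) = x + x^3$ has no convenient closed-form inverse. The implied prior density of $\theta$ can still be evaluated via the change of variables $p_\theta(\theta) = p_\phi(f(\theta))\,|f'(\theta)|$, without ever computing $f^{-1}$, so it is usable inside e.g. MCMC.

```python
import numpy as np
from scipy import stats

def f(theta):          # monotone transform with no closed-form inverse
    return theta + theta ** 3

def f_prime(theta):    # derivative, strictly positive
    return 1 + 3 * theta ** 2

def implicit_log_prior(theta):
    # log p_theta(theta) = log p_phi(f(theta)) + log |f'(theta)|
    return stats.norm.logpdf(f(theta)) + np.log(f_prime(theta))

# Tiny random-walk Metropolis over theta using only the implicit prior
# (no likelihood term here, so the chain samples the implied prior itself).
rng = np.random.default_rng(0)
theta, draws = 0.0, []
for _ in range(20_000):
    prop = theta + 0.5 * rng.standard_normal()
    if np.log(rng.random()) < implicit_log_prior(prop) - implicit_log_prior(theta):
        theta = prop
    draws.append(theta)

# Sanity check: f(theta) under the chain should look standard normal.
phi = f(np.array(draws[5_000:]))
print(phi.mean(), phi.std())   # approximately 0 and 1
```

Note the asymmetry: evaluating the implicit prior density only needs the forward transform $f$ and its derivative, whereas sampling $\theta$ directly would require the infeasible inverse, which is exactly why one works with the prior implicitly.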