Confusion about joint probability distribution in Bayesian Inference setup

Tags: bayesian, joint-distribution, probability

I am confused by a simple fact and I can't wrap my head around it!

It's known that:
$P(y,\theta) = P(y|\theta)*P(\theta)$
and
$P(y,\theta) = P(\theta|y)*P(y)$

This gives us $P(\theta|y)*P(y) = P(y|\theta)*P(\theta)$, with both sides being valid probability density functions.

But in the Bayesian inference literature we learn that $P(y|\theta)*P(\theta)$ is not a probability density function, because the likelihood $P(y|\theta)$, viewed as a function of $\theta$, does not integrate to 1; we need $P(y)$ as a denominator to normalize it and obtain a valid posterior probability density function.

So, my confusion is: is the joint density function in $P(y,\theta) = P(y|\theta)*P(\theta)$ NOT a joint PROBABILITY density function, since the right-hand side does not integrate to 1?
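To make the two factorizations concrete, here is a minimal discrete sketch; the prior and likelihood numbers below are made up purely for illustration:

```python
# Minimal discrete sketch of the two factorizations (made-up numbers).
import numpy as np

p_theta = np.array([0.4, 0.6])              # prior p(theta), theta in {0, 1}
p_y_given_theta = np.array([[0.9, 0.1],     # p(y | theta = 0), y in {0, 1}
                            [0.2, 0.8]])    # p(y | theta = 1)

# Joint via the first factorization: p(y, theta) = p(y|theta) p(theta).
joint = p_y_given_theta * p_theta[:, None]
print(joint.sum())                          # 1.0 -- the joint sums to 1

# Marginal and the second factorization: p(theta|y) p(y) gives the same joint.
p_y = joint.sum(axis=0)                     # p(y)
p_theta_given_y = joint / p_y               # p(theta|y), each column sums to 1
print(np.allclose(p_theta_given_y * p_y, joint))   # True

# But for a FIXED y, summing p(y|theta) p(theta) over theta gives p(y), not 1:
print(joint[:, 1].sum(), p_y[1])            # 0.52 and 0.52 -- not 1
```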

Best Answer

Both \begin{equation} p(\theta|y)\,p(y) \qquad\text{and}\qquad p(y|\theta)\,p(\theta) \end{equation} are valid joint probability distributions for $(y,\theta)$. Both distributions integrate to 1 with respect to $y$ and $\theta$: \begin{equation} \iint p(\theta|y)\,p(y)\,dy\,d\theta = 1 \qquad\text{and}\qquad \iint p(y|\theta)\,p(\theta) \,dy\,d\theta = 1 . \end{equation}

But $p(y|\theta)\,p(\theta)$ is not (in general) a valid conditional distribution for $\theta$ because it does not (necessarily) integrate to 1 with respect to $\theta$: \begin{equation} \int p(y|\theta)\,p(\theta)\,d\theta = p(y) , \end{equation} where $p(y)$ does not necessarily equal 1.
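Both claims can be checked numerically. Here is a sketch with an assumed toy model, $\theta \sim N(0,1)$ and $y\mid\theta \sim N(\theta,1)$; any proper prior and likelihood would behave the same way:

```python
# Toy model (an assumption for illustration): theta ~ N(0,1), y|theta ~ N(theta,1).
from scipy import stats
from scipy.integrate import quad, dblquad

prior = lambda t: stats.norm.pdf(t, loc=0, scale=1)      # p(theta)
lik = lambda y, t: stats.norm.pdf(y, loc=t, scale=1)     # p(y|theta)

# The joint p(y|theta) p(theta) integrates to 1 over BOTH y and theta.
total, _ = dblquad(lambda t, y: lik(y, t) * prior(t), -8, 8, -8, 8)
print(total)                                             # ~ 1.0

# But for fixed y, integrating over theta alone gives p(y), which is not 1.
y0 = 1.5
p_y0, _ = quad(lambda t: lik(y0, t) * prior(t), -8, 8)
print(p_y0)                                              # ~ 0.16, i.e. p(y0), not 1
```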

Addendum

I'm addressing a question in a comment by the OP.

The marginal distribution for $y$ is computed from the joint distribution by integrating out $\theta$: \begin{equation} p(y) = \int p(y|\theta)\,p(\theta)\,d\theta . \end{equation} For fixed $y$, $p(y)$ is a number, the density at $y$. But the density $p(y)$ is allowed to vary as we vary $y$. In \begin{equation} \int p(y)\,dy = 1 \end{equation} $p(y)$ varies as $y$ varies so as to guarantee the result.
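Continuing the same assumed toy model ($\theta \sim N(0,1)$, $y\mid\theta \sim N(\theta,1)$), one can check that $p(y)$ gives a different number for each $y$, yet integrates to 1 over $y$:

```python
# Same assumed toy model: theta ~ N(0,1), y|theta ~ N(theta,1).
from scipy import stats
from scipy.integrate import quad

prior = lambda t: stats.norm.pdf(t, loc=0, scale=1)
lik = lambda y, t: stats.norm.pdf(y, loc=t, scale=1)

# p(y): integrate theta out of the joint; one number per value of y.
def marginal(y):
    return quad(lambda t: lik(y, t) * prior(t), -8, 8)[0]

print(marginal(0.0), marginal(1.5), marginal(3.0))  # density varies as y varies
total, _ = quad(marginal, -8, 8)
print(total)                                        # ~ 1.0: p(y) integrates to 1
```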

In the conditional distribution \begin{equation} p(\theta|y) = \frac{p(y|\theta)\,p(\theta)}{p(y)} , \end{equation} we are holding $y$ fixed and so $p(y)$ is a number. If we chose to fix $y$ at a different value, then we get a different distribution for $\theta$ and a different value for $p(y)$.
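In the same assumed toy model, dividing by the number $p(y)$ at a fixed $y$ yields a distribution that integrates to 1 over $\theta$, and fixing $y$ at a different value yields a different posterior:

```python
# Same assumed toy model: theta ~ N(0,1), y|theta ~ N(theta,1);
# analytically the posterior is N(y/2, 1/2).
from scipy import stats
from scipy.integrate import quad

prior = lambda t: stats.norm.pdf(t, loc=0, scale=1)
lik = lambda y, t: stats.norm.pdf(y, loc=t, scale=1)

def posterior(t, y0):
    # y0 is held fixed, so p(y0) is just a normalizing constant (a number).
    p_y0 = quad(lambda s: lik(y0, s) * prior(s), -8, 8)[0]
    return lik(y0, t) * prior(t) / p_y0

# Dividing by p(y0) makes the posterior integrate to 1 over theta...
print(quad(lambda t: posterior(t, y0=1.5), -8, 8)[0])    # ~ 1.0
# ...and fixing y at a different value gives a different distribution:
print(posterior(0.75, y0=1.5), posterior(0.75, y0=3.0))  # different densities
```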

(This can be a bit tricky at first and it's good to nail it down before proceeding.)
