Radon–Nikodym Derivative and Bayes’ Theorem

measure-theory, probability, probability-theory

Theorem 1.3.1. (Bayes' theorem):
Suppose that $X$ has a parametric family $\mathcal{P}_0$ of distributions with parameter space $\Omega$.
Suppose that $P_\theta \ll \nu$ for all $\theta \in \Omega$, and let $f_{X\mid\Theta}(x\mid\theta)$ be the conditional density (with respect to $\nu$) of $X$ given $\Theta = \theta$.
Let $\mu_\Theta$ be the prior distribution of $\Theta$.
Let $\mu_{\Theta\mid X}(\cdot \mid x)$ denote the conditional distribution of $\Theta$ given $X = x$.
Then $\mu_{\Theta\mid X} \ll \mu_\Theta$, a.s. with respect to the marginal of $X$, and the Radon–Nikodym derivative is
$$
\frac{\mathrm d\mu_{\Theta\mid X}}{\mathrm d\mu_\Theta}(\theta \mid x)
= \frac{f_{X\mid \Theta}(x\mid \theta)}{\int_\Omega f_{X\mid\Theta}(x\mid t) \, \mathrm d\mu_\Theta(t)}
$$

for those $x$ such that the denominator is neither $0$ nor infinite.
The prior predictive probability of the set of $x$ values such that the denominator is $0$ or infinite is $0$, hence the posterior can be defined arbitrarily for such $x$ values.
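A small numerical sketch may make the formula concrete. It assumes a hypothetical discrete prior on three values of $\theta$ and a binomial likelihood (neither comes from the theorem's source; they are chosen only for illustration). A discrete prior has no density with respect to Lebesgue measure, yet the Radon–Nikodym derivative of the posterior with respect to the prior is still well defined and equals likelihood divided by the prior predictive.

```python
from math import comb

def binom_pmf(x, n, theta):
    """Likelihood f_{X|Theta}(x | theta) with respect to counting measure."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def rn_derivative(x, n, prior):
    """Return {theta: (d mu_{Theta|X} / d mu_Theta)(theta | x)} for a discrete prior.

    `prior` maps each theta to its prior probability mu_Theta({theta}).
    """
    # Denominator of the theorem: integral of the likelihood against the prior.
    marginal = sum(binom_pmf(x, n, t) * w for t, w in prior.items())
    if marginal == 0:
        raise ValueError("denominator is 0; the posterior is arbitrary for this x")
    return {t: binom_pmf(x, n, t) / marginal for t in prior}

# Hypothetical prior and data, chosen only for illustration.
prior = {0.2: 0.5, 0.5: 0.3, 0.8: 0.2}
n_trials, x_obs = 10, 7

ratio = rn_derivative(x_obs, n_trials, prior)
# Posterior mass = derivative * prior mass, i.e. d mu_post = ratio d mu_prior.
posterior = {t: ratio[t] * prior[t] for t in prior}

print(ratio)
print(posterior, sum(posterior.values()))
```

Note that the prior weights appear only in the denominator of `ratio`; they re-enter when the derivative is integrated against $\mu_\Theta$ to obtain posterior probabilities.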

I tried to derive the right-hand side of the Radon–Nikodym derivative above, but I got a different result. Here is my attempt:

\begin{equation} \label{eq1}
\begin{split}
\frac{\mathrm d\mu_{\Theta\mid X}}{\mathrm d\mu_\Theta}(\theta \mid x) &= f_{\Theta\mid X}(\theta\mid x) \qquad [1]\\
&=\frac{f_{X\mid \Theta}(x\mid \theta) \cdot f_{\Theta}(\theta)}{f_X(x)}\\
&=\frac{f_{X\mid \Theta}(x\mid \theta) \cdot f_{\Theta}(\theta)}{\int_\Omega f_{X\mid\Theta}(x\mid t) \, f_{\Theta}(t) \, \mathrm dt}\\
&=\frac{f_{X\mid \Theta}(x\mid \theta) \cdot f_{\Theta}(\theta)}{\int_\Omega f_{X\mid\Theta}(x\mid t) \, \mathrm d\mu_\Theta(t)}
\end{split}
\end{equation}

But now, where does $f_{\Theta}(\theta)$ go?

For $[1]$, see slide $10$ of the following document: http://mlg.eng.cam.ac.uk/mlss09/mlss_slides/Orbanz_1.pdf

Thanks in advance.

Best Answer

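You wrote:

$$
\frac{\mathrm d\mu_{\Theta\mid X}}{\mathrm d\mu_\Theta}(\theta \mid x)
= \frac{f_{X\mid \Theta}(x\mid \theta)}{\int_\Omega f_{X\mid\Theta}(x\mid t) \, \mathrm d\mu_\Theta(t)}
$$

Let's rearrange it a little bit:

$$
\mathrm d\mu_{\Theta\mid X} (\theta \mid x)
= \frac{f_{X\mid \Theta}(x\mid \theta) \, \mathrm d\mu_\Theta(\theta)}{\int_\Omega f_{X\mid\Theta}(x\mid t) \, \mathrm d\mu_\Theta(t)}
$$

and then divide both sides by $\mathrm d\nu$, where $\nu$ here denotes a $\sigma$-finite measure on $\Omega$ that dominates $\mu_\Theta$ (for example Lebesgue measure; this is not the $\nu$ of the theorem, which is a measure on the sample space):

$$
\frac{\mathrm d\mu_{\Theta\mid X}}{\mathrm d\nu} (\theta \mid x)
= \frac{f_{X\mid \Theta}(x\mid \theta) \, (\mathrm d\mu_\Theta/\mathrm d\nu)(\theta)}{\int_\Omega f_{X\mid\Theta}(x\mid t) \, \mathrm d\mu_\Theta(t)}
$$

The factor $(\mathrm d\mu_\Theta/\mathrm d\nu)(\theta)$ is your prior density $f_\Theta(\theta)$, and the left-hand side is the posterior density $f_{\Theta\mid X}(\theta\mid x)$. So your step $[1]$ computes the derivative of the posterior with respect to $\nu$, not with respect to $\mu_\Theta$; dividing by $f_\Theta(\theta)$ recovers the theorem's formula. Writing the likelihood as a Radon–Nikodym derivative as well, with respect to a dominating measure $\lambda$ on the sample space:

$$
\frac{\mathrm d\mu_{\Theta\,\mid\, X=x}}{\mathrm d\nu}(\theta)
= \frac{ \displaystyle \frac{\mathrm d\mu_{X\,\mid\,\Theta=\theta}}{\mathrm d\lambda}(x) \cdot \frac{\mathrm d\mu_\Theta}{\mathrm d\nu}(\theta) }{ \displaystyle \int_\Omega \frac{\mathrm d\mu_{X\,\mid\,\Theta=t}}{\mathrm d\lambda}(x) \, \mathrm d\mu_\Theta(t) }
$$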
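The cancellation of the prior density can also be checked numerically. The sketch below assumes a conjugate pair that is not part of the question: a hypothetical $\mathrm{Beta}(a, b)$ prior with Lebesgue density $f_\Theta$ on $(0,1)$ and a binomial likelihood, for which the posterior is known to be $\mathrm{Beta}(a+x,\, b+n-x)$. It compares the ratio of posterior density to prior density with likelihood divided by the prior predictive.

```python
from math import comb, exp, lgamma, log

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_pdf(theta, a, b):
    """Density of Beta(a, b) with respect to Lebesgue measure on (0, 1)."""
    return exp((a - 1) * log(theta) + (b - 1) * log(1 - theta) - log_beta_fn(a, b))

def binom_pmf(x, n, theta):
    """Likelihood f_{X|Theta}(x | theta) with respect to counting measure."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def prior_predictive(x, n, a, b):
    """Closed form of the integral of binom_pmf(x, n, t) against the Beta(a, b) prior."""
    return comb(n, x) * exp(log_beta_fn(a + x, b + n - x) - log_beta_fn(a, b))

# Hypothetical prior parameters and data, chosen only for illustration.
a, b, n_trials, x_obs = 2.0, 3.0, 10, 7

checks = []
for theta in (0.1, 0.5, 0.9):
    # Posterior density over prior density: f_Theta cancels in this ratio.
    lhs = beta_pdf(theta, a + x_obs, b + n_trials - x_obs) / beta_pdf(theta, a, b)
    # Right-hand side of the theorem: likelihood over prior predictive.
    rhs = binom_pmf(x_obs, n_trials, theta) / prior_predictive(x_obs, n_trials, a, b)
    checks.append((theta, lhs, rhs))
    print(theta, lhs, rhs)
```

The two printed columns should agree up to floating-point error, which is the statement that the derivative with respect to the prior is the posterior density divided by the prior density.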
