When I have a single random variable given by its probability density function $f_X$, I can compute the quantile function $Q_X$ (the inverse CDF) and sample it as $Q_X(U)$, where $U$ is uniform on $[0, 1]$ (inverse transform sampling).
If I have a joint density function $f_{X, Y}$, I could compute the marginal density of $X$, sample $X$ as before, compute the conditional density of $Y$ given $X = x$, and then sample $Y$ in the same way.
But this looks like conditioning on an event of probability zero, which is where the Borel-Kolmogorov paradox arises (nice talk: Snow Xueyin Zhang, "The Borel-Kolmogorov Paradox as a Paradox for Physical Chance").
- Under what assumptions am I allowed to sample using the marginal distribution? Are there any considerations when sampling on a computer?
- What is the meaning of the conditional probability density function $f_{X \mid Y}$? Wikipedia states that it need not be invariant under coordinate transformations. Is it really conditioned on an event of probability zero?
Best Answer
I do not see the Borel-Kolmogorov paradox applying here, provided you can find the inverse of the marginal CDF and the inverse of the conditional CDF.
Let's take an example where we know the Borel-Kolmogorov paradox occurs, built from two constructions:
First, we sample $Y$ and $Z$ from the joint density $f_{Y,Z}(y,z) = \dfrac{ye^{-y}}{(z+1)^2}\mathbf{1}_{\{y\geq 0, z \geq 0\}}$. We have $f_{Z}(z) = \dfrac{1}{(z+1)^2}\mathbf{1}_{\{z \geq 0\}}$ and $f_{Y \mid Z}(y \mid z) = ye^{-y}\mathbf{1}_{\{y\geq 0\}}$. The inverse CDF of this gamma distribution (shape $2$, rate $1$) has no elementary closed form, but a computer can evaluate it numerically. We are interested in $X_1 =\frac{YZ}{Z+1}$ and $X_2 =\frac{Y}{Z+1}$.
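As a rough illustration of this first construction, here is a minimal Python sketch using only the standard library. The helper names (`gamma2_cdf`, `gamma2_quantile`, `z_quantile`, `sample_x1_x2`) and the choice of bisection to invert the gamma CDF are illustrative assumptions; any numerical root finder or a library quantile function would do as well.

```python
import math
import random

def gamma2_cdf(y):
    """CDF of the density y * exp(-y) on y >= 0 (gamma, shape 2, rate 1)."""
    return 1.0 - (1.0 + y) * math.exp(-y)

def gamma2_quantile(p, tol=1e-12):
    """Numerically invert gamma2_cdf by bisection, for 0 <= p < 1."""
    lo, hi = 0.0, 1.0
    while gamma2_cdf(hi) < p:  # widen the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma2_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def z_quantile(p):
    """Inverse of F_Z(z) = 1 - 1/(z + 1), for 0 <= p < 1."""
    return p / (1.0 - p)

def sample_x1_x2(rng):
    """Draw Z from its marginal, then Y from f_{Y|Z}, and return (X_1, X_2)."""
    z = z_quantile(rng.random())
    # f_{Y|Z}(y|z) does not depend on z here, so the same quantile works for every z.
    y = gamma2_quantile(rng.random())
    return y * z / (z + 1.0), y / (z + 1.0)

rng = random.Random(0)  # the seed is arbitrary
pairs = [sample_x1_x2(rng) for _ in range(20000)]
mean_x1 = sum(x1 for x1, _ in pairs) / len(pairs)
mean_x2 = sum(x2 for _, x2 in pairs) / len(pairs)
print(mean_x1, mean_x2)  # both should be close to 1 if X_1 and X_2 are Exp(1)
```

Note that `random.Random.random()` returns values in $[0, 1)$, so `z_quantile` never divides by zero.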
Second, we sample $U$ and $V$ from the joint density $f_{U,V}(u,v) = \frac{1}{2}e^{-u}\mathbf{1}_{\{u\geq |v|\}}$. We have $f_{V}(v) = \frac{1}{2}e^{-|v|}$ and $f_{U \mid V}(u \mid v) = e^{|v|-u}\mathbf{1}_{\{u\geq |v|\}}$. We are interested in $X_3 =\frac12(U+V)$ and $X_4 =\frac12(U-V)$.
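The second construction can be sketched the same way. Here both quantile functions have closed forms, which I derived from the densities above: $V$ is a Laplace variable and $U \mid V = v$ is an exponential shifted to start at $|v|$. The function names are again illustrative.

```python
import math
import random

def v_quantile(p):
    """Inverse CDF of the Laplace density exp(-|v|) / 2, for 0 < p < 1."""
    if p < 0.5:
        return math.log(2.0 * p)
    return -math.log(2.0 * (1.0 - p))

def u_given_v_quantile(p, v):
    """Inverse of F_{U|V}(u|v) = 1 - exp(|v| - u) on u >= |v|, for 0 <= p < 1."""
    return abs(v) - math.log(1.0 - p)

def sample_x3_x4(rng):
    """Draw V from its marginal, then U from f_{U|V}, and return (X_3, X_4)."""
    p = rng.random()
    while p == 0.0:  # random() can return exactly 0.0, where v_quantile is undefined
        p = rng.random()
    v = v_quantile(p)
    u = u_given_v_quantile(rng.random(), v)
    return 0.5 * (u + v), 0.5 * (u - v)

rng = random.Random(0)  # the seed is arbitrary
pairs = [sample_x3_x4(rng) for _ in range(20000)]
mean_x3 = sum(x3 for x3, _ in pairs) / len(pairs)
mean_x4 = sum(x4 for _, x4 in pairs) / len(pairs)
print(mean_x3, mean_x4)  # both should be close to 1 if X_3 and X_4 are Exp(1)
```

The only computer-specific care needed here is at the endpoints of the uniform draw, where a logarithm of zero would otherwise appear.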
It turns out that, following your algorithm, $(X_1,X_2)$ and $(X_3,X_4)$ have the same joint distribution: in each pair the two components are independent exponential random variables with rate $1$.
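This can be checked with a change of variables. The following is my own sketch of the computation, using $y = x_1 + x_2$, $z = x_1/x_2$ for the first pair and $u = x_3 + x_4$, $v = x_3 - x_4$ for the second:

$$
\begin{aligned}
\left|\frac{\partial(y,z)}{\partial(x_1,x_2)}\right| &= \frac{x_1+x_2}{x_2^2},
&
f_{X_1,X_2}(x_1,x_2) &= \frac{(x_1+x_2)\,e^{-(x_1+x_2)}}{\left(\frac{x_1}{x_2}+1\right)^2}\cdot\frac{x_1+x_2}{x_2^2} = e^{-(x_1+x_2)}, \quad x_1, x_2 > 0, \\
\left|\frac{\partial(u,v)}{\partial(x_3,x_4)}\right| &= 2,
&
f_{X_3,X_4}(x_3,x_4) &= \tfrac{1}{2}\,e^{-(x_3+x_4)}\cdot 2 = e^{-(x_3+x_4)}, \quad x_3, x_4 \geq 0.
\end{aligned}
$$

In the second line the constraint $u \geq |v|$ is equivalent to $x_3 \geq 0$ and $x_4 \geq 0$, so both joint densities are the product of two rate-$1$ exponential densities.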
The Borel-Kolmogorov paradox is that $f_{X_1 \mid Z=1}(x) = 4x \exp(-2x)$, a gamma distribution, while $f_{X_3 \mid V=0}(x) = 2\exp(-2x)$, an exponential distribution with half the mean of the gamma distribution. The two conditional densities differ even though $Z=1 \iff X_1=X_2$ and $V=0 \iff X_3=X_4$, so both condition on the same kind of event for pairs with the same joint distribution. This does not affect your algorithm.
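The discrepancy can also be seen numerically. The sketch below is a hedged illustration, not a proof. It assumes, as stated above, that both pairs are distributed like two independent rate-$1$ exponentials $(A, B)$, and it replaces the probability-zero event by two thin windows: $|A/B - 1| < \varepsilon$ (playing the role of $Z \approx 1$) and $|A - B| < \varepsilon$ (playing the role of $V \approx 0$). The names `exp1` and `conditional_means` and the value $\varepsilon = 0.05$ are my own choices.

```python
import math
import random

def exp1(rng):
    """Exp(1) draw by inverse transform: Q(p) = -log(1 - p)."""
    return -math.log(1.0 - rng.random())

def conditional_means(n, eps, rng):
    """Mean of the first coordinate inside a thin ratio window and a thin difference window."""
    near_ratio, near_diff = [], []
    for _ in range(n):
        a, b = exp1(rng), exp1(rng)
        if b > 0.0 and abs(a / b - 1.0) < eps:  # ratio close to 1, like Z close to 1
            near_ratio.append(a)
        if abs(a - b) < eps:                    # difference close to 0, like V close to 0
            near_diff.append(a)
    return sum(near_ratio) / len(near_ratio), sum(near_diff) / len(near_diff)

mean_ratio, mean_diff = conditional_means(200000, 0.05, random.Random(0))
# Expect roughly 1 for the ratio window (gamma) and roughly 0.5 for the difference window (exponential).
print(mean_ratio, mean_diff)
```

Both windows shrink to the same line $A = B$ as $\varepsilon \to 0$, but they shrink in different ways, which is why the limiting conditional distributions differ. Sampling from the joint distribution never requires choosing between them.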