Solved – Ratio of probabilities vs ratio of PDFs

bayesian, kernel-smoothing, maximum-likelihood, probability

I'm using Bayes' theorem to solve a clustering problem. After some calculations, I need the ratio of two probabilities:

$$P(A)/P(B)$$

in order to obtain $P(H|D)$. Each probability is obtained by integrating a different bivariate (2D) KDE, as explained in this answer:

$$P(A) = \iint_{x, y : \hat{f}(x, y) < \hat{f}(r_a, s_a)} \hat{f}(x,y)\,dx\,dy$$
$$P(B) = \iint_{x, y : \hat{g}(x, y) < \hat{g}(r_b, s_b)} \hat{g}(x,y)\,dx\,dy$$

where $\hat{f}(x, y)$ and $\hat{g}(x, y)$ are the KDEs, and each integral runs over all points whose density lies below the threshold $\hat{f}(r_a, s_a)$ or $\hat{g}(r_b, s_b)$, respectively. Both KDEs use a Gaussian kernel. A representative image of a KDE similar to the ones I'm working with can be seen here: Integrating kernel density estimator in 2D.

I compute the KDEs with the Python function `stats.gaussian_kde` (from SciPy), so I assume they have the following general form:

$$KDE(x,y) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2\pi h^2} e^{-\frac{(x-x_i)^2 + (y-y_i)^2}{2h^2}}$$

where $n$ is the number of data points $(x_i, y_i)$ and $h$ is the bandwidth.
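A minimal NumPy sketch of an isotropic Gaussian KDE with a single bandwidth $h$ and the per-kernel normalization $\frac{1}{2\pi h^2}$, which is what makes the estimate integrate to 1, is given below together with a grid check of that normalization. The function name `kde_isotropic` is hypothetical. Note that `scipy.stats.gaussian_kde` itself uses a full kernel covariance matrix (its `covariance` attribute, equal to the data covariance scaled by `factor**2`), so a single scalar $h$ reproduces it only when that matrix is a multiple of the identity.

```python
import numpy as np


def kde_isotropic(points, data, h):
    """Hypothetical isotropic 2D Gaussian KDE:
    KDE(x, y) = (1/n) * sum_i exp(-((x - x_i)^2 + (y - y_i)^2) / (2 h^2)) / (2 pi h^2)

    points: array-like of shape (m, 2), where the KDE is evaluated
    data:   array-like of shape (n, 2), the sample points (x_i, y_i)
    h:      scalar bandwidth
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    data = np.atleast_2d(np.asarray(data, dtype=float))
    diff = points[:, None, :] - data[None, :, :]  # (m, n, 2) displacements
    sq_dist = np.sum(diff**2, axis=-1)            # (m, n) squared distances
    kernels = np.exp(-sq_dist / (2.0 * h**2)) / (2.0 * np.pi * h**2)
    return kernels.mean(axis=1)                    # average over the n kernels


# Riemann-sum check that the estimate integrates to (very nearly) 1.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 2))
h = 0.4
grid = np.linspace(-7.0, 7.0, 201)
xx, yy = np.meshgrid(grid, grid)
cell_area = (grid[1] - grid[0]) ** 2
values = kde_isotropic(np.column_stack([xx.ravel(), yy.ravel()]), data, h)
total = values.sum() * cell_area
print(total)  # close to 1
```

The grid is wide enough that the kernels' tails beyond it are negligible for this simulated data; for other data the limits would need adjusting.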

I compute the integrals above with a Monte Carlo procedure, which is computationally expensive. I've read somewhere (I forget where, sorry) that in cases like this the ratio of probabilities can be replaced by the ratio of the PDFs (the KDEs) evaluated at the threshold points, with equally valid results. This interests me because computing the ratio of KDE values is orders of magnitude faster than computing the ratio of the integrals by MC.
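One straightforward Monte Carlo scheme for integrals like $P(A)$ is sketched below, as an assumption about the procedure rather than necessarily the one in the linked answer. It draws samples from the KDE itself (pick a data point uniformly at random, then add $N(0, h^2 I)$ noise) and estimates $P$ as the fraction of samples whose density does not exceed the density at the threshold point; this works because $\int_{\hat f \le t} \hat f = \Pr(\hat f(X) \le t)$ for $X \sim \hat f$. It assumes the isotropic single-bandwidth form, and both function names are hypothetical. The cost is dominated by evaluating all $n$ kernels at each of the $N$ samples, which is why it is so much slower than evaluating each KDE once.

```python
import numpy as np


def kde_isotropic(points, data, h):
    """(1/n) sum_i exp(-|p - x_i|^2 / (2 h^2)) / (2 pi h^2) for each row p of `points`."""
    sq_dist = np.sum((points[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * h**2)).mean(axis=1) / (2.0 * np.pi * h**2)


def mc_tail_probability(threshold_point, data, h, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P = integral of the KDE over {KDE(x, y) <= KDE(threshold_point)}.

    Samples are drawn from the KDE itself, so P is the probability that a
    sample's density does not exceed the threshold density.
    Memory grows as n_samples * len(data); chunk the evaluation for large data sets.
    """
    rng = np.random.default_rng(seed)
    data = np.atleast_2d(np.asarray(data, dtype=float))
    # Sample from the mixture: choose a kernel uniformly, then add N(0, h^2 I) noise.
    idx = rng.integers(len(data), size=n_samples)
    samples = data[idx] + h * rng.standard_normal((n_samples, 2))
    threshold = kde_isotropic(np.atleast_2d(threshold_point).astype(float), data, h)[0]
    return np.mean(kde_isotropic(samples, data, h) <= threshold)


# Single kernel at the origin with h = 1: the exact value is exp(-(1^2 + 0^2) / 2).
p = mc_tail_probability([1.0, 0.0], [[0.0, 0.0]], h=1.0)
print(p, np.exp(-0.5))
```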

So the question reduces to the validity of this expression:

$$\frac{P(A)}{P(B)} = \frac{\hat{f}(r_a, s_a)}{\hat{g}(r_b, s_b)}$$

Under what circumstances, if any, does this relation hold?



Addendum: here is essentially the same question, posed in a more mathematical form.

Best Answer

The KDE is a mixture of Normal distributions. Let's look at a single one of them.

The definitions of $P(A)$ and $P(B)$ show their values are invariant under translations and rescalings in the plane, so it suffices to consider the standard Normal distribution with PDF $f$. The inequality

$$f(x,y) \le f(r,s)$$

is equivalent to

$$x^2 + y^2 \ge r^2 + s^2.$$

Introducing polar coordinates $\rho, \theta$ allows the integral to be rewritten as

$$P(r,s) = \frac{1}{2\pi}\int_0^{2\pi}\int_{\sqrt{r^2+s^2}}^\infty \rho \exp(-\rho^2/2)\, d\rho\, d\theta = \exp(-(r^2+s^2)/2) = 2\pi f(r,s).$$
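Spelling out the radial integral makes the link to $f$ explicit. Write $R = \sqrt{r^2+s^2}$ and recall that the standard Normal PDF is $f(x,y) = \frac{1}{2\pi}e^{-(x^2+y^2)/2}$:

$$\int_R^\infty \rho\, e^{-\rho^2/2}\, d\rho = \left[-e^{-\rho^2/2}\right]_R^\infty = e^{-R^2/2},$$

$$P(r,s) = \frac{1}{2\pi}\cdot 2\pi \cdot e^{-R^2/2} = e^{-(r^2+s^2)/2} = 2\pi \cdot \frac{1}{2\pi}e^{-(r^2+s^2)/2} = 2\pi f(r,s).$$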

Now consider the mixture. Because it is linear,

$$\eqalign{ P(r,s) &= \frac{1}{n}\sum_i 2\pi f((r-x_i)/h, (s-y_i)/h) \\ &= 2\pi h^2\left(\frac{1}{n}\sum_i \frac{1}{h^2} f((r-x_i)/h, (s-y_i)/h)\right) \\ &=2\pi h^2 KDE(r,s). }$$

Indeed, $f$ and $P$ are proportional. The constant of proportionality is $2\pi h^2$.


That such a proportionality relationship between $P$ and $f$ is special can be appreciated by contemplating a simple counterexample. Let $f_1$ be the PDF of a uniform distribution on a measurable set $A_1$ of unit area, and let $f_2$ be the PDF of a uniform distribution on a measurable set $A_2$ that is disjoint from $A_1$ and has area $\mu\gt 1$. Then the mixture with PDF $f=f_1/2 + f_2/2$ has constant value $1/2$ on $A_1$, $1/(2\mu)$ on $A_2$, and is zero elsewhere. There are three cases to consider:

  1. $(r,s)\in A_1$. Here $f(r,s)=1/2$ attains its maximum, whence $P(r,s)=1$. The ratio $f(r,s)/P(r,s) = 1/2$.

  2. $(r,s)\in A_2$. Here $f(r,s)$ is strictly less than $1/2$ but greater than $0$. Thus the region of integration is the complement of $A_1$ and the resulting integral must equal $1/2$. The ratio $f(r,s)/P(r,s) = (1/(2\mu))/(1/2) = 1/\mu$.

  3. Elsewhere, $f$ is zero and the integral $P$ is zero.

Evidently the ratio (where it is defined) is not constant: it equals $1/2$ on $A_1$ and $1/\mu$ on $A_2$, and these differ whenever $\mu \ne 2$. Although this distribution is not continuous, it can be made so by adding an independent Normal$(0,\Sigma)$ variable to it, that is, by convolving its PDF with a Normal$(0,\Sigma)$ PDF. Making both eigenvalues of $\Sigma$ small changes the distribution very little and produces qualitatively the same results; only now the values of the ratio $f/P$ will include all the numbers between $1/\mu$ and $1/2$.
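The two nontrivial cases can be checked with exact rational arithmetic. The sketch below represents the piecewise-constant PDF as a hypothetical list of `(density, area)` pieces and computes $P$ at a point of density $t$ as the total mass of the pieces whose density is at most $t$, matching the $\le$ convention used above. The value $\mu = 3$ is an arbitrary choice; the helper name `tail_mass` is illustrative only.

```python
from fractions import Fraction


def tail_mass(pieces, level):
    """Integral of a piecewise-constant PDF over the region where its density is <= level.

    `pieces` is a list of (density, area) pairs; zero-density regions contribute nothing.
    """
    return sum((d * a for d, a in pieces if d <= level), Fraction(0))


mu = Fraction(3)                    # area of A_2 (any mu > 1 with mu != 2)
pieces = [
    (Fraction(1, 2), Fraction(1)),  # A_1: density 1/2 on unit area
    (1 / (2 * mu), mu),             # A_2: density 1/(2 mu) on area mu
]

for name, (density, _) in zip(["A_1", "A_2"], pieces):
    P = tail_mass(pieces, density)
    print(name, "f =", density, "P =", P, "f/P =", density / P)
# A_1 f = 1/2 P = 1 f/P = 1/2
# A_2 f = 1/6 P = 1/2 f/P = 1/3
```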


This result also does not generalize to other dimensions. Essentially the same calculation that began this answer shows that, in $d$ dimensions, $P$ is the regularized upper incomplete Gamma function $Q(d/2, \rho^2/2)$ of half the squared distance $\rho^2$ to the center, which is not proportional to $f \propto \exp(-\rho^2/2)$ unless $d = 2$. That two dimensions are special can be appreciated by noting that the integration in $P$ essentially concerns distances: when the coordinates are independent standard Normal variables, the squared distance has a $\chi^2(2)$ distribution, which is the exponential distribution (with mean $2$). Exponential functions are the only functions proportional to their own derivatives, whence the integrand $f$ and the integral $P$ must be proportional.
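A numerical illustration, assuming a single standard Normal kernel in $d$ dimensions: at distance $\rho$ from the center, $P$ is the $\chi^2(d)$ survival function at $\rho^2$, i.e. $Q(d/2, \rho^2/2)$, while $f = (2\pi)^{-d/2} e^{-\rho^2/2}$. The sketch computes $Q$ with the standard recurrence $Q(a+1, x) = Q(a, x) + x^a e^{-x}/\Gamma(a+1)$, starting from $Q(1/2, x) = \operatorname{erfc}(\sqrt{x})$ or $Q(1, x) = e^{-x}$, using only the Python standard library. The function names are illustrative.

```python
import math


def chi2_sf(t, d):
    """P(chi^2_d >= t) = Q(d/2, t/2), the regularized upper incomplete Gamma function."""
    x = t / 2.0
    if d % 2 == 0:
        a, q = 1.0, math.exp(-x)             # Q(1, x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))  # Q(1/2, x)
    while a < d / 2.0:
        # Recurrence: Q(a + 1, x) = Q(a, x) + x^a e^{-x} / Gamma(a + 1)
        q += x**a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def ratio_P_over_f(rho, d):
    """P / f for a standard Normal in d dimensions at distance rho from the center."""
    P = chi2_sf(rho**2, d)
    f = (2.0 * math.pi) ** (-d / 2.0) * math.exp(-rho**2 / 2.0)
    return P / f


for d in (1, 2, 3):
    print(d, [round(ratio_P_over_f(rho, d), 4) for rho in (0.5, 1.0, 2.0)])
# Only the d = 2 row is constant: 6.2832 (that is, 2*pi) at every radius.
```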
