The KDE is a mixture of Normal distributions. Let's look at a single one of them.
The definitions of $P(A)$ and $P(B)$ show their values are invariant under translations and rescalings in the plane, so it suffices to consider the standard Normal distribution with PDF $f$. The inequality
$$f(x,y) \le f(r,s)$$
is equivalent to
$$x^2 + y^2 \ge r^2 + s^2.$$
Introducing polar coordinates $\rho, \theta$ allows the integral to be rewritten
$$P(r,s) = \frac{1}{2\pi}\int_0^{2\pi}\int_{\sqrt{r^2+s^2}}^\infty \rho \exp(-\rho^2/2)\, d\rho\, d\theta = \exp(-(r^2+s^2)/2) = 2\pi f(r,s).$$
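As a sanity check of this identity, $P(r,s)$ can be estimated by Monte Carlo and compared with $\exp(-(r^2+s^2)/2)$; a sketch, with $(r,s)=(1.0, 0.5)$ chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
r, s = 1.0, 0.5  # arbitrary query point

# P(r, s) is the probability that a standard bivariate Normal draw lands
# where the density is <= f(r, s), i.e. where x^2 + y^2 >= r^2 + s^2.
xy = rng.standard_normal((1_000_000, 2))
p_mc = np.mean((xy**2).sum(axis=1) >= r**2 + s**2)

p_exact = np.exp(-(r**2 + s**2) / 2)  # = 2*pi*f(r, s)
print(p_mc, p_exact)  # the two agree closely
```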
Now consider the mixture. Because it is linear,
$$\begin{aligned}
P(r,s) &= \frac{1}{n}\sum_i 2\pi f((r-x_i)/h, (s-y_i)/h) \\
&= 2\pi h^2\left(\frac{1}{n}\sum_i \frac{1}{h^2} f((r-x_i)/h, (s-y_i)/h)\right) \\
&= 2\pi h^2\, \operatorname{KDE}(r,s).
\end{aligned}$$
Indeed, $f$ and $P$ are proportional. The constant of proportionality is $2\pi h^2$.
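The bookkeeping of the constants in this display can be checked numerically; a sketch, where the data points, bandwidth, and query point are all made-up values:

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.uniform(-2, 2, size=(6, 2))  # hypothetical data points (x_i, y_i)
h = 0.4                                # hypothetical bandwidth
r, s = 0.3, -0.1                       # arbitrary query point

def f(u, v):
    """Standard bivariate Normal PDF."""
    return np.exp(-(u**2 + v**2) / 2) / (2 * np.pi)

# First line of the display: average of the per-component values 2*pi*f(.)
lhs = np.mean([2 * np.pi * f((r - x) / h, (s - y) / h) for x, y in pts])

# Last line: 2*pi*h^2 times the KDE evaluated at (r, s)
kde = np.mean([f((r - x) / h, (s - y) / h) / h**2 for x, y in pts])
rhs = 2 * np.pi * h**2 * kde

print(lhs, rhs)  # identical up to rounding
```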
That such a proportionality relationship between $P$ and $f$ is special can be appreciated by contemplating a simple counterexample. Let $f_1$ have a uniform distribution on a measurable set $A_1$ of unit area and $f_2$ have a uniform distribution on a measurable set $A_2$ which is disjoint from $A_1$ and has area $\mu\gt 1$. Then the mixture with PDF $f=f_1/2 + f_2/2$ has constant value $1/2$ on $A_1$, $1/(2\mu)$ on $A_2$, and is zero elsewhere. There are three cases to consider:
- $(r,s)\in A_1$. Here $f(r,s)=1/2$ attains its maximum, so the region of integration is the whole plane and $P(r,s)=1$. The ratio $f(r,s)/P(r,s) = 1/2$.
- $(r,s)\in A_2$. Here $f(r,s)=1/(2\mu)$ is strictly between $0$ and $1/2$. The region of integration is the complement of $A_1$, so the resulting integral equals $1/2$. The ratio $f(r,s)/P(r,s) = (1/(2\mu))/(1/2) = 1/\mu$.
- Elsewhere, $f$ is zero and the integral $P$ is zero, so the ratio is undefined.
Evidently the ratio (where it is defined) is not constant: it varies between $1/2$ and $1/\mu \ne 1/2$. Although this distribution is not continuous, it can be made so by convolving it with a Normal$(0,\Sigma)$ distribution. By making both eigenvalues of $\Sigma$ small, this will change the distribution very little and produce qualitatively the same results--only now the values of the ratio $f/P$ will include all the numbers between $1/\mu$ and $1/2$.
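The arithmetic of the counterexample can be verified directly; a sketch, using the concrete (and arbitrary) choice $\mu = 3$:

```python
# Mixture of two uniforms: f = 1/2 on A1 (area 1), 1/(2*mu) on A2 (area mu).
mu = 3.0  # any mu > 1 works; this value is an arbitrary choice

f_A1 = 1 / 2
f_A2 = 1 / (2 * mu)

# P(r, s) integrates f over the region where f(x, y) <= f(r, s).
P_A1 = f_A1 * 1 + f_A2 * mu  # whole support: 1/2 + 1/2 = 1
P_A2 = f_A2 * mu             # complement of A1: 1/2

print(f_A1 / P_A1)  # 0.5
print(f_A2 / P_A2)  # 1/mu, approximately 0.333
```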
This result also does not generalize to other dimensions. Essentially the same calculation that began this answer shows that in other dimensions $P$ is an incomplete Gamma function, which clearly is not proportional to $f$. That two dimensions are special can be appreciated by noting that the integration in $P$ essentially concerns the squared distance, and when the coordinates are Normally distributed, the squared distance has a $\chi^2(2)$ distribution--which is an exponential distribution. The exponential function is unique in being proportional to its own derivative--whence the integrand $f$ and the integral $P$ must be proportional.
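The distributional fact used here--that the squared distance of a standard bivariate Normal point follows a $\chi^2(2)$, i.e. exponential, distribution--is easy to confirm empirically; a sketch:

```python
import numpy as np

rng = np.random.default_rng(2)

# Squared radius of a standard bivariate Normal point.
xy = rng.standard_normal((1_000_000, 2))
rho2 = (xy**2).sum(axis=1)

# chi^2(2) is exponential with mean 2: P(rho^2 > t) = exp(-t/2).
for t in (0.5, 1.0, 2.0):
    print(np.mean(rho2 > t), np.exp(-t / 2))  # empirical vs. exact
```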
I think part of your confusion is about which types of variables a chi-squared test can compare. Wikipedia says the following about this:
> It tests a null hypothesis stating that the frequency distribution of certain events observed in a sample is consistent with a particular theoretical distribution.
Thus it compares frequency distributions--that is, counts, which are non-negative numbers. The different frequency distributions are defined by a categorical variable; i.e., for each value of the categorical variable there needs to be a frequency distribution that can be compared with the others.
There are several ways to obtain the frequency distribution. It might come from a second categorical variable, in which case the co-occurrences with the first categorical variable are counted to get a discrete frequency distribution. Another option is to use one or more numerical variables: for each value of the categorical variable, the values of the numerical variable are summed. In fact, if the categorical variable is binarised, the former is a special case of the latter.
Example
As an example look at these sets of variables:
```
x = ['mouse', 'cat', 'mouse', 'cat']
z = ['wild', 'domesticated', 'domesticated', 'domesticated']
```
The categorical variables x and z can be compared by counting the co-occurrences, and this is what happens in a chi-squared test:
```
                'mouse'  'cat'
'wild'              1       0
'domesticated'      1       2
```
However, you can also binarise the values of 'x' and get the following variables:
```
x1 = [1, 0, 1, 0]
x2 = [0, 1, 0, 1]
z = ['wild', 'domesticated', 'domesticated', 'domesticated']
```
Counting the values is now equal to summing the values that correspond to each value of z.
```
                 x1   x2
'wild'            1    0
'domesticated'    1    2
```
As you can see, a single categorical variable (x) and multiple numerical variables (x1 and x2) are represented equally in the contingency table. Thus chi-squared tests can be applied to a categorical variable (the label in sklearn) combined with either another categorical variable or multiple numerical variables (the features in sklearn).
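To make the equivalence concrete, the contingency table can be rebuilt by summing the binarised features per value of z, and the Pearson chi-squared statistic computed from its margins; a numpy-only sketch (no continuity correction):

```python
import numpy as np

z = np.array(['wild', 'domesticated', 'domesticated', 'domesticated'])

# Binarised features: x1 marks 'mouse', x2 marks 'cat'.
x1 = np.array([1, 0, 1, 0])
x2 = np.array([0, 1, 0, 1])

# Summing the features per value of z rebuilds the contingency table.
observed = np.array([[x1[z == g].sum(), x2[z == g].sum()]
                     for g in ['wild', 'domesticated']])
print(observed)  # [[1 0]
                 #  [1 2]]

# Pearson chi-squared statistic from the table's margins.
row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
expected = row * col / observed.sum()
chi2 = ((observed - expected)**2 / expected).sum()
print(chi2)  # 4/3
```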
The results are negative because `score_samples()` returns the log density. From the help message:
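Exponentiating the returned values recovers the density itself, which is always positive; a sketch, where the data and bandwidth are made-up values:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 2))  # hypothetical 2D data

kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(X)

log_dens = kde.score_samples(X)  # log density: negative wherever density < 1
dens = np.exp(log_dens)          # actual density: always positive

print(log_dens[:3])
print(dens[:3])
```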