The KDE is a mixture of Normal distributions. Let's look at a single one of them.
The definitions of $P(A)$ and $P(B)$ show their values are invariant under translations and rescalings in the plane, so it suffices to consider the standard Normal distribution with PDF $f$. The inequality
$$f(x,y) \le f(r,s)$$
is equivalent to
$$x^2 + y^2 \ge r^2 + s^2.$$
Introducing polar coordinates $\rho, \theta$ allows the integral to be rewritten
$$P(r,s) = \frac{1}{2\pi}\int_0^{2\pi}\int_{\sqrt{r^2+s^2}}^\infty \rho \exp(-\rho^2/2)\, d\rho\, d\theta = \exp(-(r^2+s^2)/2) = 2\pi f(r,s).$$
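As a sanity check of this identity, $P(r,s)$ can be estimated by Monte Carlo and compared with $\exp(-(r^2+s^2)/2)$; a sketch, with $(r,s)=(1.0, 0.5)$ chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
r, s = 1.0, 0.5  # arbitrary query point

# P(r, s) is the probability that a standard bivariate Normal draw lands
# where the density is <= f(r, s), i.e. where x^2 + y^2 >= r^2 + s^2.
xy = rng.standard_normal((1_000_000, 2))
p_mc = np.mean((xy**2).sum(axis=1) >= r**2 + s**2)

p_exact = np.exp(-(r**2 + s**2) / 2)  # = 2*pi*f(r, s)
print(p_mc, p_exact)  # the two agree closely
```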
Now consider the mixture. Because it is linear,
$$\begin{aligned}
P(r,s) &= \frac{1}{n}\sum_i 2\pi f((r-x_i)/h, (s-y_i)/h) \\
&= 2\pi h^2\left(\frac{1}{n}\sum_i \frac{1}{h^2} f((r-x_i)/h, (s-y_i)/h)\right) \\
&= 2\pi h^2\, \operatorname{KDE}(r,s).
\end{aligned}$$
Indeed, $f$ and $P$ are proportional. The constant of proportionality is $2\pi h^2$.
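The bookkeeping of the constants in this display can be checked numerically; a sketch, where the data points, bandwidth, and query point are all made-up values:

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.uniform(-2, 2, size=(6, 2))  # hypothetical data points (x_i, y_i)
h = 0.4                                # hypothetical bandwidth
r, s = 0.3, -0.1                       # arbitrary query point

def f(u, v):
    """Standard bivariate Normal PDF."""
    return np.exp(-(u**2 + v**2) / 2) / (2 * np.pi)

# First line of the display: average of the per-component values 2*pi*f(.)
lhs = np.mean([2 * np.pi * f((r - x) / h, (s - y) / h) for x, y in pts])

# Last line: 2*pi*h^2 times the KDE evaluated at (r, s)
kde = np.mean([f((r - x) / h, (s - y) / h) / h**2 for x, y in pts])
rhs = 2 * np.pi * h**2 * kde

print(lhs, rhs)  # identical up to rounding
```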
That such a proportionality relationship between $P$ and $f$ is special can be appreciated by contemplating a simple counterexample. Let $f_1$ have a uniform distribution on a measurable set $A_1$ of unit area and $f_2$ have a uniform distribution on a measurable set $A_2$ which is disjoint from $A_1$ and has area $\mu\gt 1$. Then the mixture with PDF $f=f_1/2 + f_2/2$ has constant value $1/2$ on $A_1$, $1/(2\mu)$ on $A_2$, and is zero elsewhere. There are three cases to consider:
- $(r,s)\in A_1$. Here $f(r,s)=1/2$ attains its maximum, so the region of integration is the whole plane and $P(r,s)=1$. The ratio $f(r,s)/P(r,s) = 1/2$.
- $(r,s)\in A_2$. Here $f(r,s)=1/(2\mu)$ is strictly between $0$ and $1/2$. The region of integration is the complement of $A_1$, so the resulting integral equals $1/2$. The ratio $f(r,s)/P(r,s) = (1/(2\mu))/(1/2) = 1/\mu$.
- Elsewhere, $f$ is zero and the integral $P$ is zero, so the ratio is undefined.
Evidently the ratio (where it is defined) is not constant: it varies between $1/2$ and $1/\mu \ne 1/2$. Although this distribution is not continuous, it can be made so by convolving it with a Normal$(0,\Sigma)$ distribution. By making both eigenvalues of $\Sigma$ small, this will change the distribution very little and produce qualitatively the same results--only now the values of the ratio $f/P$ will include all the numbers between $1/\mu$ and $1/2$.
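The arithmetic of the counterexample can be verified directly; a sketch, using the concrete (and arbitrary) choice $\mu = 3$:

```python
# Mixture of two uniforms: f = 1/2 on A1 (area 1), 1/(2*mu) on A2 (area mu).
mu = 3.0  # any mu > 1 works; this value is an arbitrary choice

f_A1 = 1 / 2
f_A2 = 1 / (2 * mu)

# P(r, s) integrates f over the region where f(x, y) <= f(r, s).
P_A1 = f_A1 * 1 + f_A2 * mu  # whole support: 1/2 + 1/2 = 1
P_A2 = f_A2 * mu             # complement of A1: 1/2

print(f_A1 / P_A1)  # 0.5
print(f_A2 / P_A2)  # 1/mu, approximately 0.333
```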
This result also does not generalize to other dimensions. Essentially the same calculation that began this answer shows that in other dimensions $P$ is an incomplete Gamma function, which clearly is not proportional to $f$. That two dimensions are special can be appreciated by noting that the integration in $P$ essentially concerns the squared distance, and when the coordinates are Normally distributed, the squared distance has a $\chi^2(2)$ distribution--which is an exponential distribution. The exponential function is unique in being proportional to its own derivative--whence the integrand $f$ and the integral $P$ must be proportional.
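The distributional fact used here--that the squared distance of a standard bivariate Normal point follows a $\chi^2(2)$, i.e. exponential, distribution--is easy to confirm empirically; a sketch:

```python
import numpy as np

rng = np.random.default_rng(2)

# Squared radius of a standard bivariate Normal point.
xy = rng.standard_normal((1_000_000, 2))
rho2 = (xy**2).sum(axis=1)

# chi^2(2) is exponential with mean 2: P(rho^2 > t) = exp(-t/2).
for t in (0.5, 1.0, 2.0):
    print(np.mean(rho2 > t), np.exp(-t / 2))  # empirical vs. exact
```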
I think part of your confusion is about which types of variables a chi-squared test can compare. Wikipedia says the following about this:
> It tests a null hypothesis stating that the frequency distribution of certain events observed in a sample is consistent with a particular theoretical distribution.
Thus it compares frequency distributions--that is, counts, which are non-negative numbers. The different frequency distributions are defined by a categorical variable; i.e., for each value of the categorical variable there needs to be a frequency distribution that can be compared with the others.
There are several ways to obtain the frequency distribution. It might come from a second categorical variable, in which case the co-occurrences with the first categorical variable are counted to get a discrete frequency distribution. Another option is to use one or more numerical variables: for each value of the categorical variable, the values of the numerical variable are summed. In fact, if the categorical variable is binarised, the former is a special case of the latter.
Example
As an example look at these sets of variables:
```
x = ['mouse', 'cat', 'mouse', 'cat']
z = ['wild', 'domesticated', 'domesticated', 'domesticated']
```
The categorical variables x and z can be compared by counting the co-occurrences, and this is what happens in a chi-squared test:
```
                'mouse'  'cat'
'wild'              1       0
'domesticated'      1       2
```
However, you can also binarise the values of 'x' and get the following variables:
```
x1 = [1, 0, 1, 0]
x2 = [0, 1, 0, 1]
z = ['wild', 'domesticated', 'domesticated', 'domesticated']
```
Counting the values is now equal to summing the values that correspond to each value of z.
```
                 x1   x2
'wild'            1    0
'domesticated'    1    2
```
As you can see, a single categorical variable (x) and multiple numerical variables (x1 and x2) are represented equally in the contingency table. Thus chi-squared tests can be applied to a categorical variable (the label in sklearn) combined with either another categorical variable or multiple numerical variables (the features in sklearn).
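To make the equivalence concrete, the contingency table can be rebuilt by summing the binarised features per value of z, and the Pearson chi-squared statistic computed from its margins; a numpy-only sketch (no continuity correction):

```python
import numpy as np

z = np.array(['wild', 'domesticated', 'domesticated', 'domesticated'])

# Binarised features: x1 marks 'mouse', x2 marks 'cat'.
x1 = np.array([1, 0, 1, 0])
x2 = np.array([0, 1, 0, 1])

# Summing the features per value of z rebuilds the contingency table.
observed = np.array([[x1[z == g].sum(), x2[z == g].sum()]
                     for g in ['wild', 'domesticated']])
print(observed)  # [[1 0]
                 #  [1 2]]

# Pearson chi-squared statistic from the table's margins.
row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
expected = row * col / observed.sum()
chi2 = ((observed - expected)**2 / expected).sum()
print(chi2)  # 4/3
```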
The results are negative because `score_samples()` returns the log density. From the help message:
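Exponentiating the returned values recovers the density itself, which is always positive; a sketch, where the data and bandwidth are made-up values:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 2))  # hypothetical 2D data

kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(X)

log_dens = kde.score_samples(X)  # log density: negative wherever density < 1
dens = np.exp(log_dens)          # actual density: always positive

print(log_dens[:3])
print(dens[:3])
```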