Solved – Kernel density estimator that doesn’t collapse in the tails

density-function, kernel-smoothing, nonparametric, smoothing

I have iid data-points $x_1, \dots, x_n$, generated by an unknown density $f(x)$.
So far I have approximated $f(x)$ with a normal $N(\hat{\mu}, \hat{\sigma}^2)$, where $\hat{\mu}$ and $\hat{\sigma}^2$ are the sample mean and variance, but I would like something more flexible.

I have tried simple kernel density estimators, but the problem is that in my application I often need to evaluate the estimated density $\hat{f}(x)$ deep in the tails (say I need $\hat{f}(x)$ for $|x| \gg 2\sigma$), and it looks like $\hat{f}(x)$ drops to zero very rapidly in the tails.

This image shows what I mean:

[Figure: Normal vs KDE for normal data]
Here is the R code:

# Generate normal data
N <- 10000
x <- rnorm(N)

# Use density() only to obtain a bandwidth
tmp <- density(x, kernel = "gaussian")

# Evaluate the log of the Gaussian-kernel KDE on a grid
xSeq <- y <- seq(-5, 5, by = 0.1)

for (ii in seq_along(y)) {
  y[ii] <- log(sum(dnorm(xSeq[ii] - x, 0, sd = tmp$bw)) / N)
}

plot(xSeq, y, type = 'l', main = "Black = KDE, broken = normal (true)",
     ylab = "Log-density", xlab = "x")
lines(xSeq, dnorm(xSeq, log = TRUE), lty = 2)

In this case the true density is really Gaussian, but in my application it is unknown. Obviously estimating tail probabilities is very difficult, but what I would like is a KDE that doesn't drop to zero so fast. In the literature I found a lot about fat-tailed kernels, but I'm not sure whether that's what I need.
Thanks!

EDIT: One might ask "why do you need to evaluate the estimated density very far in the tails?". The answer is that I want to use this density to estimate a likelihood $f(x_0|\theta)$, where $x_0$ is an observation and $\theta$ is an unknown parameter. Given a set of parameters $\theta^0$ I simulate $n$ data-points $x_1, \dots, x_n$, estimate their density and use it to get an estimate of the likelihood $\hat{f}(x_0|\theta^0)$. Given that the real $\theta$ is unknown, when I initialize the algorithm at $\theta^0$ my sample can be very far from $x_0$, and hence I will evaluate the estimated density in the extreme tails.
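
For concreteness, here is a minimal sketch of this simulated-likelihood step, assuming a toy normal location model as the simulator; the helper name loglik_hat, the model and the bandwidth choice are all illustrative, not part of my actual application:

# Sketch of the simulated-likelihood step (toy normal location model;
# loglik_hat is a hypothetical helper, not from the application above)
loglik_hat <- function(theta0, x0, n = 1000) {
  xSim <- rnorm(n, mean = theta0)               # simulate n data-points under theta0
  h    <- density(xSim, kernel = "gaussian")$bw # plug-in bandwidth
  log(sum(dnorm(x0 - xSim, sd = h)) / n)        # Gaussian-kernel KDE at x0, on the log scale
}

loglik_hat(theta0 = 3, x0 = -4)  # x0 far from the simulated sample: a deep-tail evaluation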

Best Answer

It is a good idea to use the T-distribution to build a KDE

When you build a KDE, once you go outside the data range, the rate of decay in the tails is determined by the rate of decay in the tails of the kernel distribution. The normal distribution has very thin tails (the density decays like $\exp(-x^2/2)$), so it is unsurprising that the tails of your KDE decay rapidly outside the data range. If you want to ameliorate this and allow fatter tails, I would recommend that you use a T-distribution as the kernel for your KDE. This lets you tune the degrees-of-freedom parameter to control the desired "fatness" of the tails, and it even allows tails heavy enough to give infinite variance in your KDE.
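
To see how much the kernel's own tails matter, here is a quick illustrative comparison of the log-density of a standard normal kernel and a T-kernel with 2 degrees of freedom a few units out (the evaluation points are arbitrary):

# Log-density of the standard normal vs the T-distribution with 2 df:
# the normal log-density falls off like -x^2/2, the t_2 log-density only like -3*log(x)
xx <- c(3, 5, 8, 12)
rbind(normal = dnorm(xx, log = TRUE),
      t_df2  = dt(xx, df = 2, log = TRUE))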

You can implement a KDE with a T-distribution kernel using the KDE function in the utilities package. This function allows you to specify the degrees-of-freedom parameter to control the fatness of the tails of the KDE. (The function produces an object containing probability functions for the KDE; you can also load those functions directly into the global environment so that you can call them just like the probability functions of any other distribution.) Here is an example of fitting a KDE using a T-distribution with two degrees of freedom, which means that the KDE has tails heavy enough to give infinite variance. If you examine the log-density of this KDE (using the dkde function generated here) you will see that the rate of decay in the tails is much slower than for a KDE that uses the normal kernel.

# Load the package
library(utilities)

# Generate some mock data
set.seed(1)
DATA <- rnorm(40)

# Create a KDE using the T-distribution with two degrees of freedom (infinite variance)
MY_KDE <- KDE(DATA, df = 2, to.environment = TRUE)
plot(MY_KDE)

# Show the KDE output
MY_KDE

  Kernel Density Estimator (KDE) 
 
  Computed from 40 data points in the input 'DATA'
  Estimated bandwidth = 0.367412  
  Input degrees-of-freedom = 2.000000  
 
  Probability functions for the KDE are the following: 
 
      Density function:                   dkde * 
      Distribution function:              pkde * 
      Quantile function:                  qkde * 
      Random generation function:         rkde * 
 
  * This function is presently loaded in the global environment

[Plot of the fitted KDE]
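
As a rough self-contained check of the tail behaviour, the sketch below recomputes a normal-kernel KDE and a T-kernel (2 df) KDE by hand and evaluates their log-densities well outside the range of DATA; the shared bandwidth h is only for illustration and need not match the bandwidth reported by KDE above.

# Hand-rolled comparison of tail decay (illustrative bandwidth; does not call dkde)
h  <- density(DATA)$bw
xx <- seq(2, 10, by = 2)   # points well outside the range of DATA

logkde_norm <- sapply(xx, function(z) log(mean(dnorm((z - DATA) / h)) / h))
logkde_t2   <- sapply(xx, function(z) log(mean(dt((z - DATA) / h, df = 2)) / h))

rbind(x = xx, normal_kernel = logkde_norm, t2_kernel = logkde_t2)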
