I have iid data-points $x_1, \dots, x_n$, generated by an unknown density $f(x)$.
So far I have approximated $f(x)$ with a normal $N(\hat{\mu}, \hat{\sigma}^2 )$, where $\hat{\mu}$ and $\hat{\sigma}^2$ are the sample mean and variance, but I would like something more flexible.
I have tried simple kernel density estimators, but the problem is that in my application I often need to evaluate the estimated density $\hat{f}(x)$ deep in the tails (say I need $\hat{f}(x)$ for $|x| \gg 2\sigma$), and $\hat{f}(x)$ drops to zero very rapidly there.
This image shows what I mean:
Here is the R code:
# Generate normal data
N <- 10000
x <- rnorm(N)

# Get a plug-in bandwidth from density()
tmp <- density(x, kernel = "gaussian")

# Evaluate the log-density of the Gaussian-kernel KDE on a grid
xSeq <- y <- seq(-5, 5, by = 0.1)
for (ii in seq_along(y)) {
  y[ii] <- log(sum(dnorm(xSeq[ii] - x, mean = 0, sd = tmp$bw)) / N)
}

plot(xSeq, y, type = "l", main = "Black = KDE, broken = normal (true)",
     ylab = "Log-density", xlab = "x")
lines(xSeq, dnorm(xSeq, log = TRUE), lty = 2)
In this case the true density is really Gaussian, but in my application it is unknown. Obviously estimating tail probabilities is very difficult, but what I would like is a KDE that doesn't drop to zero so fast. In the literature I found a lot about fat-tailed kernels, but I'm not sure whether that's what I need.
Thanks!
EDIT: One might ask "why do you need to evaluate the estimated density very far in the tails?". The answer is that I want to use this density to estimate a likelihood $f(x_0|\theta)$, where $x_0$ is an observation and $\theta$ is an unknown parameter. Given a set of parameters $\theta^0$ I simulate $n$ data-points $x_1, \dots, x_n$, estimate their density, and use it to get an estimate of the likelihood $\hat{f}(x_0|\theta^0)$. Given that the real $\theta$ is unknown, when I initialize the algorithm at $\theta^0$ my sample can be very far from $x_0$, and hence I will evaluate the estimated density in the extreme tails.
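For concreteness, the procedure described above can be sketched as follows (a minimal sketch, assuming a hypothetical model where the data are $N(\theta, 1)$; the simulator and the values of `x0` and `theta0` are illustrative placeholders, not part of the original setup):

```r
# Sketch of the likelihood-estimation step described above.
# Hypothetical model: data simulated as N(theta, 1); in practice the
# simulator is whatever generates the application's data.
set.seed(1)
x0     <- 3      # the observation (illustrative value)
theta0 <- 0      # initial parameter guess (illustrative value)
n      <- 10000
xSim   <- rnorm(n, mean = theta0)   # simulate n data-points given theta0
bw     <- density(xSim)$bw          # plug-in bandwidth

# Gaussian-kernel KDE evaluated at x0 gives the likelihood estimate
likHat <- sum(dnorm(x0 - xSim, sd = bw)) / n

# The failure mode: if x0 is very far from the simulated sample, the
# Gaussian-kernel estimate underflows to exactly zero, so its log is -Inf
likFar <- sum(dnorm(15 - xSim, sd = bw)) / n
```

The last line shows why evaluating deep in the tails matters: once $x_0$ is many bandwidths from every simulated point, the Gaussian-kernel estimate is numerically zero and the log-likelihood is unusable.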
Best Answer
It is a good idea to use the T-distribution to build a KDE
When you build a KDE, once you go outside the data range, the rate of decay in the tails is determined by the rate of decay in the tails of the kernel distribution. The normal distribution has very thin tails (decaying at the rate $e^{-x^2/2}$), so it is unsurprising that the tails of your KDE decay rapidly outside the data range. If you want to ameliorate this and allow fatter tails, I would recommend that you use a T-distribution as the kernel for your KDE. The degrees-of-freedom parameter then controls the "fatness" of the tails, and it even allows you to have tails heavy enough to give infinite variance in your KDE.
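As an illustration of this point, here is a minimal hand-rolled sketch (not using any package; the bandwidth is simply reused from `density()`, and `tkde` is an illustrative helper name):

```r
# KDE with a Student-t kernel: fhat(x) = (1/(n*h)) * sum_i dt((x - x_i)/h, df)
set.seed(1)
n  <- 10000
x  <- rnorm(n)
h  <- density(x, kernel = "gaussian")$bw  # reuse the Gaussian plug-in bandwidth
nu <- 2  # low degrees of freedom: heavy tails (infinite variance for nu <= 2)

tkde <- function(x0) {
  sapply(x0, function(z) sum(dt((z - x) / h, df = nu)) / (n * h))
}

# Compare log-densities: the t-kernel KDE stays well above zero
# far outside the data range, unlike the Gaussian-kernel KDE
xSeq <- seq(-5, 5, by = 0.1)
plot(xSeq, log(tkde(xSeq)), type = "l",
     ylab = "Log-density", xlab = "x")
lines(xSeq, dnorm(xSeq, log = TRUE), lty = 2)  # true N(0,1) log-density
```

Evaluating `tkde` at a point far outside the sample (say `x0 = 10`) returns a small but strictly positive value, whereas the Gaussian-kernel estimate at the same point is numerically zero.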
You can implement a KDE using the T-distribution with the `KDE` function in the `utilities` package. This function allows you to specify the degrees-of-freedom parameter to control the fatness of the tails of the KDE. (The function produces an object containing probability functions for the KDE; you can also load those functions directly into the global environment so that you can call them just like the probability functions of any other distribution.) Here is an example of fitting a KDE using a T-distribution with two degrees of freedom, which means that the KDE has tails heavy enough to give infinite variance. If you examine the log-density of this KDE (using the `dkde` function generated here) you will see that the rate of decay in the tails is much slower than for a KDE that uses the normal kernel.