Simulation – How to Simulate from Kernel Density Estimate (Empirical PDF)?

bootstrap, distributions, kernel-smoothing, sampling, simulation

I have a vector X of N = 900 observations that are best modeled by a kernel density estimator with a global bandwidth (parametric models, including dynamic mixture models, turned out not to fit well).

Now, I want to simulate from this KDE. I know this can be achieved by bootstrapping.

In R, it all comes down to this simple line of code (which is almost pseudo-code):

x.sim <- mean(X) + (sample(X, replace = TRUE) - mean(X) + bw * rnorm(N)) / sqrt(1 + bw^2 * varkern / var(X))

This implements the smoothed bootstrap with variance correction, where bw is the bandwidth and varkern is the variance of the selected kernel function (e.g., 1 for a Gaussian kernel).

[Figure: result obtained with 500 repetitions]

It works, but I have a hard time understanding how shuffling observations (with some added noise) is the same thing as simulating from a probability distribution (here, the KDE), as in standard Monte Carlo. Additionally, is bootstrapping the only way to simulate from a KDE?

EDIT: please see my answer below for more information about the smoothed bootstrap with variance correction.

Best Answer

Here's an algorithm to sample from an arbitrary mixture $f(x) = \frac1N \sum_{i=1}^N f_i(x)$:

1) Pick a mixture component $i$ uniformly at random.
2) Sample from $f_i$.

It should be clear that this produces an exact sample.

A Gaussian kernel density estimate is a mixture $\frac1N \sum_{i=1}^N \mathcal{N}(x; x_i, h^2)$. So you can take a sample of size $N$ by picking $N$ of the $x_i$ uniformly at random with replacement and adding independent normal noise with zero mean and variance $h^2$ to each.
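The two-step recipe above can be sketched in base R as follows. This is only an illustration: the helper name `sample_gaussian_kde`, the mock data, and the use of the rule-of-thumb bandwidth `bw.nrd0` are assumptions, not part of the question.

```r
# Hypothetical helper (name and signature are illustrative only):
# draw n values from a Gaussian KDE built on x with bandwidth h.
sample_gaussian_kde <- function(n, x, h) {
  # Step 1: pick mixture components uniformly at random (with replacement)
  centers <- sample(x, size = n, replace = TRUE)
  # Step 2: sample from N(x_i, h^2) around each chosen center
  centers + rnorm(n, mean = 0, sd = h)
}

set.seed(1)
x <- rnorm(900)   # mock data standing in for the N = 900 observations
h <- bw.nrd0(x)   # assumption: rule-of-thumb bandwidth, the default of density()
x_sim <- sample_gaussian_kde(length(x), x, h)
```

Resampling with replacement is what makes this look like a bootstrap; the added noise is what turns the discrete empirical distribution into the smooth KDE.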

Your code snippet is selecting a bunch of $x_i$s, but then it's doing something slightly different:

1) changing $x_i$ to $ \hat\mu + \frac{x_i - \hat\mu}{\sqrt{1 + h^2 / \hat\sigma^2}} $
2) adding zero-mean normal noise with variance $\frac{h^2}{1 + h^2/\hat\sigma^2} = \frac{1}{\frac{1}{h^2} + \frac{1}{\hat\sigma^2}}$, which is half the harmonic mean of $h^2$ and $\hat\sigma^2$.

We can see that the expected value of a sample according to this procedure is $$ \frac1N \sum_{i=1}^N \frac{x_i}{\sqrt{1 + h^2/\hat\sigma^2}} + \hat\mu - \frac{1}{\sqrt{1 + h^2 /\hat\sigma^2}} \hat\mu = \hat\mu $$ since $\hat\mu = \frac1N \sum_{i=1}^N x_i$.

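I don't think the sampling distribution is the same, though: a draw from the KDE has variance of roughly $\hat\sigma^2 + h^2$, whereas the rescaling here brings the variance back to roughly $\hat\sigma^2$.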
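A quick numerical check of the two procedures, as a sketch on mock data. The data, the sample size `n`, and the bandwidth choice `bw.nrd0` are assumptions; the kernel is Gaussian, so varkern = 1 in the formula from the question.

```r
set.seed(42)
x <- rnorm(900, mean = 5, sd = 2)   # mock data
h <- bw.nrd0(x)                     # assumption: rule-of-thumb bandwidth
n <- 1e5                            # large draw so the sample moments are stable

centers <- sample(x, n, replace = TRUE)   # resampled observations
z <- rnorm(n)                             # standard normal noise

# Exact sample from the Gaussian KDE
kde_draw <- centers + h * z

# Smoothed bootstrap with variance correction (Gaussian kernel, varkern = 1)
corrected <- mean(x) + (centers - mean(x) + h * z) / sqrt(1 + h^2 / var(x))

# Compare means and variances of the data and the two simulated samples
rbind(
  mean = c(data = mean(x), kde = mean(kde_draw), corrected = mean(corrected)),
  var  = c(data = var(x),  kde = var(kde_draw),  corrected = var(corrected))
)
```

Both samples are centered at the sample mean, but the plain KDE draw is more spread out than the data, while the corrected draw is rescaled toward the sample variance.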

It is a good idea to use the T-distribution to build a KDE

When you build a KDE, once you go outside the data range, the rate of decay in the tails is determined by the rate of decay in the tails of the kernel distribution. The normal distribution has very thin tails (which decay at an exponentially-quadratic rate) so it is unsurprising that the tails of your KDE decay rapidly outside the data range. If you want to ameliorate this, and allow fatter tails, I would recommend that you use a T-distribution as the kernel for your KDE. This allows you to adjust the degrees-of-freedom parameter to adjust the desired "fatness" of the tails, and it even allows you to have heavy tails that give infinite variance in your KDE.
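The difference in tail decay can be illustrated with a small base R sketch. The helper `kde_density`, the mock data, the bandwidth `bw.nrd0`, and the use of an unscaled t kernel with two degrees of freedom are all assumptions made for illustration; this is not necessarily how any particular package defines its kernel.

```r
set.seed(1)
x <- rnorm(40)    # mock data
h <- bw.nrd0(x)   # assumption: rule-of-thumb bandwidth

# Hypothetical helper: KDE density at the points `at` for a kernel density `kern`
kde_density <- function(at, x, h, kern) {
  sapply(at, function(a) mean(kern((a - x) / h)) / h)
}

at <- max(x) + c(2, 4, 8)   # points increasingly far beyond the data range

log_normal <- log(kde_density(at, x, h, dnorm))
log_t2     <- log(kde_density(at, x, h, function(u) dt(u, df = 2)))

# The log-density under the normal kernel drops far faster than under the t kernel
rbind(at = at, log_normal = log_normal, log_t2 = log_t2)
```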

You can implement a KDE with a T-distribution kernel using the KDE function in the utilities package. This function lets you specify the degrees-of-freedom parameter to control the fatness of the tails of the KDE. (It produces an object containing probability functions for the KDE; you can also load those functions directly into the global environment so that you can call them just like the probability functions of any other distribution.) Here is an example of fitting a KDE using a T-distribution with two degrees-of-freedom, which means that the KDE has tails that are sufficiently heavy to give infinite variance. If you examine the log-density of this KDE (using the dkde function generated here), you will see that the rate of decay in the tails is much slower than for a KDE that uses the normal kernel.

#Load the package
library(utilities)

#Generate some mock data
set.seed(1)
DATA <- rnorm(40)

#Create a KDE using the T-distribution with two degrees-of-freedom (infinite variance)
MY_KDE <- KDE(DATA, df = 2, to.environment = TRUE)
plot(MY_KDE)

#Show the KDE output
MY_KDE

  Kernel Density Estimator (KDE) 
 
  Computed from 40 data points in the input 'DATA'
  Estimated bandwidth = 0.367412  
  Input degrees-of-freedom = 2.000000  
 
  Probability functions for the KDE are the following: 
 
      Density function:                   dkde * 
      Distribution function:              pkde * 
      Quantile function:                  qkde * 
      Random generation function:         rkde * 
 
  * This function is presently loaded in the global environment
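Independently of that package, simulating from a KDE with a t kernel can be sketched in base R with the usual mixture recipe: pick a data point at random, then add kernel noise. The helper name `sample_t_kde` and the bandwidth choice `bw.nrd0` are assumptions for illustration, so the result will not match the package's own bandwidth or its rkde function exactly.

```r
# Hypothetical helper: draw n values from a KDE whose kernel is a
# t-distribution with `df` degrees of freedom, scaled by bandwidth h.
sample_t_kde <- function(n, x, h, df) {
  centers <- sample(x, size = n, replace = TRUE)  # pick mixture components
  centers + h * rt(n, df = df)                    # add scaled t noise
}

set.seed(1)
DATA <- rnorm(40)
h <- bw.nrd0(DATA)   # assumption: rule-of-thumb bandwidth, not the package's estimate
sims <- sample_t_kde(1000, DATA, h, df = 2)

# With df = 2 the occasional draw can land far outside the data range
range(sims)
```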

Kernel Smoothing – Optimal Methods for Bandwidth Selection in Kernel Density Estimation Using R

1) If you are just looking for a relatively new bandwidth selection method (one that is well accepted in academia, at least judging by the number of citations on Google Scholar), you can try KDE via diffusion by Botev (2010). It is available in the provenance package in R.

2) In general, the solve-the-equation method has often served as a benchmark for bandwidth selection since the article by:

Jones, M. C., Marron, J. S., & Sheather, S. J. (1996). A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 91(433), 401-407.

Both 1) and 2) above are for the univariate case. However, the best method for you depends less on how new and fancy the method is than on what kind of data you have and what your goals are. For example, MLCV (maximum likelihood cross-validation) often provides oversmoothed estimates; however, if you are looking for smooth estimates of the tails of your density, you might want to consider such a method. If you are just exploring your data, a rule-of-thumb method may be sufficient, or perhaps even a histogram. Finally, selecting a bandwidth (or even a series of bandwidths) is less problematic in the univariate case than in the multivariate case (as the dimension of your density increases); for reference, see:

Sain, S. R. (2002). Multivariate locally adaptive density estimation. Computational Statistics & Data Analysis, 39(2), 165-186.
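For the univariate case, a few of these selectors ship with base R's stats package, so they can be compared directly. The sketch below uses mock data; it covers only the rule-of-thumb and Sheather-Jones selectors, not the diffusion method or MLCV mentioned above.

```r
set.seed(1)
x <- rnorm(200)   # mock data

bandwidths <- c(
  nrd0   = bw.nrd0(x),                # Silverman's rule of thumb (default of density())
  nrd    = bw.nrd(x),                 # Scott's variation of the rule of thumb
  sj_ste = bw.SJ(x, method = "ste"),  # Sheather-Jones, solve-the-equation
  sj_dpi = bw.SJ(x, method = "dpi")   # Sheather-Jones, direct plug-in
)
bandwidths

# The same selector can be requested by name when fitting the density
fit <- density(x, bw = "SJ-ste")
fit$bw
```

Comparing the resulting bandwidths (and plotting the corresponding densities) is usually more informative than picking a selector on reputation alone.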
