Confidence intervals of the quantile of the kernel density

Tags: confidence-interval, kernel-smoothing, quantiles

I have estimated the kernel density for a set of values, for example:

v <- rgamma(15, shape = 400, scale = 4.3)
z <- density(v)

I then constructed a function for the kernel density of v, following this post:

Jarle Tufto (https://stats.stackexchange.com/users/77222/jarle-tufto), Quantile of kernel density estimator, URL (version: 2018-04-17): https://stats.stackexchange.com/q/341028
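For readers without the linked post at hand, here is a minimal sketch of one way to get a quantile from a Gaussian KDE. It is not necessarily the code from that post. It assumes the default Gaussian kernel of `density()`, whose bandwidth `z$bw` is the kernel standard deviation, so the KDE's CDF is an average of normal CDFs centred at the data points. The helpers `kde_cdf` and `kde_quantile`, the seed, and the search bounds of 10 bandwidths are illustrative assumptions.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
v <- rgamma(15, shape = 400, scale = 4.3)
z <- density(v)  # Gaussian kernel by default; z$bw is the bandwidth

# CDF of a Gaussian KDE: the average of normal CDFs centred at each data point
kde_cdf <- function(x, data, bw) {
  sapply(x, function(xi) mean(pnorm(xi, mean = data, sd = bw)))
}

# Quantile of the KDE: invert the CDF numerically with uniroot()
# (hypothetical helper; the bracket of 10 bandwidths is an arbitrary choice)
kde_quantile <- function(p, data, bw = bw.nrd0(data)) {
  uniroot(function(x) kde_cdf(x, data, bw) - p,
          lower = min(data) - 10 * bw,
          upper = max(data) + 10 * bw,
          tol = 1e-8)$root
}

q50 <- kde_quantile(0.5, v, z$bw)  # median of the KDE
q10 <- kde_quantile(0.1, v, z$bw)
q90 <- kde_quantile(0.9, v, z$bw)
```

The bracket only works for probabilities that are not extremely close to 0 or 1, since `uniroot()` needs a sign change between `lower` and `upper`.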

Now I am trying to construct a confidence interval for a quantile of the KDE. Is it possible to do that? Do you have an example or an explanation?

Best Answer

If you need confidence intervals for a complicated estimator, the simplest and most generic approach is the bootstrap. It works the same way as for confidence intervals of the kernel density itself: resample your data with replacement many times, estimate the quantity of interest on each resample, and then take the empirical quantiles of these bootstrap estimates to get the interval of interest.
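The bootstrap recipe above can be sketched as follows for a KDE quantile. This is an illustration under stated assumptions, not a definitive implementation: it uses a Gaussian kernel, a percentile interval, `B = 2000` resamples, and a hypothetical helper `kde_quantile` that inverts the KDE's CDF with `uniroot()`. The bandwidth is re-selected with `bw.nrd0()` on every resample, which is a design choice rather than something the answer prescribes.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
v <- rgamma(15, shape = 400, scale = 4.3)

# Hypothetical helper: quantile of a Gaussian KDE, found by inverting its CDF
kde_quantile <- function(p, data, bw = bw.nrd0(data)) {
  cdf_minus_p <- function(x) mean(pnorm(x, mean = data, sd = bw)) - p
  uniroot(cdf_minus_p,
          lower = min(data) - 10 * bw,
          upper = max(data) + 10 * bw,
          tol = 1e-8)$root
}

p <- 0.5   # quantile of interest (here the median)
B <- 2000  # number of bootstrap resamples (arbitrary choice)

# Re-estimate the KDE quantile on each resample drawn with replacement
boot_q <- replicate(B, kde_quantile(p, sample(v, replace = TRUE)))

estimate <- kde_quantile(p, v)
ci <- quantile(boot_q, c(0.025, 0.975))  # 95% percentile interval
print(estimate)
print(ci)
```

Keeping the bandwidth fixed at the value chosen on the original sample is the other common option. It gives narrower intervals because it ignores the variability of the bandwidth selection.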

Sidenote: As Nick Cox and I noted in the comments to your previous question, using a kernel density estimate to calculate quantiles, especially extreme ones, is generally a bad idea. To convince yourself, look at the plot of a (very poor) kernel density estimate with a rectangular kernel and the cumulative distribution derived from it. It was estimated from five data points (red dots below). Notice that if you used it to compute quantiles, the estimated cumulative distribution would be exactly zero for everything smaller than the lowest observed value minus the bandwidth you used for the kernel. The estimates of small quantiles would therefore depend entirely on the chosen bandwidth and the observed data, and would tell you almost nothing about values outside the range of the observed data.

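[Figure: kernel density estimate with a rectangular kernel from five data points (red dots), together with the corresponding cumulative distribution]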
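Since the original figure is not reproduced here, the following sketch shows the same effect numerically. It assumes five draws from the standard uniform distribution and a rectangular kernel with half-width `h = 0.1`. Both the seed and `h` are arbitrary choices, and `rect_kde_cdf` is a hypothetical helper. With a rectangular kernel, the KDE's CDF is the average of uniform CDFs on `[x_i - h, x_i + h]`.

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible
x <- runif(5) # five observations from the standard uniform distribution
h <- 0.1      # assumed half-width of the rectangular kernel

# CDF of a rectangular-kernel KDE: average of uniform CDFs around each point
rect_kde_cdf <- function(q, data, h) {
  sapply(q, function(qi) mean(punif(qi, min = data - h, max = data + h)))
}

grid <- seq(-0.5, 1.5, by = 0.01)
Fhat <- rect_kde_cdf(grid, x, h)

# The estimated CDF is flat at 0 below min(x) - h and flat at 1 above max(x) + h
below <- Fhat[grid < min(x) - h]
above <- Fhat[grid > max(x) + h]
print(range(below))
print(range(above))

# To reproduce a plot similar to the one described:
# plot(grid, Fhat, type = "l"); points(x, rep(0, length(x)), col = "red")
```

Note that `density(x, kernel = "rectangular")` parameterises `bw` as the kernel's standard deviation, so its half-width is `bw * sqrt(3)` rather than `bw` itself.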

These problems are nicely illustrated by the bootstrap confidence intervals for the estimated distribution. As you can see below, the most uncertain values, those at the extremes, get the narrowest intervals, and the intervals outside the range of the data have zero width. So the estimate would be "certain" that anything below 0.2 has zero probability of occurring, while being completely wrong about it (the data were actually sampled from the standard uniform distribution).

[Figure: bootstrap confidence intervals for the estimated distribution]
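The zero-width intervals can be checked with a short sketch. It assumes five standard-uniform observations, a rectangular kernel whose half-width `h = 0.1` is kept fixed across resamples, `B = 1000` resamples, and pointwise 95% percentile intervals. All of these are illustrative choices, and `rect_kde_cdf` is a hypothetical helper.

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible
x <- runif(5)
h <- 0.1      # assumed kernel half-width, held fixed across resamples

# CDF of a rectangular-kernel KDE: average of uniform CDFs around each point
rect_kde_cdf <- function(q, data, h) {
  sapply(q, function(qi) mean(punif(qi, min = data - h, max = data + h)))
}

grid <- seq(-0.5, 1.5, by = 0.01)
B <- 1000

# Each column holds the estimated CDF on the grid for one bootstrap resample
boot_cdf <- replicate(B, rect_kde_cdf(grid, sample(x, replace = TRUE), h))

# Pointwise 95% percentile intervals
lower <- apply(boot_cdf, 1, quantile, probs = 0.025)
upper <- apply(boot_cdf, 1, quantile, probs = 0.975)
width <- upper - lower

# Widths outside [min(x) - h, max(x) + h] versus the widest interval overall
print(max(width[grid < min(x) - h]))
print(max(width[grid > max(x) + h]))
print(max(width))
```

Every resample is drawn from the same five points, so no bootstrap replicate can place mass below `min(x) - h` or above `max(x) + h`. That is why the intervals collapse there, regardless of how uncertain the true distribution is in those regions.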
