This question concerns how to implement the following problem in R.
x = rnorm(1000)
hist(x,freq=FALSE)
lines(density(x))
How would you calculate the upper (or lower) tail probability for a given cutoff (e.g. +1) given the density estimate above? NOTE: the following solution isn't good enough. I need the calculation based on the smoothed density curve, i.e. an integral of the curve, not the empirical histogram.
sum(x>1)/length(x)
Also, please do not suggest using any of the standard pnorm functions, because they are only correct if the underlying distribution is correctly specified. Thanks!
Best Answer
I would take the same approach as @Flounderer, but exploit another feature of R's density() function; namely the from and to arguments, which restrict the density estimation to the region enclosed by the two arguments. This results in the same density estimates as running the function without from and/or to, but by restricting the range of the density estimate to the region of interest, we focus all of the n evaluation points on the region of interest.
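A minimal sketch of that comparison, assuming a cutoff of 1 and an arbitrary seed. The names dens and dens2 follow the text; the seed and the plotting choices are illustrative assumptions, not necessarily the original code:

```r
set.seed(1)                          # assumed seed, for reproducibility only
x <- rnorm(1000)

dens  <- density(x)                  # KDE evaluated over the default range
dens2 <- density(x, from = 1)        # same KDE, evaluated only from the cutoff upwards

plot(dens)                           # full density estimate
lines(dens2, col = "red", lwd = 2)   # restricted estimate overlaid in red
```

Both calls use the same data and the same default bandwidth, so the two curves should coincide (up to small numerical differences) wherever both are defined.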
The red line in the plot illustrates that the density estimates in dens and dens2 are the same for the region of interest.
Then you can follow the approach @Flounderer used to evaluate the tail probability.
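One way to carry out that integration, as a sketch only: sum the restricted density estimate over its equally spaced grid. The names cutoff, step, p_riemann and p_trapz are illustrative, and this is not necessarily the exact code @Flounderer used:

```r
set.seed(1)                           # assumed seed, for reproducibility only
x <- rnorm(1000)
cutoff <- 1                           # hypothetical cutoff for the upper tail

dens2 <- density(x, from = cutoff)    # KDE evaluated from the cutoff upwards

# density() returns an equally spaced grid, so one step width is enough
step <- diff(dens2$x)[1]

# Simple Riemann sum of the density over the tail region
p_riemann <- sum(dens2$y) * step

# Trapezoidal rule on the same grid, usually slightly more accurate
p_trapz <- sum((head(dens2$y, -1) + tail(dens2$y, -1)) / 2) * step

c(riemann = p_riemann, trapezoid = p_trapz)
```

For a lower tail, the same idea should work with to = cutoff in place of from = cutoff.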
The advantage of this approach is that all n points at which density() evaluates the KDE are spent on the region of interest. The larger n is, the higher the resolution you have when evaluating the tail probability.
Note from ?density that, given the FFT used in the implementation, having n as a power of 2 is advantageous.
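To get a feel for how n affects the result, here is a small sketch that recomputes the tail probability for a few powers of 2. The helper tail_prob() is hypothetical, written only for this illustration, and uses the trapezoidal rule:

```r
set.seed(1)                           # assumed seed, for reproducibility only
x <- rnorm(1000)

# Hypothetical helper: upper tail probability above `cutoff`,
# using n evaluation points in density()
tail_prob <- function(x, cutoff, n) {
  d <- density(x, from = cutoff, n = n)
  step <- diff(d$x)[1]                                  # equally spaced grid
  sum((head(d$y, -1) + tail(d$y, -1)) / 2) * step       # trapezoidal rule
}

# Evaluate with n = 512 (the default), 1024 and 4096 points
probs <- sapply(c(512, 1024, 4096), function(n) tail_prob(x, cutoff = 1, n = n))
probs
```

If the estimates barely change as n grows, the default resolution is probably already adequate for the cutoff in question.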