Solved – Kernel density estimation on bounded support

beta distribution, density function, kernel-smoothing, r, underflow

I was looking for a way to deal with the boundary bias of KDE on the unit interval. One option is to use Chen estimators (also called beta kernel estimators; see, for example, http://stats-www.open.ac.uk/TechnicalReports/mcjdah.pdf, p. 4). Instead of the typical kernel density estimator $$ \hat{f}(x) =\frac{1}{n} \sum_{i=1}^{n}K(x,X_{i};h) $$ we obtain:
$$
\hat{f_{C1}}(x)=\frac{1}{nB(\frac{x}{h^2}+1,\frac{1-x}{h^2}+1)}\sum_{i=1}^{n}X_{i}^{x / h^2}(1-X_{i})^{(1-x) / h^2},
$$

where $B(\cdot,\cdot)$ is the beta function.

The difficulty I encountered is an underflow when computing the beta function for large parameter values. For example, in R:

data <- runif(10000)

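# Chen (beta kernel) density estimate at x; h here plays the role of h^2 in the formula above
Chen_kde <- function(x, input, h = 1 / length(input)^0.9) {
   p <- x / h + 1
   q <- (1 - x) / h + 1

   # beta(p, q) underflows to 0 when p and q are large
   output <- mean(input^(p - 1) * (1 - input)^(q - 1) / beta(p, q))
   return(output)
}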

Chen_kde(0.1,data)

Warning message:
In beta(p, q) : underflow occurred in 'beta'

I found that one way to tackle this problem is to approximate the beta distribution by a normal density with the same mean and standard deviation. However, each term of the sum above only resembles a beta density, since x appears in the exponent, not in the base. My question is whether each term can also be approximated in some way to avoid the underflow, or whether there are other effective methods to correct the boundary bias of KDE on the unit interval.

Best Answer

Several methods to deal with density estimation on bounded support (including the estimation method proposed by Chen) are implemented in the bde package available from the CRAN repository. You may be interested in using it.
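
If you would rather keep your own implementation, one possible workaround is to evaluate each term of the sum on the log scale, using base R's lbeta() in place of beta(), and to exponentiate only at the end. The sketch below is a minimal illustration and not the method used by the bde package. The function name chen_kde_log and the argument b are hypothetical; b plays the role of h^2 in the formula from the question (passed as h in the question's code). It assumes all observations lie strictly inside (0, 1).

```r
# Sketch: Chen's beta-kernel estimate at x, computed on the log scale.
# `chen_kde_log` and `b` are illustrative names, not part of any package.
# Assumes every element of `input` lies strictly between 0 and 1.
chen_kde_log <- function(x, input, b = 1 / length(input)^0.9) {
  p <- x / b + 1
  q <- (1 - x) / b + 1

  # log of each summand:
  # (p - 1) * log(X_i) + (q - 1) * log(1 - X_i) - log B(p, q)
  log_terms <- (p - 1) * log(input) + (q - 1) * log1p(-input) - lbeta(p, q)

  # log-sum-exp trick: subtract the largest log term before exponentiating
  m <- max(log_terms)
  exp(m) * mean(exp(log_terms - m))
}

set.seed(1)
data <- runif(10000)
chen_kde_log(0.1, data)
```

Each summand is the Beta(p, q) density evaluated at X_i, so mean(dbeta(input, p, q)) should give the same value and can serve as a cross-check.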