R Probability KDE – Estimating the Density of a Zero-Inflated Parameter in R

kdeprobabilityr

I have a data set with lots of zeros that looks like this:

set.seed(1)
x <- c(rlnorm(100),rep(0,50))
hist(x,probability=TRUE,breaks = 25)

I would like to draw a line for its density, but the density() function uses a moving window that calculates negative values of x.

lines(density(x), col = 'grey')

There is a density(... from, to) arguments, but these seem to only truncate the calculation, not alter the window so that the density at 0 is consistent with the data as can be seen by the following plot :

lines(density(x, from = 0), col = 'black')

(if the interpolation was changed, I would expect that the black line would have higher density at 0 than the grey line)

Are there alternatives to this function that would provide a better calculation of the density at zero?

enter image description here

Best Answer

The density is infinite at zero because it includes a discrete spike. You need to estimate the spike using the proportion of zeros, and then estimate the positive part of the density assuming it is smooth. KDE will cause problems at the left hand end because it will put some weight on negative values. One useful approach is to transform to logs, estimate the density using KDE, and then transform back. See Wand, Marron & Ruppert (JASA 1991) for a reference.

The following R function will do the transformed density:

logdensity <- function (x, bw = "SJ") 
{
    y <- log(x)
    g <- density(y, bw = bw, n = 1001)
    xgrid <- exp(g$x)
    g$y <- c(0, g$y/xgrid)
    g$x <- c(0, xgrid)
    return(g)
}

Then the following will give the plot you want:

set.seed(1)
x <- c(rlnorm(100),rep(0,50))
hist(x,probability=TRUE,breaks = 25)
fit <- logdensity(x[x>0]) # Only take density of positive part
lines(fit$x,fit$y*mean(x>0),col="red") # Scale density by proportion positive
abline(v=0,col="blue") # Add spike at zero.

enter image description here

Related Solutions

Bayesian – How to Use Kernel Density Estimate in Naive Bayes Classifier?

I have read both the first linked earlier question, especially the answer of whuber and the comments on this.

The answer is yes, you can do that, i.e. using the density from a kde of a numeric variable as conditional probability ($P(X=x|C=c)$ in the bayes theorem. $P(C=c|X=x)=P(C=c)*P(X=x|C=c)/P(X=x)$

By assuming that d(height) is equal across all classes, d(height) is normalized out when the theorem is applied, i.e. when $P(X=x|C=c)$ is divided by $P(X=x)$.

This paper could be interesting for you: estimating continuous distributions in bayesian classifiers

Solved – How to set up a zero-inflated poisson in JAGS

Here is a simple solution using the fact that the poisson will give you zeros when the lambda parameter is zero. Note however that JAGS tends to break if lambda is exactly zero, thus the "+ 0.00001".

model {
  for (i in 1:I) {

    y[i] ~ dpois(mu1[i])

    mu1[i] <- mu[i]*x[i] + 0.00001

    x[i] ~ dbern(pro[i])
    logit(pro[i]) <- theta[i]

    mu[i] <- bla + bla +bla + ....
    theta[i] <- bla + bla + bla + ....
  }

Best Answer

Related Solutions

Bayesian – How to Use Kernel Density Estimate in Naive Bayes Classifier?

Solved – How to set up a zero-inflated poisson in JAGS

Related Question