Kernel Density – Interpretation and Use of Kernel Density Estimation


This may be a naive question, but here goes. If I have a set of empirical data and fit a kernel density to it, and then obtain a new single value which possibly comes from the same process which generated the original data set, can I assign a probability that this new value belongs to the set/process by simply reading the value off the y axis where the new value on the x axis intersects the kernel density line and dividing by the area under the density line?

Best Answer

No, I'm afraid not. The kernel density estimand is the probability density function. The y-value is an estimate of the probability density at that value of x, not a probability: it can even exceed 1 when the data are tightly clustered. Dividing by the area under the curve doesn't help either, because that total area is already 1. What the curve does give you is this: the area under it between x1 and x2 estimates the probability of the random variable X falling between x1 and x2, assuming that X was generated by the same process that generated the data you fed into the kernel density estimate. The kernel density estimate doesn't say anything about the probability that a new value was generated by the same process.
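To make the density-versus-probability distinction concrete, here is a minimal sketch assuming a Gaussian kernel with a hand-picked bandwidth. The function names `gaussian_kde` and `kde_interval_probability` and the sample data are hypothetical, written with only the standard library rather than taken from any package. It shows that the curve's height at a point can be well above 1, that an interval probability comes from the area under the curve, and that the total area is 1.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function giving the Gaussian-kernel density estimate at x."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        # Average of normal bumps centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return density

def kde_interval_probability(data, bandwidth, x1, x2):
    """Area under the Gaussian KDE between x1 and x2, via the normal CDF."""
    def phi(z):
        # Standard normal cumulative distribution function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    total = sum(phi((x2 - xi) / bandwidth) - phi((x1 - xi) / bandwidth) for xi in data)
    return total / len(data)

# Hypothetical sample data and bandwidth, chosen only for illustration.
data = [0.10, 0.12, 0.15, 0.11, 0.13]
h = 0.02
f = gaussian_kde(data, h)

print(f(0.12))  # roughly 14.7: a density height, not a probability
print(kde_interval_probability(data, h, 0.10, 0.14))  # estimated P(0.10 <= X <= 0.14)
print(kde_interval_probability(data, h, -10, 10))     # 1.0: the total area
```

Note that `kde_interval_probability(data, h, x, x)` is 0 for any x: with a continuous density, a single exact value has zero-width interval and therefore zero estimated probability, which is another way of seeing that reading the height at one point cannot give the probability the question asks about.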