When I draw a density plot in R and all the numbers are slightly greater than 0, I get essentially a vertical line at x = 0. But when all the numbers are exactly equal to 0, I get some sort of bell curve. Why is that? It seems counterintuitive.
The command I used to plot the curves was:

    library(ggplot2)

    for (i in 0:60) {
      # Keep only the observations recorded at minute i
      cur_data <- subset(data, time == i)
      p <- ggplot(cur_data, aes(x = error)) +
        geom_density() +
        theme_bw() +
        xlab(paste0("Error distribution (minute ", i, ")")) +
        xlim(0, 1)
      ggsave(...)
    }
At `i = 60`, `cur_data` should be entirely populated by the value `0.0`.
(Originally posted on Stack Overflow; was told to post here.)
Best Answer
You should explain which intuition this behavior runs counter to; that would make it easier to focus the explanation on it.
A kernel density estimate is the convolution of the sample probability function ($n$ point masses of size $\frac{1}{n}$) and the kernel function (itself, by default, a normal density).
The result in the default case is a mixture of normal (Gaussian) densities, each centered at one of the data values, each with standard deviation $h$ (the bandwidth of the kernel), and each with weight $\frac{1}{n}$.
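In symbols, as a sketch using notation introduced here rather than taken from the question (data values $x_1, \dots, x_n$ and standard normal density $\phi$), that mixture can be written as

$$\hat f_h(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h}\, \phi\!\left(\frac{x - x_i}{h}\right).$$

Each term $\frac{1}{h}\phi\!\left(\frac{x - x_i}{h}\right)$ is a normal density with mean $x_i$ and standard deviation $h$, and the factor $\frac{1}{n}$ is its weight.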
When all the data are coincident, the resulting mixture density is a sum of $n$ weighted densities, all with the same mean and standard deviation ... which is just the kernel itself, centered at that data value.
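A quick check in base R may make this concrete. It is only a sketch: it uses `stats::density()` with its default bandwidth rule `bw.nrd0()`, and it assumes (without having verified it against your ggplot2 version) that `geom_density()` relies on the same defaults. The sample size and the size of the "slightly greater than 0" values are arbitrary illustration choices. When every value is 0, the spread-based bandwidth would be 0, so `bw.nrd0()` falls back to a scale of 1 and the estimate is a single normal curve of noticeable width. When the values differ even slightly, the bandwidth is proportional to their tiny spread and the curve is a narrow spike.

```r
# Illustrative data (hypothetical): 100 exact zeros vs. 100 values just above zero
n <- 100
all_zero <- rep(0, n)
set.seed(1)
near_zero <- runif(n, min = 0, max = 1e-6)

# Default bandwidths chosen by stats::bw.nrd0()
h_zero <- bw.nrd0(all_zero)   # zero spread, so the rule falls back to 0.9 * 1 * n^(-1/5)
h_near <- bw.nrd0(near_zero)  # proportional to the (tiny) spread of the data

# With coincident data the KDE should be the Gaussian kernel itself,
# centered at 0 with standard deviation h_zero
d_zero <- density(all_zero)
kernel <- dnorm(d_zero$x, mean = 0, sd = h_zero)
max_abs_diff <- max(abs(d_zero$y - kernel))

print(c(h_zero = h_zero, h_near = h_near, max_abs_diff = max_abs_diff))
```

The small remaining difference between `d_zero$y` and the kernel comes from the binning and interpolation that `density()` uses internally, not from a different underlying shape.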
The difference in behavior you see might relate to the `trim` argument in `ggplot2::stat_density`. When the range of values is exactly zero, my guess is that it's setting `trim` to `FALSE` (or at least something other than `TRUE`), but when it's even a little larger than `0` it's at the default (`TRUE`). You'd need to look into the source to double check, but that would be my guess. If that's what's happening, you should be able to modify that behavior.