How to interpret height of density plot

data visualization, density function

How should I interpret the height of density plots:

[Image: density plot with a peak of about 0.07 at x=18 and a second peak of about 0.02 at x=30]

For example, in the plot above, the peak is at about 0.07 at x=18. Can I infer that about 7% of values are around 18? Can I be more specific than that? There is also a second peak at x=30 with a height of 0.02. Would that mean that about 2% of values are around 30?

Edit: The question "Can a probability distribution value exceeding 1 be OK?" discusses probability values greater than 1, which is not an issue here at all. It also discusses this in relation to the naive Bayes classifier, which is not the point here either. I want to know, in simple language, what numerical inferences we can draw from such density curves. The role of the area under the curve is discussed there, but my question is specifically what inference we can draw about a particular x and y combination on the curve. For example, how can we relate x=30 and y=0.02 on this graph? What statement can we make about the relation between 30 and 0.02 here? Since densities are per unit of x, can we say that 2% of values occur between 29.5 and 30.5? If that is the case, how do we interpret the curve when values vary only from 0 to 1, as in the following plot:

[Image: density plot of values ranging from 0 to 1]

If 100% of values occur between 0 and 1, why does the curve extend outside 0 and 1?

There is a flat part here from x=0.1 to x=0.2 where y equals 0.8. It forms a rectangle. How can we find out what proportion of values occur between x=0.1 and x=0.2?

Best Answer

You need to be careful with your wording here. Assuming x is a continuous variable, the probability of any individual value is precisely zero. Talking, as you did, about the probability of a value lying around some point is fine, though you might want to be a bit more precise. Your second statement, in which you provided the interval along with the probability, is the kind of statement I would be looking for.
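To make the distinction concrete, here is the standard relation between a density and a probability, written with symbols that are not in the question: f is the density, X the variable, and a, b, x_0 arbitrary points.

$$
P(a \le X \le b) = \int_a^b f(x)\,dx,
\qquad
P(X = x_0) = \int_{x_0}^{x_0} f(x)\,dx = 0 .
$$

An interval of zero width has zero area above it, no matter how tall the curve is at that point.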

In essence, the integral of the density function with respect to x over an interval gives you the probability of that interval (that's why it's called a density). The interval over which you integrate may be arbitrarily small, so you can get arbitrarily close to a point. That said, when the density function varies slowly over that interval, you can approximate the integral by some numerical technique, such as the trapezoidal rule.
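As a rough illustration of this, the sketch below integrates a density over a narrow interval with the trapezoidal rule. The curve is a stand-in, not the one from the question: a normal density with mean 18 and standard deviation 5.7, chosen only because its peak height is close to 0.07. The function names and parameters are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density, used here only as a stand-in for the plotted curve."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def pdf(x):
    # Assumed parameters: mean 18, sd 5.7, so the peak height is about 0.07.
    return normal_pdf(x, 18.0, 5.7)


def interval_probability(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


height = pdf(18.0)                                  # height of the curve at the peak
p_narrow = interval_probability(pdf, 17.5, 18.5)    # area over a unit-wide interval
p_total = interval_probability(pdf, -42.0, 78.0, n=20000)  # area over nearly the whole range

print(f"height at x=18:      {height:.4f}")
print(f"P(17.5 <= X <= 18.5): {p_narrow:.4f}")
print(f"total area:          {p_total:.4f}")
```

Because the curve is nearly flat over the unit-wide interval around the peak, the area comes out very close to height times width, while the total area is 1.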

To summarize: the height of the density function is just that, its height. Anything you might want to conclude about probability will have to involve integration in some form or another.
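Applied to the numbers read off the plots in the question, and assuming the curve is roughly constant at the stated height across each interval (an assumption, since only the heights are given), the integral reduces to height times width:

$$
P(29.5 \le X \le 30.5) \approx 0.02 \times 1 = 0.02,
\qquad
P(0.1 \le X \le 0.2) \approx 0.8 \times 0.1 = 0.08 .
$$

So roughly 2% of values fall between 29.5 and 30.5 in the first plot, and about 8% fall between 0.1 and 0.2 in the second. The approximation is only as good as the flatness of the curve over the interval.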
