How to scale the density plot for the histogram

Tags: data visualization, density function, histogram, scales

I have a histogram and I'd like to overlay a density line for the same data. Importantly, I don't want to convert the histogram to density values; I want to keep N (counts) on the y axis.
Is there any way to overlay the histogram and the density plot without transforming the histogram, by scaling up the density curve instead?

[Figure: histogram of the data]

[Figure: density plot for the same data]

[Figure: desired graph, but with counts on the y axis instead of density]

Thanks a lot!

Eugene

Best Answer

The area under a true density function is 1. So unless the total area of the bars in the histogram is also 1, you cannot make a useful match between a true density function and the histogram.
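A quick way to check this in R: on the density scale (`freq = FALSE`) the bar areas of a histogram sum to 1, while on the count scale with equal-width bins they sum to $n$ times the bin width. The sample `y` below is hypothetical illustration data, not the data from the question.

```r
# Hypothetical example data; any numeric sample would do
set.seed(1)
y <- rnorm(1000)
h <- hist(y, breaks = 30, plot = FALSE)   # compute bins without drawing
w <- diff(h$breaks)                        # bin widths (equal for pretty() breaks)

area_density <- sum(h$density * w)         # total bar area on the density scale
area_counts  <- sum(h$counts * w)          # total bar area on the count scale

area_density                               # 1, up to rounding
area_counts / (length(y) * w[1])           # also 1: count-scale area is n * width
```

Only the density-scale histogram has the same total area as a true density, which is why it is the natural partner for a density curve.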

Using actual density functions. A correct (and perhaps the easiest) course of action is to do what you explicitly say (without giving a reason) that you do not want to do: put the histogram on a density scale and then superimpose either a density estimator based on the data or the density function of the hypothetical distribution from which the data in the histogram were sampled. If you do this, the vertical scale of the histogram is automatically the correct scale for the densities.

Below is a histogram of data from a mixture of normal distributions, simulated in R, along with a kernel density estimator (KDE) of the data (red) and the density of the distribution used to simulate the data (dotted). [With a sample size as large as $n = 6000$ you can expect a good match between the histogram and the KDE, even if not always as good as shown here.]

[Figure: histogram of the data on a density scale, with the KDE (red) and the population density (dotted)]

The relevant R code is shown below.

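```r
set.seed(710)
# Component means of a three-part normal mixture with weights .1, .8, .1
mix <- sample(c(-.6, 0, .6), 6000, replace = TRUE, prob = c(.1, .8, .1))
x <- rnorm(6000, mean = mix, sd = .15)
lbl <- "Histogram of Data with KDE (red) and Population Density"
# freq = FALSE puts the histogram on the density scale (total bar area = 1)
hist(x, freq = FALSE, breaks = 50, col = "skyblue2", main = lbl)
lines(density(x), col = "red")   # kernel density estimate of the data
# Population (mixture) density used to simulate the data
curve(.1*dnorm(x, -.6, .15) + .8*dnorm(x, 0, .15) + .1*dnorm(x, .6, .15),
      add = TRUE, lty = "dotted", lwd = 3)
```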

"Scaled Density." If you insist on using a non-density function that imitates the shape of the density function, you can make a frequency histogram with the same bins as the plot above, then use the vertical scale to decide what constant multiple of the KDE or the population density gives the effect you want. [In that case you need to explain that the curve is not the density, but suggests its shape.]
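One reasoning step removes most of the guesswork in choosing that constant. If the data have density $f$ and the histogram has $n$ observations in bins of common width $h$, the expected count in the bin $[a, a+h)$ is

$$E[\text{count in } [a, a+h)] = n \int_a^{a+h} f(t)\,dt \approx n\,h\,f(x), \qquad x \in [a, a+h),$$

where the approximation is good when $h$ is small enough that $f$ is roughly constant across a bin. So the curve $n\,h\,f(x)$ sits on the same vertical scale as the bar heights of the frequency histogram. For example, with $n = 6000$ and a bin width of $h = 0.05$, the multiplier is $n h = 300$.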

For the figure below I multiplied the proper density function by a guess of 300, which seems to work OK. [The term "scaled density" is not widely used, as far as I know, and may tend to make the procedure seem legitimate.]

[Figure: frequency histogram with the scaled density function (dotted)]

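```r
# Frequency (count) histogram with the same 50 suggested breaks as above
hist(x, breaks = 50, main = "Frequency Histogram with Scaled Density Function")
# Mixture density multiplied by 300: .1*300 = 30, .8*300 = 240, .1*300 = 30
curve(30*dnorm(x, -.6, .15) + 240*dnorm(x, 0, .15) + 30*dnorm(x, .6, .15),
      add = TRUE, lty = "dotted", lwd = 3)
```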
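A sketch of the same "scaled density" plot in which the multiplier is computed as $n$ times the bin width rather than guessed, and in which the KDE is scaled the same way. It regenerates the simulated data from the first code block; the names `binw` and `scale_factor` are illustrative. It assumes equal-width bins, which is what `hist` produces when given a single suggested number of breaks.

```r
set.seed(710)
mix <- sample(c(-.6, 0, .6), 6000, replace = TRUE, prob = c(.1, .8, .1))
x <- rnorm(6000, mean = mix, sd = .15)

# Draw the count histogram and keep the bin information it returns
h <- hist(x, breaks = 50, main = "Frequency Histogram with Scaled Densities")
binw <- diff(h$breaks)[1]           # common bin width (pretty() breaks)
scale_factor <- length(x) * binw    # expected count per bin ~ n * width * f(x)

# Scaled KDE (red) and scaled population density (dotted)
d <- density(x)
lines(d$x, d$y * scale_factor, col = "red")
curve(scale_factor * (.1*dnorm(x, -.6, .15) + .8*dnorm(x, 0, .15) +
                        .1*dnorm(x, .6, .15)),
      add = TRUE, lty = "dotted", lwd = 3)
```

Because `h$density` is `h$counts / (n * binw)`, multiplying any density by `scale_factor` puts it on exactly the count scale of the bars. As noted above, the plot should still say that the curves are scaled densities, not densities.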