Solved – Plotting a Gaussian distribution with a histogram. Got problems with the math

data visualization, histogram, normal distribution

I'm plotting, in Python (though the language isn't relevant to this question), a histogram of 3200 made-up weights, and I've been told to compare it to a normal distribution.

The thing is, I wanted to make the same plot, but this time without normalizing the data. I figured that if normalizing the histogram meant dividing every weight by 3200, then multiplying the Gaussian function by 3200 might give me the plot I wanted. It turns out it didn't: the peak of the curve doesn't match the histogram.

I've been looking for an answer for hours, but since I've been working on this assignment all day, I guess I can't really think outside the box anymore. How should I treat the function so that its peak matches the histogram? And why doesn't multiplying by 3200 solve the issue? (That's my real question.)

Best Answer

The Gaussian function is a probability density, so you need to multiply it by the bin width to get a probability, and then multiply that probability by the number of data points to get a count in "points/bin" rather than "points/kg".

Your Gaussian on the bottom looks finer-sampled, with multiple points per histogram bin, so this may be the issue?

In other words

$$\frac{80 \text{ points}}{\text{kg}}\times\frac{5\text{ kg}}{\text{bin}} = \frac{400 \text{ points}}{\text{bin}}$$
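As a rough illustration of the scaling described above, here is a minimal Python sketch that uses only the standard library. The mean, standard deviation, bin range, and the helper names `normal_pdf` and `histogram_counts` are assumptions made for this example; they are not values from the question.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Gaussian probability density; its units are 1/kg when x is in kg."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def histogram_counts(data, lo, bin_width, n_bins):
    """Count the values falling in each of n_bins equal-width bins starting at lo."""
    counts = [0] * n_bins
    for value in data:
        idx = int((value - lo) // bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


random.seed(0)
n_points = 3200
mu, sigma = 70.0, 12.0            # hypothetical mean and spread, in kg
weights = [random.gauss(mu, sigma) for _ in range(n_points)]

bin_width = 5.0                   # kg per bin
lo, n_bins = 10.0, 24             # bins cover 10 kg to 130 kg
counts = histogram_counts(weights, lo, bin_width, n_bins)
centers = [lo + (i + 0.5) * bin_width for i in range(n_bins)]

# Multiplying the density by N alone gives "points per kg" ...
scaled_by_n_only = [n_points * normal_pdf(c, mu, sigma) for c in centers]
# ... multiplying by the bin width as well gives "points per bin".
expected_counts = [n_points * bin_width * normal_pdf(c, mu, sigma) for c in centers]

print("tallest histogram bar:        ", max(counts))
print("peak of N * pdf:              ", round(max(scaled_by_n_only), 1))
print("peak of N * bin_width * pdf:  ", round(max(expected_counts), 1))
```

With a plotting library, `counts` would be drawn as bars at `centers` and `expected_counts` as the overlaid curve. In this sketch `scaled_by_n_only` sits below the bars by exactly the factor `bin_width`, which is the kind of mismatch the question describes.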

Related Solutions

Solved – Graphical data overview (summary) function in R

Frank Harrell's Hmisc package has some basic graphics with options for annotation: check out the summary.formula() and related plot wrap functions. I also like the describe() function.

For additional information, have a look at The Hmisc Library or An Introduction to S-Plus and the Hmisc and Design Libraries.

The on-line help includes example pictures for bpplt, describe, and plot(summary(...)).

Many other examples can be browsed on-line on the R Graphical Manual, see Hmisc (and don't miss rms).

Solved – Odd problem with a histogram in R with a relative frequency axis

One explanation is that the standard deviation of your data is much less than one, and the histogram is giving something like the probability density.

For example, see how the density on the histogram changes when I divide a uniform random variable with range (0, 1) by 1000:

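set.seed(4444)        # make the random draws reproducible
x <- runif(100)       # 100 uniform draws on (0, 1)
y <- x / 1000         # the same values rescaled to (0, 0.001)

par(mfrow=c(2,1))     # stack the two plots vertically
hist(x, prob=TRUE)    # density axis is on the order of 1
hist(y, prob=TRUE)    # density axis is on the order of 1000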


If you want more intuitive-looking density values, you could change the units of the variable.
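The same effect can be checked numerically. Below is a minimal Python sketch, using only the standard library, that computes density-scaled histogram heights by hand. The helper `density_histogram` is hypothetical, and it uses equal-width bins between the minimum and maximum of the data, which is a simplification and not how R's `hist` chooses its breaks.

```python
import random


def density_histogram(data, n_bins):
    """Return (bin_width, densities); the bar areas sum to 1 over the data range."""
    lo, hi = min(data), max(data)
    bin_width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for value in data:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((value - lo) / bin_width), n_bins - 1)
        counts[idx] += 1
    n = len(data)
    # density = count / (number of points * bin width)
    return bin_width, [c / (n * bin_width) for c in counts]


random.seed(4444)
x = [random.random() for _ in range(100)]   # uniform on [0, 1)
y = [v / 1000 for v in x]                   # same data in units 1000 times larger

width_x, dens_x = density_histogram(x, 10)
width_y, dens_y = density_histogram(y, 10)

print("tallest density bar for x:", max(dens_x))
print("tallest density bar for y:", max(dens_y))
```

The bars for `y` are roughly 1000 times taller than those for `x` because each bin is roughly 1000 times narrower, while the total area under both histograms stays at 1.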
