Data Visualization – Understanding Why a Histogram Lacks a Bell Curve Shape

data visualization, histogram, normal distribution

I am looking at the following graphs (in R):

n = floor(rnorm(10000, 500, 100))
t = table(n)

barplot(n)


barplot(t)


Does anyone know why the first graph does not look like a bell curve, but the second graph does look like a bell curve?

Thanks

Best Answer

Think about what n actually is: a vector of 10,000 draws from the Normal distribution as parameterized. barplot(n) draws one bar per element, so the leftmost bar in the first plot has the height of the first value in n, i.e., the first value generated by rnorm; the next bar has the height of the second value, and so on. The bars appear in draw order, not sorted by value, so there is no reason for a bell shape to emerge. With 10,000 bars, they are plotted so densely that many overlap, and you only see the largest value among the overlaid bars. If we assume there are actually 500 distinct vertical lines plotted, each one would show the maximum of the corresponding block of 20 (= 10000/500) draws from rnorm: the leftmost line would show the maximum of the first 20 draws, the next line the maximum of the next 20 draws, etc.
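To make the block-maximum picture concrete, here is a minimal sketch. The seed, the name block_max, and the figure of 500 visible lines are assumptions for illustration only; the real number of distinct lines depends on your graphics device.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
n <- floor(rnorm(10000, 500, 100))

# barplot(n) draws one bar per element, in draw order: bar i has height n[i].
head(n)

# Imitate the overplotting: keep the maximum of each block of 20 consecutive draws.
# matrix() fills column by column, so column j holds draws 20*(j-1)+1 .. 20*j.
block_max <- apply(matrix(n, nrow = 20), 2, max)
length(block_max)  # 500 block maxima

# This plot has the same ragged outline as barplot(n), with no bell shape.
barplot(block_max)
```

The outline of barplot(block_max) should look roughly like the top edge of the first graph, because both show values in the order they were generated.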

The second graph, on the other hand, plots the number of times each value on the x-axis was observed (thanks to the floor function, which rounds every draw down to an integer, the same value can occur repeatedly). Consequently, it is really plotting the counts of the draws that lie in each unit interval, e.g., $[500, 501)$. This will resemble a bell curve once you draw enough samples, which evidently you have done.
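As a rough check of what the table contains, the sketch below inspects the counts directly. The seed and the name counts (used here instead of t, to avoid masking R's transpose function) are assumptions, and hist is shown only as a related way to get the same kind of picture with wider bins.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
n <- floor(rnorm(10000, 500, 100))

# table() counts how often each distinct integer occurs; names are the values.
counts <- table(n)
head(counts)
sum(counts)  # 10000: every draw is counted exactly once

# Bar heights are now frequencies, ordered by value along the x-axis.
barplot(counts)

# hist() bins the raw draws itself and gives a similar bell-shaped picture.
hist(n, breaks = 50)
```

Here the x-axis is the observed value and the height is a frequency, which is what a bell curve describes; in barplot(n) the x-axis is just the position of the draw in the vector.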