Solved – Too big (?) histogram values when using normed histo options in SciPy and matplotlib

histogram, matplotlib, normalization, python, scipy

I have been trying to create a normed histogram using either SciPy or matplotlib (or anything else for Python).
With the 'normed' option disabled, my histogram looks like the listing below (this example uses 10 bins, but the same happens with a larger number of bins).
The first number is the start of a bin, the second is the bin's height:

-2.83785600931e-17   1182
5.6688145554e-15   1137
1.13660076709e-14   1031 
1.70632007864e-14   950
2.27603939019e-14   912
2.84575870174e-14   802
3.41547801329e-14   853
3.98519732484e-14   948
4.55491663639e-14   1315
5.12463594794e-14   870

Which is absolutely fine, and what I was expecting. However, I later need to fit this histogram to another histogram, and for that I prefer to have a normed version of this histogram so that fitting the height of those histograms is easier.

Strangely, when I use the option density=True (for the scipy.histogram version) or normed=True (for the matplotlib.pyplot.hist version), my histogram bin heights take very large values, like below:

-1.44880082614e-17   2.00318764844e+13
5.71138595513e-15   1.98921598219e+13
1.14372599185e-14   1.8040914044e+13
1.71631338819e-14   1.52465807942e+13
2.28890078453e-14   1.56133370332e+13
2.86148818087e-14   1.4617855813e+13
3.43407557721e-14   1.50020766348e+13
4.00666297355e-14   1.74296536456e+13
4.57925036989e-14   2.3769297206e+13
5.15183776622e-14   1.50020766348e+13

I hardly know anything about statistics, but I expected "normed" to mean "sums up to one". Am I incorrect in my thinking, or is the output normalized histogram wrong after all?

Best Answer

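Those values look fine to me.

A density-normalized histogram has a total area of one, not a total height of one. What you have to sum is the area of each histogram bar: its base (the bin width) times its height.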
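The difference between "area sums to one" and "heights sum to one" can be sketched with NumPy. This is only an illustration under stated assumptions: the sample `x` is made-up uniform data on roughly the same tiny scale as the question's values, not the asker's data, and `numpy.histogram` with `density=True` is used in place of the older `normed=True` spelling.

```python
import numpy as np

# Hypothetical sample: uniform values on a very small scale (assumed, not the asker's data)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.7e-14, size=10_000)

# Density normalization: heights are scaled so that the total AREA is one
density, edges = np.histogram(x, bins=10, density=True)
widths = np.diff(edges)                  # base of each bar
total_area = np.sum(density * widths)    # sum of base times height

# "Sums to one" normalization: divide the raw counts by the total count
counts, _ = np.histogram(x, bins=edges)
fractions = counts / counts.sum()

print("largest density height:", density.max())
print("total area:", total_area)
print("sum of fractions:", fractions.sum())
```

Because the bins are so narrow, the density heights have to be huge for the area to come out as one. If heights that sum to one are what the later fit needs, dividing the raw counts by their total (the `fractions` array here) is one way to get them.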

The bin widths in your output are about 5.73e-15 (the difference between consecutive bin starts).

By eye, a typical bar height is about 1.6e13, so a typical bar has an area of around 0.09, give or take.

With ten bars, that suggests the total area will be close to 1.
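The by-eye estimate can be checked against the numbers posted in the question. This is a rough sketch: the arrays are copied from the normalized output above, and since only bin starts were posted, it assumes all ten bins share one width, taken as the average spacing between consecutive starts.

```python
import numpy as np

# Bin starts and heights copied from the normalized output in the question
starts = np.array([
    -1.44880082614e-17, 5.71138595513e-15, 1.14372599185e-14,
    1.71631338819e-14, 2.28890078453e-14, 2.86148818087e-14,
    3.43407557721e-14, 4.00666297355e-14, 4.57925036989e-14,
    5.15183776622e-14,
])
heights = np.array([
    2.00318764844e+13, 1.98921598219e+13, 1.8040914044e+13,
    1.52465807942e+13, 1.56133370332e+13, 1.4617855813e+13,
    1.50020766348e+13, 1.74296536456e+13, 2.3769297206e+13,
    1.50020766348e+13,
])

# Assumption: equal-width bins, width estimated from the spacing of bin starts
width = np.diff(starts).mean()

areas = heights * width        # base times height for each bar
total_area = areas.sum()

print("bin width:", width)
print("total area:", total_area)
```

Under that equal-width assumption, the total should come out very close to 1, which is what density normalization is meant to produce.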