I have been trying to create a normed histogram using either SciPy or matplotlib (or anything for Python).
When I create my histogram with the 'normed' option disabled, the output looks like this (this example is for 10 bins, but the same happens for a larger number of bins).
The first number is the start of a bin, the second is the bin's height:
-2.83785600931e-17 1182
5.6688145554e-15 1137
1.13660076709e-14 1031
1.70632007864e-14 950
2.27603939019e-14 912
2.84575870174e-14 802
3.41547801329e-14 853
3.98519732484e-14 948
4.55491663639e-14 1315
5.12463594794e-14 870
Which is absolutely fine, and what I was expecting. However, I later need to fit this histogram to another histogram, and for that I prefer to have a normed version of this histogram so that fitting the height of those histograms is easier.
Strangely, when I use the option density=True (for the scipy.histogram version) or normed=True (for the matplotlib.pyplot.plt version), my histogram bin heights get very large values, like below:
-1.44880082614e-17 2.00318764844e+13
5.71138595513e-15 1.98921598219e+13
1.14372599185e-14 1.8040914044e+13
1.71631338819e-14 1.52465807942e+13
2.28890078453e-14 1.56133370332e+13
2.86148818087e-14 1.4617855813e+13
3.43407557721e-14 1.50020766348e+13
4.00666297355e-14 1.74296536456e+13
4.57925036989e-14 2.3769297206e+13
5.15183776622e-14 1.50020766348e+13
I hardly know anything about statistics, but I expected "normed" to mean "sums up to one". Am I incorrect in my thinking, or is the output normalized histogram wrong after all?
Best Answer
Those look okay to me.
The thing that you have to sum is the area of each histogram bar, base times height.
The bases there are about 5.723e-15.
By eye, the height of a typical bar then is about 1.6e13, making a typical bar around 0.09 in area, give or take.
With ten bars, that suggests the total area will be close to 1.
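The estimate above can be checked directly with the numbers posted in the question. This is only a rough sketch: it assumes the bins are equally wide, so the width is taken from the spacing of the printed left edges, and the variable names are made up for illustration.

```python
import numpy as np

# Left bin edges and normalized heights, copied from the question's second listing.
left_edges = np.array([
    -1.44880082614e-17, 5.71138595513e-15, 1.14372599185e-14,
    1.71631338819e-14, 2.28890078453e-14, 2.86148818087e-14,
    3.43407557721e-14, 4.00666297355e-14, 4.57925036989e-14,
    5.15183776622e-14,
])
heights = np.array([
    2.00318764844e+13, 1.98921598219e+13, 1.8040914044e+13,
    1.52465807942e+13, 1.56133370332e+13, 1.4617855813e+13,
    1.50020766348e+13, 1.74296536456e+13, 2.3769297206e+13,
    1.50020766348e+13,
])

# Assumption: equal-width bins, so the mean spacing of the left edges is the bin width.
width = np.diff(left_edges).mean()

# Area of each bar = base * height, i.e. the fraction of samples in that bin.
fractions = heights * width
total_area = fractions.sum()
print("bin width:", width)
print("total area:", total_area)

# Raw counts from the question's first listing; dividing by the total
# gives bin heights that sum to one instead of areas that sum to one.
counts = np.array([1182, 1137, 1031, 950, 912, 802, 853, 948, 1315, 870])
probabilities = counts / counts.sum()
print("sum of probabilities:", probabilities.sum())
```

So if heights that literally sum to one are what you want for the fit, either multiply the normalized heights by the bin width or divide the raw counts by their total. The heights are so large here only because the bins are so narrow.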