Normal Distribution – Plotting Histogram of a Sample with Overlay of Population Density

Tags: density-function, histogram, normal-distribution, self-study

To familiarize myself with histograms and probability density functions, I decided to sample various distributions and plot each sample's histogram together with the corresponding probability density function.

I started with the Beta distribution:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta, norm

rng = np.random.default_rng()

# Generate data
a, b = 2., 6.
s = rng.beta(a,b,10000)

# Plot histogram
fig, ax = plt.subplots()
ax.hist(s, 50, density=True, label=r'Sample density: $\alpha$=2, $\beta$=6')

# Plot pdf
x = np.linspace(beta.ppf(0.001,a,b), beta.ppf(0.999,a,b), 100)
ax.plot(x, beta.pdf(x, a, b),'-', lw=2, color='red', alpha=0.8, label='Beta proba dist function')

ax.set_xlabel('x', fontsize=12, fontweight='bold')
ax.set_ylabel('Beta pdf', fontsize=12, fontweight='bold')
plt.legend(loc='upper right')

plt.show()
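The Beta overlay lines up because density=True rescales the bars so that their total area is 1, the same as the area under a pdf. Below is a minimal numpy-only check of that normalization. It assumes ax.hist bins and normalizes the data the way np.histogram(..., density=True) does, and the fixed seed is an addition made only so the check is reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility (not in the question)
a, b = 2.0, 6.0
s = rng.beta(a, b, 10000)

# The same 50-bin, density-normalized histogram that ax.hist(s, 50, density=True) draws
heights, edges = np.histogram(s, bins=50, density=True)
widths = np.diff(edges)

# Each height is count / (N * bin_width), so the bar areas sum to 1,
# matching the unit area under beta.pdf
total_area = np.sum(heights * widths)
print(total_area)
```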


Thankfully, the result is as expected.

The trouble started with the normal distribution (standard or not):

mu = -2.
sigma = 2.

# Generate data
data = rng.normal(mu,sigma,(10000,1))

# Plot histogram
fig, ax = plt.subplots() 
count, bins, ignored = ax.hist(data, bins=100, color = (0.,0.,1,0.6))

# Plot pdf
x = bins
y = np.exp(- (bins - mu)**2 / (2 * sigma**2) ) / ( sigma * np.sqrt(2 * np.pi) )
# or
x = np.linspace(norm.ppf(0.001,loc=mu,scale=sigma), norm.ppf(0.999,loc=mu,scale=sigma), 1000)
y = norm.pdf(x,loc=mu,scale=sigma)
ax.plot(x, y, color=(1,0,0,0.8), lw=2, label='normal proba dist. function')

plt.axvline(data.mean(), 
            color='r', linestyle='dotted', linewidth=2, 
            label='Distribution mean' + ' (' + str(round(data.mean(),1)) + ')')
ax.set_facecolor((0.4,0.4,0.1,0.3))
ax.set_ylim(0,400)

plt.legend(fontsize=10, loc='upper left', bbox_to_anchor=(0, 1), ncol=1)

plt.show()


Obviously I expected a red bell-shaped curve centered at x = -2, but no dice: there seems to be a (probably trivial) vertical scaling problem with the normal pdf curve. I am missing something basic and don't know what it is.
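The size of that vertical scaling problem can be estimated directly. Here is a minimal numpy-only sketch, seeded for reproducibility, that compares the tallest bar of a plain count histogram with the height of the normal pdf at x = mu, using the same pdf formula as the code above. It assumes ax.hist(data, bins=100) produces the same counts as np.histogram(data, bins=100):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
mu, sigma = -2.0, 2.0
data = rng.normal(mu, sigma, 10000)

# Default ax.hist behaviour: raw counts per bin
counts, edges = np.histogram(data, bins=100)
bin_width = edges[1] - edges[0]

# Height of the normal pdf at its mode x = mu (about 0.2 for sigma = 2)
pdf_peak = 1.0 / (sigma * np.sqrt(2 * np.pi))

print(counts.max())           # tallest count bar
print(pdf_peak)               # tallest point of the pdf curve
print(data.size * bin_width)  # N * bin width, the factor separating the two scales
```

The ratio between the tallest bar and the pdf peak comes out close to N times the bin width, which is why the red curve looks flat next to the count bars.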

Any pointers appreciated.

Best Answer

By default, ax.hist plots the raw bin counts (frequencies), not densities.
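In symbols: with N samples and bins of width Δ, a bin holding n_k samples is drawn with height n_k by default and with height n_k/(NΔ) when density=True. For a pdf f, the expected count in a bin centred at x_k is approximately NΔ f(x_k), so only the density heights sit on the same scale as f:

```latex
\text{count height: } n_k, \qquad
\text{density height: } \frac{n_k}{N\,\Delta}, \qquad
\mathbb{E}[n_k] \approx N\,\Delta\, f(x_k)
\;\Longrightarrow\;
\mathbb{E}\!\left[\frac{n_k}{N\,\Delta}\right] \approx f(x_k).
```

For the question's N = 10000 and 100 bins spread over a sample range of roughly 15 (about ±4σ around μ), Δ is about 0.15. The count bars are therefore on the order of a thousand times taller than the pdf curve.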

If you want to plot the density instead (so that the histogram is on the same scale as the probability density function you're overlaying), pass the density=True keyword argument to ax.hist, i.e.:

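count, bins, ignored = ax.hist(data, bins=100, color=(0., 0., 1, 0.6), density=True)

Also remove (or rescale) the ax.set_ylim(0,400) line. With density=True the bar heights peak at about 0.2, so a y-axis running from 0 to 400 would flatten both the histogram and the curve against the x-axis.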
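Alternatively, if you would rather keep the y-axis in counts (for instance to keep the ax.set_ylim(0,400) call), you can scale the pdf up instead of scaling the histogram down. Below is a numpy-only sketch of that scaling. It assumes equal-width bins and uses a seed added only for reproducibility. The resulting expected_counts array is what you would pass to ax.plot on top of the original count histogram:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
mu, sigma = -2.0, 2.0
data = rng.normal(mu, sigma, 10000)

# Count histogram, as drawn by ax.hist(data, bins=100) without density=True
counts, edges = np.histogram(data, bins=100)
centers = 0.5 * (edges[:-1] + edges[1:])
bin_width = edges[1] - edges[0]  # all bins share this width here

# Normal pdf (same formula as in the question), scaled to expected counts per bin
pdf = np.exp(-(centers - mu) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
expected_counts = data.size * bin_width * pdf

# Overlay with e.g. ax.plot(centers, expected_counts, color='red');
# the scaled curve should track the bar heights
print(counts.max(), expected_counts.max())
```

Scaling the curve leaves the bars readable as raw counts. Scaling the bars with density=True instead makes the plot directly comparable with any pdf, regardless of sample size or bin width.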