[Math] stdev and mean from gaussian fit vs. from classical formula

normal-distribution, standard-deviation, statistics

I have a set of data: measured speeds of molecules in water.
I made a histogram and fitted it with the function
$$A\exp\left(-\frac{(x-B)^2}{C}\right)$$
calculating the mean and standard deviation from the values $B$ and $C$.
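For reference, here is how the fit parameters map to the mean and standard deviation, assuming the fitted form above (with the conventional minus sign in the exponent). Comparing it term by term with the normal density
$$\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
gives $\mu = B$ and $C = 2\sigma^2$, i.e. $\sigma = \sqrt{C/2}$.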

If I use the classical formula for the standard deviation, the results are different.
It's clear they can't be exactly the same; I guess the problem is that I don't have enough data.

But what is the difference between these two values, and which one is better to use?

(Figure: dashed is the Gaussian with the classically computed mean and standard deviation; solid is the fitted Gaussian.)

Best Answer

Fitting a curve to a histogram is troublesome, because the result will in general depend on the number and width of the bins you chose when creating the histogram.

For an extreme example, assume you measured the values $\{\frac{1}{5},\frac{1}{4},\frac{1}{3},\frac{1}{2}\}$. If your bins are $\left((-1,0],(0,1],(1,2]\right)$, your histogram is $\left(0,4,0\right)$. If, on the other hand, your bins are $\left((-\frac{3}{4},\frac{1}{4}],(\frac{1}{4},\frac{5}{4}],(\frac{5}{4},\frac{9}{4}]\right)$, the histogram becomes $\left(2,2,0\right)$. The two histograms will lead to different results when you fit the normal distribution's density to them; in particular, the resulting distribution's variance will be higher for the second histogram than for the first.
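The bin dependence above is easy to check numerically. A minimal sketch (the `histogram` helper is just for illustration; `numpy.histogram` would do the same):

```python
def histogram(data, edges):
    """Count points falling in the half-open bins (edges[i], edges[i+1]]."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(edges) - 1):
            if edges[i] < x <= edges[i + 1]:
                counts[i] += 1
                break
    return counts

data = [1/5, 1/4, 1/3, 1/2]

# Same four values, two different binnings, two different histograms:
print(histogram(data, [-1, 0, 1, 2]))          # [0, 4, 0]
print(histogram(data, [-3/4, 1/4, 5/4, 9/4]))  # [2, 2, 0]
```

A density fitted to the first histogram sees all the mass in one bin, while a fit to the second sees it spread over two, hence the larger fitted variance.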

So unless you have a very good reason to determine the distribution's parameters by fitting the density to the histogram, using some other estimator is usually better. For the normal distribution, using the sample average to estimate the distribution's mean, and the sample variance (with denominator $n$) to estimate the distribution's variance, is the maximum-likelihood estimator, and probably a good choice.
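A minimal sketch of these estimators, reusing the four example values from above (note the maximum-likelihood variance divides by $n$, not $n-1$):

```python
import math

def mle_mean_var(data):
    """Maximum-likelihood estimates of mean and variance for normal data."""
    n = len(data)
    mean = sum(data) / n
    # MLE variance divides by n (the unbiased estimator would divide by n - 1)
    var = sum((x - mean) ** 2 for x in data) / n
    return mean, var

speeds = [1/5, 1/4, 1/3, 1/2]
mean, var = mle_mean_var(speeds)
print(mean, math.sqrt(var))  # estimated mean and standard deviation
```

Unlike the histogram fit, these estimates depend only on the data, not on any binning choice.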