I have a set of data – measured speeds of molecules in water.
I made a histogram and fitted it with the function
$$A\exp\left(-\frac{(x-B)^2}{C}\right)$$
calculating the mean and standard deviation from the values $B$ and $C$ (the mean is $B$ and the standard deviation is $\sqrt{C/2}$).
If I instead use the classical formula for the standard deviation, the results are different.
It's clear they can't be identical; I suspect the problem is that I don't have enough data.
But what exactly is the difference between those two values, and which one is better to use?
Best Answer
Fitting a curve to a histogram is troublesome, because the result will in general depend on the number and width of the bins you choose when creating the histogram.
For an extreme example, assume you measured the values $\{\frac{1}{5},\frac{1}{4},\frac{1}{3},\frac{1}{2}\}$. If your bins are $\left((-1,0],(0,1],(1,2]\right)$, your histogram is $\left(0,4,0\right)$. If, on the other hand, your bins are $\left((-\frac{3}{4},\frac{1}{4}],(\frac{1}{4},\frac{5}{4}],(\frac{5}{4},\frac{9}{4}]\right)$, the histogram becomes $\left(2,2,0\right)$. The two histograms will lead to different results when you fit the normal distribution's density to them - in particular, the resulting distribution's variance will be higher for the second histogram than it will be for the first.
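The bin sensitivity above can be checked directly with NumPy. One caveat: `np.histogram` uses left-closed bins $[a,b)$, while the intervals above are right-closed $(a,b]$, so the edges are nudged by a tiny epsilon to reproduce the same counts.

```python
import numpy as np

data = [1/5, 1/4, 1/3, 1/2]

# numpy bins are left-closed [a, b); shift edges slightly to
# mimic the right-closed (a, b] intervals used in the example.
eps = 1e-12
counts1, _ = np.histogram(data, bins=[-1 + eps, 0 + eps, 1 + eps, 2 + eps])
counts2, _ = np.histogram(data, bins=[-3/4 + eps, 1/4 + eps, 5/4 + eps, 9/4 + eps])

print(counts1.tolist())  # [0, 4, 0]
print(counts2.tolist())  # [2, 2, 0]
```

Identical data, two different histograms - which is exactly why a fit to the histogram inherits an arbitrary dependence on the binning.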
So unless you have a very good reason to determine the distribution's parameters by fitting the density to the histogram, using some other estimator is usually better. For the normal distribution, using the sample average to estimate the distribution's mean and the sample variance to estimate the distribution's variance gives the maximum-likelihood estimator (up to the usual $\frac{n}{n-1}$ bias correction for the variance), and is probably a good choice.