Solved – How to compute the standard deviation of data with errors

error, mean, standard deviation

If I had a data set
$A = \{5,6,8,4,2\}$, then computing the mean and the standard deviation is quite simple. This set has a mean $\bar{x} = 5$ and a (population) standard deviation $\sigma = 2$.

But what if I had errors on all the samples in the data set, like so
$$B = \{5 \pm 1, 6 \pm 1, 8 \pm 3, 4 \pm2, 2\pm 4\}$$

How do I compute the mean and standard deviation for this data set while taking the errors of the samples into account?

It can't possibly be the same, can it? Because the generic formulas for computing mean and standard deviation would just ignore the uncertainties/errors on the samples in the data set.

Best Answer

This answer will assume your errors are standard deviations.

If you have a data set $x_1,\ldots,x_n$, then we can define the discrete mean and variance as $$\langle{x}\rangle\equiv\frac{1}{n}\sum_ix_i \,,\,\hat{\sigma}^2\equiv\langle{x^2}\rangle-\langle{x}\rangle^2$$ which means $$\langle{x}\rangle{n}=\sum_ix_i \,,\, \langle{x^2}\rangle{n}=\sum_ix_i^2$$
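As a quick check of these definitions, here is a minimal Python sketch applied to the set $A$ from the question. The function name `discrete_mean_and_std` is only an illustrative choice, and the code uses the $1/n$ (population) convention defined above rather than the $1/(n-1)$ sample estimator.

```python
import math


def discrete_mean_and_std(xs):
    """Return (<x>, sigma_hat) using the 1/n definitions from the text."""
    n = len(xs)
    mean = sum(xs) / n                      # <x>
    mean_sq = sum(x * x for x in xs) / n    # <x^2>
    variance = mean_sq - mean ** 2          # sigma_hat^2 = <x^2> - <x>^2
    return mean, math.sqrt(variance)


A = [5, 6, 8, 4, 2]
mean_A, std_A = discrete_mean_and_std(A)
print(mean_A, std_A)  # 5.0 2.0
```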

Now imagine that each data point really reports the mean and standard deviation of its own subsample, $$X_k\pm{\Delta}X_k=\langle{x}\rangle_k\pm\hat{\sigma}_k$$ where the "subsample size" $n_k$ is unknown.

Then we have $$\langle{x}\rangle_k=X_k \,,\, \langle{x^2}\rangle_k={\Delta}X_k^2+\langle{x}\rangle_k^2$$ where the second relation follows from rearranging $\hat{\sigma}_k^2=\langle{x^2}\rangle_k-\langle{x}\rangle_k^2$. The aggregate statistics can then be computed via $$\langle{X}\rangle=\frac{1}{N}\sum_kX_kn_k \,,\, \langle{X^2}\rangle=\frac{1}{N}\sum_k(X_k^2+\Delta{X}_k^2)n_k \,,\, N=\sum_kn_k$$ from which you can compute $$\hat{\sigma}_X^2=\langle{X^2}\rangle-\langle{X}\rangle^2$$
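To see what the aggregate formulas do with the errors, the variance can be split into two contributions. This is only a rearrangement of the expressions for $\langle{X}\rangle$ and $\langle{X^2}\rangle$, not an extra assumption:

$$\hat{\sigma}_X^2=\frac{1}{N}\sum_kn_k\,\Delta{X}_k^2+\frac{1}{N}\sum_kn_k\left(X_k-\langle{X}\rangle\right)^2$$

The first term is the weighted average of the squared errors and the second is the usual scatter of the central values. So the errors leave the mean unchanged, while the variance can only grow relative to the error-free case.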

For simplicity you could take $n_k=1$.
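The recipe above can be sketched in Python as follows, applied to the set $B$ from the question. This is a minimal illustration, assuming the quoted errors are standard deviations; the function name `combine_with_errors` and its `counts` parameter (standing in for the $n_k$) are hypothetical names chosen for this example, and `counts` defaults to $n_k=1$.

```python
import math


def combine_with_errors(values, errors, counts=None):
    """Aggregate mean and standard deviation for values X_k +/- dX_k.

    `counts` plays the role of the subsample sizes n_k; if omitted,
    every n_k is taken to be 1.
    """
    if counts is None:
        counts = [1] * len(values)
    if not (len(values) == len(errors) == len(counts)):
        raise ValueError("values, errors and counts must have equal length")

    total = sum(counts)  # N = sum_k n_k
    # <X> = (1/N) sum_k X_k n_k
    mean = sum(x * n for x, n in zip(values, counts)) / total
    # <X^2> = (1/N) sum_k (X_k^2 + dX_k^2) n_k
    mean_sq = sum((x * x + dx * dx) * n
                  for x, dx, n in zip(values, errors, counts)) / total
    # sigma_X^2 = <X^2> - <X>^2
    return mean, math.sqrt(mean_sq - mean ** 2)


values = [5, 6, 8, 4, 2]
errors = [1, 1, 3, 2, 4]
mean_B, std_B = combine_with_errors(values, errors)
print(mean_B, round(std_B, 3))  # 5.0 3.194
```

For $B$ this gives $\langle{X}\rangle=5$ and $\hat{\sigma}_X^2=35.2-25=10.2$, so $\hat{\sigma}_X\approx3.19$, compared with $\sigma=2$ when the errors are ignored. Setting all errors to zero recovers the error-free result.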
