Mean of means and standard deviation

Tags: means, standard deviation, statistics

I have several data sets with means $\mu_1,\mu_2,\ldots,\mu_n$ and standard deviations $\sigma_1,\sigma_2,\ldots,\sigma_n$. These values come from repetitions of the same experiment.
How do I calculate the overall mean ("mean of the means") and the overall standard deviation, $\mu_{tot}$ and $\sigma_{tot}$, that summarize all the experiments?

Basically I have $X_1\sim N(\mu_1,\sigma^2_1), X_2\sim N(\mu_2,\sigma^2_2),\ldots,X_n\sim N(\mu_n,\sigma^2_n)$
and $Z=\frac{X_1+X_2+\ldots+X_n}{n}$. The question is: $Z\sim N(?,?)$

Best Answer

If you have the raw data, you can compute both the mean and the standard deviation directly.

If you know the population sizes, say $m_1,\dots,m_n$, then the common mean is easy to compute: $$\mu=\frac{m_1\mu_1+\dots+m_n\mu_n}{m_1+\dots+m_n},$$ since the numerator is the sum of all observations across all populations.
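As a minimal Python sketch of this weighted mean, assuming the sizes $m_i$ and means $\mu_i$ are given as parallel lists (the function name `pooled_mean` is hypothetical):

```python
def pooled_mean(sizes, means):
    """Common mean of several groups, weighting each group mean by its size m_i."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    total = sum(sizes)  # m_1 + ... + m_n
    # The numerator m_1*mu_1 + ... + m_n*mu_n is the sum of all observations.
    return sum(m * mu for m, mu in zip(sizes, means)) / total


# Groups [1, 2, 3] (mean 2) and [10, 20] (mean 15): (3*2 + 2*15) / 5
print(pooled_mean([3, 2], [2.0, 15.0]))  # 7.2
```

The result matches the mean of the combined raw data $[1, 2, 3, 10, 20]$, which is $36/5 = 7.2$.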

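For the common variance, use the fact that the variance is the mean of the squares minus the square of the mean. So first recover the sums of squares. For example, $\sigma_1^2+\mu_1^2$ is the mean of the squares in the 1st population, so its sum of squares is $m_1(\sigma_1^2+\mu_1^2)$. Adding these over all populations gives the joint sum of squares, and dividing by $m_1+\dots+m_n$ gives its average. Finally, subtract $\mu^2$ and we are done :) $$\sigma^2=\frac{m_1(\sigma_1^2+\mu_1^2)+\dots+m_n(\sigma_n^2+\mu_n^2)}{m_1+\dots+m_n}-\mu^2.$$ (This identity holds when each $\sigma_i$ is the population standard deviation, i.e. computed with divisor $m_i$.)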
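The recipe above (rebuild sums of squares, average them, subtract $\mu^2$) can be sketched in Python as follows. This is an illustration under the assumption that each $\sigma_i$ is a population SD (divisor $m_i$); the function name `pooled_mean_sd` is hypothetical. It checks the result against the raw data using the standard-library `statistics` module.

```python
import math
import statistics


def pooled_mean_sd(sizes, means, sds):
    """Combine per-group (m_i, mu_i, sigma_i) into the overall mean and SD.

    Assumes each sigma_i is a population SD (divisor m_i, not m_i - 1).
    """
    total = sum(sizes)
    mu = sum(m * mu_i for m, mu_i in zip(sizes, means)) / total
    # Sum of squares of group i is m_i * (sigma_i^2 + mu_i^2).
    sum_sq = sum(m * (s * s + mu_i * mu_i) for m, mu_i, s in zip(sizes, means, sds))
    # Mean of squares minus square of mean.
    var = sum_sq / total - mu * mu
    return mu, math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error


# Check against the raw data, using population statistics throughout.
groups = [[1.0, 2.0, 3.0], [10.0, 20.0]]
sizes = [len(g) for g in groups]
means = [statistics.fmean(g) for g in groups]
sds = [statistics.pstdev(g) for g in groups]
mu, sigma = pooled_mean_sd(sizes, means, sds)
all_data = [x for g in groups for x in g]
print(mu, statistics.fmean(all_data))      # both 7.2
print(sigma, statistics.pstdev(all_data))  # agree up to rounding
```

Here the joint sum of squares is $3\cdot\tfrac{14}{3} + 2\cdot 250 = 514$, so $\sigma^2 = 514/5 - 7.2^2 = 50.96$.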

If $m_1=\dots=m_n=m$ (as with repetitions of the same experiment), then $$\mu=\frac{\mu_1+\dots+\mu_n}{n}.$$ The sum of squares in the $i$-th experiment is $m(\sigma_i^2+\mu_i^2)$. Hence the total variance is $$\sigma^2=\frac{m(\sigma_1^2+\mu_1^2+\dots+\sigma_n^2+\mu_n^2)}{nm}-\mu^2=\frac{\sigma_1^2+\mu_1^2+\dots+\sigma_n^2+\mu_n^2}{n}-\mu^2.$$
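With equal sizes, $m$ cancels and only the per-experiment means and (population) SDs are needed. A minimal Python sketch follows; the name `equal_size_mean_sd` is hypothetical. It also checks numerically the algebraic rearrangement $\sigma^2 = \frac{1}{n}\sum\sigma_i^2 + \left(\frac{1}{n}\sum\mu_i^2 - \mu^2\right)$, i.e. the mean of the variances plus the (population) variance of the means.

```python
import math
import statistics


def equal_size_mean_sd(means, sds):
    """Overall mean and SD when every experiment has the same size m.

    m cancels, so only the per-experiment means and population SDs are needed.
    """
    n = len(means)
    mu = sum(means) / n
    mean_sq = sum(s * s + mu_i * mu_i for mu_i, s in zip(means, sds)) / n
    return mu, math.sqrt(max(mean_sq - mu * mu, 0.0))


means = [2.0, 4.0, 6.0]
sds = [1.0, 1.0, 1.0]
mu, sigma = equal_size_mean_sd(means, sds)
# Same variance written as (mean of the variances) + (variance of the means).
alt = statistics.fmean([s * s for s in sds]) + statistics.pvariance(means)
print(mu, sigma ** 2, alt)  # sigma**2 and alt agree (11/3 here, up to rounding)
```

The decomposition shows why the pooled spread exceeds the typical within-experiment spread whenever the experiment means differ.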

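As for the last part of your question (added in an edit): if $X_1,\dots,X_n$ are independent, then $Z$ is again normal, since a linear combination of independent normal variables is normal. Its mean is $$\mu=\frac{\mu_1+\dots+\mu_n}{n},$$ and because variances of independent variables add and $\operatorname{Var}(X_i/n)=\sigma_i^2/n^2$, its variance is $\frac{\sigma_1^2+\dots+\sigma_n^2}{n^2}$, so the standard deviation is $$\sigma=\frac{\sqrt{\sigma_1^2+\dots+\sigma_n^2}}{n}.$$ Note that this is the spread of the averaged variable $Z$, which is not the same as the spread of the pooled data computed above.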
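For independent $X_i\sim N(\mu_i,\sigma_i^2)$, $Z$ has mean $\frac{1}{n}\sum\mu_i$ and variance $\frac{1}{n^2}\sum\sigma_i^2$. The Python sketch below computes these and checks them with a seeded Monte Carlo simulation; the function name `mean_of_normals` and the example parameters are hypothetical.

```python
import math
import random
import statistics


def mean_of_normals(mus, sigmas):
    """Mean and SD of Z = (X_1 + ... + X_n) / n for independent X_i ~ N(mu_i, sigma_i^2)."""
    n = len(mus)
    mu_z = sum(mus) / n
    # Var(Z) = (sigma_1^2 + ... + sigma_n^2) / n^2 by independence.
    sd_z = math.sqrt(sum(s * s for s in sigmas)) / n
    return mu_z, sd_z


mus = [1.0, 2.0, 3.0]
sigmas = [1.0, 2.0, 2.0]
mu_z, sd_z = mean_of_normals(mus, sigmas)
print(mu_z, sd_z)  # 2.0 1.0

# Monte Carlo check (seeded, so the run is reproducible).
rng = random.Random(0)
samples = [
    statistics.fmean([rng.gauss(m, s) for m, s in zip(mus, sigmas)])
    for _ in range(20_000)
]
print(statistics.fmean(samples), statistics.stdev(samples))  # close to 2.0 and 1.0
```

Here $\sqrt{1+4+4}/3 = 1$, and the simulated sample standard deviation of $Z$ lands close to that value.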
