[Math] Formula of combined variance of two data sets yields wrong output

I have some distribution from which I sample two datasets x1 and x2. I wanted to calculate their combined mean and variance by using these two formulas:

$$\overline{X_c} = \frac{n_1 \overline{X_1} + n_2 \overline{X_2}}{n_1 + n_2}$$

$$S_c^2 = \frac{n_1 S_1^2 + n_2 S_2^2 + n_1 \left( \overline{X_1} - \overline{X_c} \right)^2 + n_2 \left( \overline{X_2} - \overline{X_c} \right)^2}{n_1 + n_2}$$

where $n_i$ is the number of samples in dataset $i$, $\overline{X_i}$ is its mean, and $S_i^2$ is its variance. The subscript $c$ indicates the combined values.
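A minimal sketch of these two formulas in Python, using only the standard library. The function name `combined_mean_var` and the sample values are hypothetical and chosen purely for illustration. The sketch assumes that $S_1^2$ and $S_2^2$ are population variances (sum of squared deviations divided by $n$, not $n - 1$); under that assumption the combined variance equals the population variance of the concatenated data.

```python
from statistics import fmean, pvariance, variance

def combined_mean_var(x1, x2):
    """Combine two datasets' means and population variances (divide by n)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = fmean(x1), fmean(x2)
    # pvariance divides by n; this is the convention assumed for S_1^2, S_2^2
    v1, v2 = pvariance(x1), pvariance(x2)
    mc = (n1 * m1 + n2 * m2) / (n1 + n2)
    vc = (n1 * v1 + n2 * v2
          + n1 * (m1 - mc) ** 2
          + n2 * (m2 - mc) ** 2) / (n1 + n2)
    return mc, vc

# Hypothetical data, not the dataset from the post
x1 = [30.0, 45.0, 60.0, 75.0, 94.0]
x2 = [12.0, 35.0, 50.0, 73.0]
x3 = x1 + x2  # list concatenation, i.e. stacking the two datasets

mc, vc = combined_mean_var(x1, x2)
print("mean:", mc, "vs stacked:", fmean(x3))
print("var: ", vc, "vs stacked (divide by n):  ", pvariance(x3))
print("              stacked (divide by n - 1):", variance(x3))
```

The last line prints the sample variance (divide by $n - 1$) of the stacked data for comparison. It differs from the other two values, so the variance convention has to be the same on both sides of any such check.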

For testing purposes, I wanted to check whether the formulas yield the same result as stacking the two datasets into $x3 = x1+x2$ (concatenation, not element-wise addition) and calculating its mean and variance directly. So I created a dummy dataset like this:

I calculated the mean and variance of $x1$ and $x2$ individually, and then for the two combination methods: the stacked dataset (x3) and the formulas (xC). This yielded:

        x1      x2      x3      xC
mean    60.80   42.50   52.66   52.66
var     635.2   659.0   657.75  728.47

As you can see, the formula works for the means but fails to reproduce the variance of the stacked dataset (x3).

Can somebody tell me what I am doing wrong? Simple answers would be nice, as I am not a great mathematician.

Thank you!
