[Math] Finding new standard deviation and mean after adding an element

statistics

Say I have a mean and standard deviation for a dataset of 5 elements.

I now add a sixth element. Is there a way to calculate the new mean and standard deviation using the information we had prior (i.e. not just recalculating the whole thing from scratch)?

For the mean, I see that I can just multiply the old one by $5$, add the new element, and divide by $6$.
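Here is a minimal Python sketch of that rule for a general count $n$. The name `update_mean` is only illustrative:

```python
def update_mean(old_mean: float, n: int, x_new: float) -> float:
    """Mean of n + 1 values, given the mean of the first n values and the new value."""
    # Multiply back to recover the old total, add the new value,
    # then divide by the new count.
    return (old_mean * n + x_new) / (n + 1)


# Example: five values with mean 4.0 (total 20), then a sixth value of 10.
print(update_mean(4.0, 5, 10.0))  # 5.0
```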

I'm not sure if there's something I can do with the standard deviation, however.

$$\sigma_{old} = \sqrt{\sum_i (X_i - \mu_{old})^2}$$

$$\sigma_{new} = \sqrt{\sum_i (X_i - \mu_{new})^2 + (X_{new} - \mu_{new})^2}$$

$$\mu_{new} = \frac{N\mu_{old} + X_{new}}{N+1}$$

$$\sigma^2_{new} = \sigma^2_{old} + \sum_i \left( (X_i - \mu_{new})^2 - (X_i - \mu_{old})^2 \right) + (X_{new} - \mu_{new})^2$$

After putting it in terms of the old stats, this becomes (I think)

$$\sigma^2_{new} = \sigma^2_{old} + \sum_i \left(\frac{(2N+1) \mu_{old} + X_{new}}{N+1} - 2 X_i \right) \left(\frac{X_{new} - \mu_{old}}{N+1}\right) + \left(X_{new} - \frac{N\mu_{old} + X_{new}}{N+1}\right)^2$$

Is there anything better than this monstrosity?

Best Answer

Let's say you started with $n$ points and have added an $(n+1)^{st}$. To handle the variance, write $\mu_{new} = \mu_{old} + \delta$, where $\delta = \frac{x_{n+1} - \mu_{old}}{n+1}$. We need to compute $$\sum_{i=1}^{n} (x_i - \mu_{new})^2,$$ where the sum is taken only over the old $x_i$'s (the contribution from the $(n+1)^{st}$ sample is easily incorporated separately). But $$(x_i - \mu_{new})^2 = (x_i - \mu_{old} - \delta)^2,$$ so our sum becomes $$\sum_{i=1}^{n} (x_i - \mu_{new})^2 = \sum_{i=1}^{n} (x_i - \mu_{old})^2 - 2 \delta \sum_{i=1}^{n} (x_i - \mu_{old}) + n \delta^2 = \sum_{i=1}^{n} (x_i - \mu_{old})^2 + n \delta^2,$$ where the middle sum vanishes because the old $x_i$'s sum to $n\mu_{old}$, so their deviations from the old mean sum to zero.
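A quick numerical check of that identity, as a sketch in Python. The data are random values chosen only for illustration, and `statistics.fmean` (Python 3.8+) computes the mean:

```python
import random
import statistics

# Illustrative data: n old points plus one new point.
random.seed(0)
old = [random.uniform(-10, 10) for _ in range(5)]
x_new = 3.7

n = len(old)
mu_old = statistics.fmean(old)
mu_new = (n * mu_old + x_new) / (n + 1)
delta = mu_new - mu_old

# Left side: squared deviations of the OLD points from the NEW mean.
lhs = sum((x - mu_new) ** 2 for x in old)
# Right side: squared deviations from the OLD mean, plus n * delta^2.
rhs = sum((x - mu_old) ** 2 for x in old) + n * delta ** 2

# The sum inside the cross term -2 * delta * sum(x - mu_old) should be ~0.
cross = sum(x - mu_old for x in old)
print(abs(lhs - rhs) < 1e-9, abs(cross) < 1e-9)
```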

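Combining all this (and trusting that no algebraic error has been made!), with $Var$ denoting the population variance (the sum of squared deviations divided by the number of points), we see that $$Var_{new} = \frac{(x_{n+1}-\mu_{new})^2}{n+1}+ \frac{n}{n+1}Var_{old} + \frac{n}{n+1}\delta^2$$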
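This can be compacted a little further. With $\delta = \frac{x_{n+1}-\mu_{old}}{n+1}$, the new point's deviation from the new mean is $x_{n+1}-\mu_{new} = (x_{n+1}-\mu_{old}) - \delta = (n+1)\delta - \delta = n\delta$, so the first and last terms combine:

$$\begin{aligned}
Var_{new} &= \frac{n^2\delta^2}{n+1} + \frac{n}{n+1}Var_{old} + \frac{n}{n+1}\delta^2 \\
&= \frac{n}{n+1}Var_{old} + n\delta^2 \\
&= \frac{n}{n+1}Var_{old} + \frac{n\,(x_{n+1}-\mu_{old})^2}{(n+1)^2}.
\end{aligned}$$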

Not too terrible!
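As a sketch, the update can be coded directly and compared against a full recomputation. This assumes population variance (dividing by $n$, matching `statistics.pvariance` / `statistics.pstdev`). The function name `add_point` and the sample data are illustrative only:

```python
import math
import statistics
from typing import Tuple


def add_point(mean: float, var: float, n: int, x_new: float) -> Tuple[float, float]:
    """Return (new_mean, new_var) after appending x_new to n points.

    `var` is the population variance (sum of squared deviations divided by n).
    """
    delta = (x_new - mean) / (n + 1)        # mu_new - mu_old
    new_mean = mean + delta
    new_var = (
        (x_new - new_mean) ** 2 / (n + 1)   # contribution of the new point
        + n / (n + 1) * var                 # rescaled old variance
        + n / (n + 1) * delta ** 2          # shift of the old points' mean
    )
    return new_mean, new_var


data = [2.0, 4.0, 4.0, 5.0, 5.0]
x_new = 10.0

mean, var = statistics.fmean(data), statistics.pvariance(data)
new_mean, new_var = add_point(mean, var, len(data), x_new)

# Recompute from scratch for comparison.
full = data + [x_new]
print(new_mean, math.sqrt(new_var))
print(statistics.fmean(full), statistics.pstdev(full))
```

Both lines should agree up to floating-point rounding. Because `add_point` only needs $n$, the mean, and the variance, it can be called once per new element to keep running statistics without storing the old data. The standard deviation is just the square root of the updated variance.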
