# Solved – How does adding a new value to a set of data affect the standard deviation


For a given set of $n$ numbers with standard deviation $s$, it is my understanding that adding a number that lies less than $s$ from the mean will always reduce the variance.

1) Does this rule always apply, or are there edge cases?

2) Is there a formal proof?

Without any loss of generality we may (by using the mean as the origin and the standard deviation as the unit) choose units of measurement in which the $$n$$ numbers are $$x_1, x_2, \ldots, x_n$$ with $$\sum x_i = 0$$ and $$\sum x_i^2 = n$$. The variance and standard deviation are both $$1$$. Let the new number be $$x_0$$. The change in variance is found by subtracting $$1$$ from the variance of all $$n+1$$ numbers:
$$\begin{aligned} &\frac{1}{n+1}\left(x_0^2 + \sum_{i=1}^n x_i^2\right) - \left(\frac{1}{n+1}\left(x_0 + \sum_{i=1}^n x_i\right)\right)^2 - 1\\ &=\frac{x_0^2}{n+1} + \frac{n}{n+1} - \frac{x_0^2}{(n+1)^2} - 1\\ &=\frac{1}{n+1}\left(-1 + \frac{n}{n+1}x_0^2\right). \end{aligned}$$
This is negative if and only if $$n x_0^2 \lt n+1$$, and its sign does not depend on the units of measurement. Thus the variance decreases when $$x_0$$ lies within $$\sqrt{1+1/n}$$ standard deviations of the mean, increases when $$x_0$$ lies further than this from the mean, and stays the same exactly at that distance.
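This threshold is easy to check numerically. A minimal sketch using NumPy (the data set, seed, and the names `inside`/`outside` are illustrative; `np.var` and `np.std` default to the population versions, `ddof=0`, as in the derivation above):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)           # n = 20 arbitrary data points
n = len(x)
mu, s = x.mean(), x.std()         # mean and population SD (ddof=0)

threshold = np.sqrt(1 + 1/n) * s  # the boundary derived above

# A new point just inside the threshold lowers the variance ...
inside = mu + 0.99 * threshold
assert np.var(np.append(x, inside)) < np.var(x)

# ... and a point just outside it raises the variance.
outside = mu + 1.01 * threshold
assert np.var(np.append(x, outside)) > np.var(x)
```

Running this with other seeds or sample sizes gives the same result, since the derivation shows the sign of the change depends only on the standardized distance of $$x_0$$ from the mean.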
Note that the variance of $$n$$ numbers is here the mean squared deviation from the mean. It is not multiplied by $$n/(n-1)$$ for a bias correction, because nothing is being estimated: these are descriptive statistics. However, if you do wish to use the "corrected" SD — which is $$\sqrt{n/(n-1)}$$ times the one used here — a little algebra shows that a similar conclusion holds: the threshold between decreasing and increasing the variance is $$\sqrt{1+1/n}$$ times the "corrected" SD.
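The corrected-SD version of the claim can be checked the same way. A sketch under the same illustrative assumptions, now passing `ddof=1` to NumPy so both the SD and the variances are the bias-corrected ones:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=20)                # n = 20 arbitrary data points
n = len(x)
mu = x.mean()
s_corr = x.std(ddof=1)                 # "corrected" (n - 1 divisor) SD

threshold = np.sqrt(1 + 1/n) * s_corr  # claimed boundary for corrected SD

inside = mu + 0.99 * threshold
outside = mu + 1.01 * threshold

# The corrected variance of the augmented set drops for a point just
# inside the threshold and grows for one just outside it.
assert np.var(np.append(x, inside), ddof=1) < np.var(x, ddof=1)
assert np.var(np.append(x, outside), ddof=1) > np.var(x, ddof=1)
```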