How does adding a new value to a set of data affect the standard deviation

For a given set of $n$ numbers with standard deviation $s$, it is my understanding that adding a new number whose distance from the average is far less than $s$ will always reduce the variance.

1) Does this rule always apply? No edge cases?

2) Is there a formal proof?

Best Answer

Without loss of generality, we may (by taking the mean as the origin and the standard deviation as the unit of measurement) assume the $n$ numbers $x_1, x_2, \ldots, x_n$ satisfy $\sum x_i = 0$ and $\sum x_i^2 = n$, so that their variance and standard deviation are both $1$. Let the new number be $x_0$. The change in variance is found by subtracting $1$ from the variance of all $n+1$ numbers:

$$\begin{aligned} &\frac{1}{n+1}\left(x_0^2 + \sum_{i=1}^n x_i^2\right) - \left(\frac{1}{n+1}\left(x_0 + \sum_{i=1}^n x_i\right)\right)^2 - 1\\ &=\frac{x_0^2}{n+1} + \frac{n}{n+1} - \frac{x_0^2}{(n+1)^2} - 1 &&\text{(using $\textstyle\sum x_i = 0$ and $\textstyle\sum x_i^2 = n$)}\\ &=\frac{1}{n+1}\left(-1 + \frac{n}{n+1}x_0^2\right). \end{aligned}$$

This is negative if and only if $n x_0^2 \lt n+1$, that is, if and only if $|x_0| \lt \sqrt{1+1/n}$, and its sign does not depend on the units of measurement. Thus, the variance will decrease when $x_0$ is within $\sqrt{1+1/n}$ standard deviations of the mean, will increase when $x_0$ is farther than this from the mean, and will stay the same when $x_0$ is exactly this far from the mean.
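As a quick numerical sanity check, here is a minimal sketch (assuming NumPy; the data and variable names are purely illustrative) that places the new value at various multiples of the threshold distance and reports the sign of the change in the descriptive variance:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)             # any n numbers will do
n = x.size
mu, sd = x.mean(), x.std()          # descriptive SD: divisor n (ddof=0)
threshold = np.sqrt(1 + 1 / n) * sd

# Place x0 at k times the threshold distance from the mean: the change
# should be negative for k < 1, (near) zero at k = 1, positive for k > 1.
for k in (0.5, 0.99, 1.0, 1.01, 2.0):
    x0 = mu + k * threshold
    change = np.append(x, x0).var() - x.var()
    print(f"k = {k:4.2f}: change in variance = {change:+.6f}")
```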


Note that the variance of $n$ numbers is taken here to be the mean squared deviation from the mean. It is not multiplied by $n/(n-1)$ for a bias correction, because nothing is being estimated: these are descriptive statistics. However, if you do wish to use the bias-corrected statistics in the formulas (the "sample" variance with divisor $n-1$, whose square root is $\sqrt{n/(n-1)}$ times the descriptive standard deviation), a little algebra will show that a similar conclusion holds: the threshold between decreasing and increasing the variance is $\sqrt{1+1/n}$ times the "corrected" SD.
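For completeness, here is a sketch of that algebra in the same standardized units as above ($\sum x_i = 0$, $\sum x_i^2 = n$), where the corrected variance of the original numbers is $n/(n-1)$. The sum of squared deviations of the $n+1$ numbers about their new mean $x_0/(n+1)$ is $x_0^2 + n - x_0^2/(n+1)$, and the corrected divisor is $(n+1)-1 = n$, so the change in the corrected variance is

$$\frac{1}{n}\left(x_0^2 + n - \frac{x_0^2}{n+1}\right) - \frac{n}{n-1} = \frac{x_0^2}{n+1} - \frac{1}{n-1}.$$

This is negative exactly when $x_0^2 \lt \frac{n+1}{n-1} = \left(1+\frac{1}{n}\right)\frac{n}{n-1}$, that is, when $x_0$ is within $\sqrt{1+1/n}$ corrected standard deviations (each equal to $\sqrt{n/(n-1)}$ in these units) of the mean.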