For a given set of $n$ numbers with standard deviation $s$, it is my understanding that adding numbers whose distance from the average is far less than $s$ will always reduce the variance.

1) Does this rule always apply? No edge cases?

2) Is there a formal proof?

## Best Answer

Without any loss of generality we may (by using the mean as the origin and the standard deviation as the unit) choose units of measurement in which the $n$ numbers are $x_1, x_2, \ldots, x_n$ with $\sum x_i = 0$ and $\sum x_i^2 = n$. The variance and standard deviation are both $1$. Let the new number be $x_0$. The change in variance is found by subtracting $1$ from the variance of all $n+1$ numbers:

$$\begin{aligned} &\frac{1}{n+1}\left(x_0^2 + \sum_{i=1}^n x_i^2\right) - \left(\frac{1}{n+1}\left(x_0 + \sum_{i=1}^n x_i\right)\right)^2 - 1\\ &=\frac{x_0^2}{n+1} + \frac{n}{n+1} - \frac{x_0^2}{(n+1)^2} - 1\\ &=\frac{1}{n+1}\left(-1 + \frac{n}{n+1}x_0^2\right). \end{aligned}$$

This will be negative if and only if $n x_0^2 \lt n+1$. Its sign does not depend on the units of measurement. Thus,

the variance will decrease when $x_0$ is within $\sqrt{1+1/n}$ standard deviations of the mean, it will increase when $x_0$ is further than this from the mean, and it will stay the same otherwise (that is, when $x_0$ is exactly $\sqrt{1+1/n}$ standard deviations from the mean).

Note that here the variance of $n$ numbers is the mean squared deviation from the mean. It is not multiplied by $n/(n-1)$ as a bias correction, because nothing is being estimated: these are descriptive statistics. However, if you do wish to use the "corrected" SD, which is $\sqrt{n/(n-1)}$ times the standard deviation, in the formulas, a little bit of algebra will show that a similar conclusion holds: the threshold between decreasing and increasing the variance is $\sqrt{1+1/n}$ times the "corrected" SD.
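As a sanity check, here is a short Python sketch (mine, not part of the original answer) that verifies both claims numerically: the closed-form change in the population variance derived above, and the fact that the same $\sqrt{1+1/n}$ threshold holds when the Bessel-corrected SD is used. The data values are arbitrary examples.

```python
import random
from statistics import pvariance, variance  # population and sample variance

random.seed(1)
n = 10

# Standardize n random numbers so they have mean 0 and population SD 1,
# matching the units chosen in the answer.
raw = [random.gauss(0, 1) for _ in range(n)]
m = sum(raw) / n
s = pvariance(raw) ** 0.5
xs = [(x - m) / s for x in raw]

# 1) The change in (population) variance after appending x0 matches the
#    closed form (1/(n+1)) * (-1 + n/(n+1) * x0^2).
for x0 in (0.0, 0.5, 1.0, 2.0):
    direct = pvariance(xs + [x0]) - pvariance(xs)
    formula = (-1 + n / (n + 1) * x0 ** 2) / (n + 1)
    assert abs(direct - formula) < 1e-9

# 2) With the Bessel-corrected SD, the same threshold applies: the sample
#    variance decreases iff the new point lies within sqrt(1 + 1/k)
#    corrected SDs of the mean (k = current count).
data = [2.0, 3.0, 5.0, 7.0, 11.0]   # arbitrary example data
k = len(data)
mu = sum(data) / k
sd = variance(data) ** 0.5           # corrected ("sample") SD
t = (1 + 1 / k) ** 0.5
assert variance(data + [mu + 0.9 * t * sd]) < variance(data)  # inside: decreases
assert variance(data + [mu + 1.1 * t * sd]) > variance(data)  # outside: increases
```

If all assertions pass silently, both the exact formula and the corrected-SD threshold behave as stated.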