There are at least two senses in which you can use the standard deviation in quality measurement:
- Manufacturing precision: How closely are we able to meet a manufacturing specification (e.g., bearing diameter).
- Fraction falling within a given tolerance interval: Continuing the example, if bearings must be within $\pm 1$ mm, how many standard deviations does this represent GIVEN our observed manufacturing variability (i.e., (1) from above).
The more fundamental use of the standard deviation is in (1), where you are characterizing how well-controlled your manufacturing process is. In this case, the larger the standard deviation is, the lower the quality of your manufacturing process. This is regardless of the actual standards you need to meet - if process A has a higher standard deviation than process B, then for any tolerance interval $\pm a$ mm, process A will generate more bad products than process B.
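This claim can be checked with the normal CDF: for an unbiased normal process, the fraction of output falling outside $\pm a$ is a strictly increasing function of $\sigma$. A minimal Python sketch (the function name and the two example standard deviations are made up for illustration):

```python
from math import erf, sqrt

def fraction_out_of_tolerance(sigma, a):
    """Fraction of an unbiased normal process N(0, sigma) falling outside +/- a."""
    # P(|X| > a) = 2 * (1 - Phi(a / sigma)) = 1 - erf(a / (sigma * sqrt(2)))
    return 1 - erf(a / (sigma * sqrt(2)))

# hypothetical processes with the same +/- 0.05 mm tolerance:
bad_A = fraction_out_of_tolerance(0.02, 0.05)  # process A, larger sd
bad_B = fraction_out_of_tolerance(0.01, 0.05)  # process B, smaller sd
# bad_A > bad_B: the noisier process produces more out-of-tolerance parts
```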
Now, the second use of standard deviation, (2) from above, is probably what is causing you confusion. In this case, you are correct that more standard deviations indicate a higher quality process, but these standard deviations are NOT the same as in (1). To illustrate, imagine that you have a process that produces bearings with mean diameter 0.2 mm and standard deviation 0.01 mm (normally distributed). Now, your tolerance interval is $\pm 0.05$ mm. How many standard deviations does $\pm 0.05$ mm represent, given that you observed a process standard deviation of 0.01 mm? In this case, you would say that your process is $5\sigma$: the predetermined tolerance limits sit 5 standard deviations away from your mean process output (which I assumed to be unbiased), as calculated from the underlying process data.
So, you see, you want the standard deviation of your manufacturing process to be low, which will increase the number of standard deviations that fit in your pre-specified tolerance interval. The two uses of $\sigma$ do not mean the same thing.
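The arithmetic in sense (2) is just a ratio; a tiny Python sketch using the bearing example above (the function name is mine):

```python
def sigma_level(half_tolerance, process_sd):
    """How many process standard deviations fit inside the tolerance half-width."""
    return half_tolerance / process_sd

# bearing example: tolerance +/- 0.05 mm, observed process sd 0.01 mm
level = sigma_level(0.05, 0.01)  # ~= 5, i.e. a "5-sigma" process
```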
First, the standard deviation is not the average signed distance to the mean - that is always zero. It is, however, a measure of how spread out the points are around the mean. Assuming the values are normally distributed, we know that ~68% of the values lie between $\mu-\sigma$ and $\mu+\sigma$, for example.
Suppose we weigh potatoes with average weight 100 g and standard deviation 5 g. What holds for the average of the average weight of a group of 4 potatoes?
I hope you see that the average of the average weight is still 100 g. But what is the standard deviation of this average weight? That is where you use the formula
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{4}} =2.5$$
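A quick simulation with Python's standard library (seed and group count are arbitrary) agrees with this formula:

```python
import random

random.seed(0)
n, groups = 4, 100_000
# simulate many groups of 4 potatoes, weights drawn from N(100 g, 5 g)
means = [sum(random.gauss(100, 5) for _ in range(n)) / n for _ in range(groups)]

grand_mean = sum(means) / groups
sd_of_means = (sum((m - grand_mean) ** 2 for m in means) / groups) ** 0.5
# grand_mean ~= 100, sd_of_means ~= 5 / sqrt(4) = 2.5
```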
Feel free to ask if you still don't understand.
Proof that the average signed distance between the data points and the mean is $0$:
$$\frac{\sum^n_{i=1} (x_i-\mu)}{n} = \frac{(\sum^n_{i=1} x_i)-\mu n}{n} = \frac{\sum^n_{i=1} x_i}{n}-\mu = \mu - \mu = 0$$
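The identity is easy to confirm numerically; a tiny Python check with made-up data:

```python
data = [3.0, 7.0, 8.0, 12.0]
mu = sum(data) / len(data)  # 7.5
# deviations -4.5, -0.5, 0.5, 4.5 cancel in pairs
avg_signed_dev = sum(x - mu for x in data) / len(data)
# avg_signed_dev is 0 (up to floating-point rounding)
```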
Best Answer
Since the question is how the standard deviation changes when a data point is changed, and for a fixed number of data points the standard deviation is a differentiable function of the data points, we can answer it by looking at the sign of the derivative.
Since all we are interested in is the direction of change and the standard deviation is positive, we can instead look at the square of the standard deviation, also known as variance.
Let's assume wlog that it is data point $x_1$ that is changed. We have $$v = \frac{1}{n}\sum_i(x_i-\bar x)^2$$ therefore we get by the chain rule $$\frac{\mathrm dv}{\mathrm dx_1} = \frac{2}{n}\sum_i(x_i-\bar x)\left(\frac{\partial x_i}{\partial x_1}-\frac{\mathrm d\bar x}{\mathrm dx_1}\right)$$ Since $$\bar x = \frac{1}{n}\sum_i x_i$$ we have $$\frac{\mathrm d\bar x}{\mathrm dx_1} = \frac{1}{n}$$ and because the deviations sum to zero, $\sum_i(x_i-\bar x)=0$, the sum collapses: $$\frac{\mathrm dv}{\mathrm dx_1} = \frac{2}{n}\left[(x_1-\bar x) - \frac{1}{n}\sum_i(x_i-\bar x)\right] = \frac{2}{n}(x_1-\bar x)$$ The prefactor is positive, therefore this derivative is positive if $x_1>\bar x$ and negative if $x_1<\bar x$. That means the variance, and therefore the standard deviation, grows if a single data point is moved away from the mean, and shrinks if a single data point is moved towards the mean.
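This sign behavior can be verified with a finite-difference check; a small Python sketch with made-up data:

```python
def variance(xs):
    """Population variance: mean squared deviation from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

xs = [1.0, 2.0, 6.0]  # mean = 3; x1 = 1 lies below the mean, x3 = 6 above it
h = 1e-6

# moving x1 (below the mean) upward, i.e. toward the mean: variance should shrink
d_below = (variance([1.0 + h, 2.0, 6.0]) - variance(xs)) / h
# moving x3 (above the mean) upward, i.e. away from the mean: variance should grow
d_above = (variance([1.0, 2.0, 6.0 + h]) - variance(xs)) / h
# d_below < 0 < d_above
```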
Therefore the third bullet point is correct.