[Math] Changes in standard deviation when data values change

standard-deviation, statistics

Here are some questions about standard deviation:

  1. How does the standard deviation change if we add or remove some data points? Is there a criterion for checking this?

  2. If the means of two categories of data are given and a constant is added to every data point in one category, how does the combined standard deviation change?

For example,

Assume we have two data sets, Class A and Class B, with the following information:

Class A – Mean = 77, Variance = 32

Class B – Mean = 66, Variance = 148

The combined mean of Class A and B = 67.4

The combined variance of Class A and B = 163.04

  1. If I add one more data point to Class A (for example, 65), how will the variance/standard deviation of Class A change? Also, how will the combined variance change?

  2. If I add 3 to every point in Class A, how will the combined variance change?

Also, how can we tell whether the value will increase, decrease, or stay the same without doing any calculations?

Sincere thanks in advance.

Best Answer

  1. If the data being removed are close to the sample mean, their absence has a smaller impact than the removal of outliers. For example, if a point lies much further above the mean than most other data, then removing it tends to reduce the sample mean, and with it the standard deviation, by a noticeable amount. This is only a rough qualitative description, since you didn't provide much context.
  2. If I understand your inquiry correctly: adding a constant to all data (within the same sample) doesn't change the standard deviation at all. The shift cancels by definition. Consider new data $x'_i = x_i + \Delta$ for all $i$. The new mean is then $\bar x' = \bar x + \Delta$, and the computation of the variance (which gives the standard deviation after taking the square root) $$\frac1{n-1}\sum(x_i' - \bar x')^2 = \frac1{n-1} \sum \bigl(x_i + \Delta - (\bar x + \Delta) \bigr)^2 = \frac1{n-1}\sum(x_i - \bar x)^2 $$ gives you the same thing.
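
Both points can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the population variance (dividing by $n$, not the $n-1$ used above), the function names `variance_after_adding` and `combined_variance` are made up for this example, and the two small data sets are hypothetical, chosen only so that their means are 77 and 66 as in the question. It relies on two standard identities. Adding a point $x$ to $n$ points with mean $\bar x$ and variance $\sigma^2$ gives the new variance $\frac{n}{n+1}\sigma^2 + \frac{n}{(n+1)^2}(x-\bar x)^2$, which is larger than $\sigma^2$ exactly when $(x-\bar x)^2 > \frac{n+1}{n}\sigma^2$. The combined variance of two groups is the weighted average of the group variances plus a term proportional to $(\bar x_A - \bar x_B)^2$.

```python
from statistics import mean, pvariance


def variance_after_adding(n, mean_old, var_old, x):
    """Population variance of n points after one extra point x is added."""
    d = x - mean_old
    return n * var_old / (n + 1) + n * d * d / (n + 1) ** 2


def combined_variance(n_a, mean_a, var_a, n_b, mean_b, var_b):
    """Population variance of the union of two groups."""
    n = n_a + n_b
    within = (n_a * var_a + n_b * var_b) / n          # average spread inside groups
    between = n_a * n_b * (mean_a - mean_b) ** 2 / n ** 2  # spread between group means
    return within + between


# Hypothetical data (NOT the asker's classes); means are 77 and 66.
class_a = [70.0, 75.0, 80.0, 83.0]
class_b = [50.0, 60.0, 66.0, 70.0, 84.0]

n_a, m_a, v_a = len(class_a), mean(class_a), pvariance(class_a)
n_b, m_b, v_b = len(class_b), mean(class_b), pvariance(class_b)

# Question 1: add the point 65 to Class A.
new_point = 65.0
v_a_new = variance_after_adding(n_a, m_a, v_a, new_point)
grows = (new_point - m_a) ** 2 > (n_a + 1) / n_a * v_a  # criterion, no recomputation

# Question 2: add 3 to every point of Class A (its variance is unchanged,
# only its mean moves, so only the between-group term changes).
before = combined_variance(n_a, m_a, v_a, n_b, m_b, v_b)
after = combined_variance(n_a, m_a + 3, v_a, n_b, m_b, v_b)

print("variance of A before/after adding 65:", v_a, v_a_new, "grows:", grows)
print("combined variance before/after shifting A by 3:", before, after)
```

Applied to the question's numbers, this suggests an answer without heavy computation. The point 65 is 12 away from the mean 77, so its squared distance is 144, which exceeds $\frac{n+1}{n}\cdot 32 \le 64$ for any class size, so the variance of Class A increases. Adding 3 to Class A moves its mean from 77 to 80, further from Class B's mean of 66, so the between-group term and hence the combined variance increases. The exact amounts depend on the class sizes, which the question does not give.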