If we change only one value of a data set, will the mean absolute deviation behave the same way as the standard deviation?

standard-deviation statistics

I took the new value as b and the removed value as a, calculated the new mean, and used it to express the new deviations in terms of the old ones. But the expressions get too complicated, and I cannot see any relation just by looking at the terms.

Basically, the question is: if, after changing only one value of a data set, the mean absolute deviation increases, will the standard deviation always increase? Or is there any case where it can decrease?

EDIT: I am taking the mean absolute deviation about the mean, that is, the sum of the absolute differences between each data point and the mean, divided by the number of data points.
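
In symbols, the definition in the EDIT can be written as follows, where the notation $x_1, \dots, x_n$ for the data and $\bar x$ for their mean is assumed here for illustration: $$\text{MAD} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar x|, \qquad \bar x = \frac{1}{n}\sum_{i=1}^{n} x_i.$$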

Best Answer

Here is an example using the definition of MAD implemented in R statistical software: For the sample $X_1, \dots, X_n,$ $$\text{MAD} = 1.4826\,\text{Med}(|X_i - H|),$$ where $H$ is the median of the sample, and the constant multiple is intended to put values on a scale so that MAD and sample standard deviation $S$ are roughly comparable for large normal samples. So according to this definition, MAD is based on the median of the absolute differences from the sample median.
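
As a quick check of the formula above, here is a minimal sketch that computes the scaled median absolute deviation by hand and compares it with R's built-in `mad()`. The vector `x` and the helper name `mad_by_hand` are illustrative assumptions and are not part of the example that follows.

```r
# Hypothetical small sample, chosen only for illustration
x <- c(2, 4, 4, 5, 7, 9, 30)

# Scaled median absolute deviation about the sample median
mad_by_hand <- function(v, constant = 1.4826) {
  constant * median(abs(v - median(v)))
}

mad_by_hand(x)                      # value computed from the formula
mad(x)                              # built-in; same default constant
all.equal(mad_by_hand(x), mad(x))   # the two should agree
```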

Here is a sample of size $n = 20$ from $\mathsf{Norm}(\mu=100, \sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.

x = rnorm(20, 100, 15)
summary(x);  sd(x);  mad(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  60.01   84.13   98.49   98.67  111.50  138.14 
## 19.50935
## 20.83691

boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")

[Figure: boxplot of the original sample]

So the two values are roughly the same. Now I sort the data, choose the largest value, and replace it by the outlier 200.

x.sort = sort(x);  x.20 = x.sort[20];  x.20
## 138.1427
x.sort[20] = 200;  x.sort[20]
## 200
summary(x.sort);  sd(x.sort);  mad(x.sort)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  60.01   84.13   98.49  101.77  111.50  200.00 
## 28.79103
## 20.83691

boxplot(x.sort, horizontal=T, col="skyblue2", pch=20, 
   main="Boxplot of Modified Sample")

[Figure: boxplot of the modified sample]

Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77). Also, the MAD was not increased (20.83691 before and after), but the sample SD has increased noticeably (roughly, from 19.5 to 28.8).

One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.
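
Because the sample above was drawn without a seed, it cannot be reproduced exactly. The following is a small deterministic sketch of the same idea, using made-up values (an assumption, not the original data): the largest observation is replaced by 200, and the median, mean, SD, and R's `mad()` are compared before and after.

```r
# Hypothetical sample of size 9 (illustrative values, not the original data)
x <- c(60, 84, 90, 98, 99, 105, 112, 125, 138)

# Replace the largest observation by the outlier 200
y <- x
y[which.max(y)] <- 200

# Compare center and spread before and after the change
rbind(
  original = c(median = median(x), mean = mean(x), sd = sd(x), mad = mad(x)),
  modified = c(median = median(y), mean = mean(y), sd = sd(y), mad = mad(y))
)
```

In this sketch the median and `mad()` are the same for both vectors, while the mean and `sd()` are larger for `y`.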


Addendum using your definition of MAD (mean absolute deviation from the sample mean). This definition is not as 'robust', but it behaves in somewhat the same way as the one I used above. No figures this time. Changes in the R code: I have to use my own code to compute this MAD, and the set.seed statement lets you reproduce exactly the same sample of size 20 that I used (if you try this on your own in R). The original data are x; the altered data y are obtained by changing one value to the outlier 200.

set.seed(1123)
x = rnorm(20, 100, 15)
summary(x); mean(x); sd(x);  mean(abs(x-mean(x)))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  81.74   89.17  101.88  101.81  112.51  128.71 
## 101.8078  # sample mean
## 13.70151  # sample SD
## 11.19836  # sample MAD

x.sort = sort(x);  x.20 = x.sort[20];  x.20
## 128.7068
x.sort[20] = 200;  x.sort[20];  y=x.sort
## 200       # 128.71 changed to 200
summary(y);  a = mean(y);  s = sd(y); s  
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  81.74   89.17  101.88  105.37  112.51  200.00 
## 25.37187  # new sample SD
MAD = mean(abs(y-a));  MAD
## 15.07735  # new sample MAD

Original data x has sample mean $\bar X = 101.81,$ sample SD $S_x = 13.70,$ $\text{MAD}_x = 11.2.$

Altered data y has sample mean $\bar Y = 105.37,$ sample SD $S_y = 25.37,$ $\text{MAD}_y = 15.1.$

So the alteration makes a large difference in the sample SD (from 13.70 to 25.37) and a relatively smaller difference in MAD according to your definition (from 11.2 to 15.1).
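
To compare the two measures on a common footing, one option is to look at the ratio of new to old values. The sketch below does this for a made-up data set; the helper name `mean_abs_dev` and the vectors `x0` and `y0` are hypothetical and are not the seeded sample above.

```r
# Mean absolute deviation about the sample mean (hypothetical helper)
mean_abs_dev <- function(v) mean(abs(v - mean(v)))

# Hypothetical data; y0 is x0 with the largest value changed to 200
x0 <- c(82, 89, 95, 100, 102, 104, 110, 113, 120, 129)
y0 <- x0
y0[which.max(y0)] <- 200

# Ratio of new to old for each measure of spread
c(sd_ratio  = sd(y0) / sd(x0),
  mad_ratio = mean_abs_dev(y0) / mean_abs_dev(x0))
```

For these values both ratios exceed 1, and the SD ratio is the larger of the two, which matches the pattern in the seeded example.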