I took the new data value as $b$ and the removed value as $a$, computed the new mean, and tried to express the new deviation in terms of the old one. But it gets too complicated, and I cannot see any relation just by looking at the terms.
Basically, the question is: if changing only one value of a data set increases the mean absolute deviation, will the standard deviation always increase too? Or is there a case where it can decrease?
EDIT: I am taking the mean absolute deviation about the mean, that is, the sum of the absolute differences between each data point and the mean, divided by the number of data points.
Best Answer
Here is an example using the definition of MAD implemented in R statistical software: For the sample $X_1, \dots, X_n,$ $$\text{MAD} = 1.4826\,\text{Med}(|X_i - H|),$$ where $H$ is the median of the sample, and the constant multiple is intended to put values on a scale so that MAD and the sample standard deviation $S$ are roughly comparable for large normal samples. So according to this definition, MAD is based on the median of the absolute differences from the sample median.
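As a quick check of this definition, here is a minimal R sketch that computes the scaled median absolute deviation by hand and compares it with the built-in `mad()`. The vector `v` is made-up illustration data, not the sample used in this answer.

```r
# Made-up illustration data (not the sample from this answer)
v <- c(82, 91, 95, 99, 104, 110, 127)

H <- median(v)                               # sample median
mad_by_hand <- 1.4826 * median(abs(v - H))   # the definition above
mad_builtin <- mad(v)                        # defaults: center = median(v), constant = 1.4826

all.equal(mad_by_hand, mad_builtin)          # TRUE
```

The two agree because `mad()` uses the sample median as its center and 1.4826 as its scale constant by default.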
Here is a sample of size $n = 20$ from $\mathsf{Norm}(\mu=100, \sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.
So the two values are roughly the same. Now I sort the data, choose the largest value, and replace it by the outlier 200.
Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77). Also, the MAD was not increased (20.83691 before and after), but the sample SD has increased noticeably (roughly, from 19.5 to 28.8).
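For anyone who wants to try this, the steps described above can be sketched in R roughly as follows. The seed is an arbitrary placeholder, not the one behind the numbers quoted here, so the output will differ in detail; only the pattern should be similar.

```r
set.seed(2024)                       # arbitrary placeholder seed, not the original one
x <- sort(rnorm(20, mean = 100, sd = 15))

y <- x
y[length(y)] <- 200                  # replace the largest value by the outlier 200

# Center and spread before and after the substitution
rbind(
  original = c(mean = mean(x), median = median(x), sd = sd(x), mad = mad(x)),
  altered  = c(mean = mean(y), median = median(y), sd = sd(y), mad = mad(y))
)
```

Because only the largest value moves, the median cannot change, and the median of the absolute deviations can only stay the same or go up. For a sample like this it typically stays put, while the mean and SD both rise.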
One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.
Addendum using your definition of MAD (mean absolute deviation from the sample mean). This is not as 'robust' a definition, but it works in somewhat the same way as the one I used above. No figures this time. Changes in R code: I have to use my own code to get this MAD; the `set.seed` statement will allow you to get exactly the same sample of size 20 as I used (if you try this on your own in R). The original data are `x`; changing one value to the outlier 200 gives the altered data `y`.
Original data `x` has sample mean $\bar X = 101.81,$ sample SD $S_x = 13.70,$ $\text{MAD}_x = 11.2.$
Altered data `y` has sample mean $\bar Y = 105.37,$ sample SD $S_y = 25.37,$ $\text{MAD}_y = 15.1.$
So the alteration makes a large difference in the sample SD and relatively little difference in MAD (according to your definition).
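A sketch of this kind of computation in R, using the mean absolute deviation about the sample mean. The helper name `mean_abs_dev` and the seed are assumptions for illustration (the original seed is not shown), so the output will not reproduce the figures quoted above.

```r
# Mean absolute deviation about the sample mean (hypothetical helper name)
mean_abs_dev <- function(v) mean(abs(v - mean(v)))

set.seed(1234)                       # placeholder seed; the original is not shown
x <- sort(rnorm(20, mean = 100, sd = 15))
y <- x
y[length(y)] <- 200                  # one value changed to the outlier 200

c(sd_x = sd(x), sd_y = sd(y))
c(mad_x = mean_abs_dev(x), mad_y = mean_abs_dev(y))

# Relative change; the SD ratio is typically the larger of the two here
c(sd_ratio = sd(y) / sd(x), mad_ratio = mean_abs_dev(y) / mean_abs_dev(x))
```

Both measures go up when the largest value is pushed further out. The SD squares the deviations, so the outlier usually moves it more than it moves the mean absolute deviation.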