I'm using R to calculate the median absolute deviation for a few distributions, but some of the values I'm calculating do not seem realistic at all. I have the following distribution:
x <- [1] NA NA NA -0.003 -0.009 0.004 -0.001 -0.001 -0.003 0.001 -0.002 0.000 -0.003 0.000 0.006 -0.011 -0.003
[18] 0.002 -0.007 -0.002 0.006 -0.005 0.000 0.008 0.001 0.009 -0.002 0.001 0.001 0.002 0.003 NA NA 0.001
[35] NA 0.005 -0.002 0.003 0.016 0.007 -0.003 -0.017 0.000 -0.013 0.000 0.002 0.002 0.000 NA 0.000 0.000
[52] 0.000 0.000 0.004 -0.001 0.000 -0.002 -0.003 -0.007 -0.001 -0.001 0.000 -0.002 0.001 0.003 0.000 -0.011 -0.002
[69] -0.003 0.004 -0.007 NA -0.009 0.005 -0.001 0.001 -0.001 0.001 -0.001 0.006 0.002 -0.006 0.002 -0.002 0.004
[86] 0.006 0.001 0.000 0.002 -0.002 0.007 0.004 0.003 0.004 0.005 -0.005 0.003 -0.003 0.002 0.004 0.003 -0.002
[103] -0.002 0.001 0.002 0.000 0.000 0.003 -0.001 0.004 0.001 0.001 0.005 -0.001 NA -0.005 0.000 -0.002 -0.004
[120] 0.004 NA 0.007 0.000 0.002 0.003 -0.006 -0.002 0.000 -0.002 -0.001 -0.001 -0.001 -0.006 -0.001 -0.001 -0.008
[137] 0.000 0.003 0.001 0.001 -0.001 0.000 0.011 -0.017 NA NA NA
Then I used the following code to generate my MAD value:
MADx <- mad(x, center = median(x, na.rm = TRUE), constant = (1/(quantile(x, probs=0.75, na.rm = TRUE, names = FALSE, type = 1))), na.rm = TRUE, low = FALSE, high = FALSE)
I get a value of 1 when doing this, which seems unrealistic because the values I have are much less than 1.
I used the quantile function to get the 75th quantile of the distribution.
Best Answer
mad() multiplies the median absolute deviation by constant, so setting constant to the reciprocal of the 75th percentile divides your MAD by what is, in some cases, just another measure of spread of the same data. Observe that if the distribution is symmetric and centered at 0, its 75th percentile equals half the interquartile range (the 75th percentile minus the 25th percentile). (Actually, all that's needed is that the 25th and 75th percentiles are symmetric about 0.) If the distribution is symmetric, its MAD also equals half the interquartile range, so the ratio of the two is 1.
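In your case, the data are rounded to three decimal places, so many values are tied. With data this discrete, the sample MAD and the sample 75th percentile can easily coincide exactly, which gives a ratio of exactly 1 even in a finite sample.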
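The effect is easy to reproduce. Here is a minimal sketch on a made-up sample (the vector y is hypothetical, not the data from the question) that is symmetric about 0, so the unscaled MAD and the 75th percentile coincide and their ratio is 1:

```r
# Toy sample, symmetric about 0 (hypothetical; not the data from the question)
y <- c(-4, -3, -2, -1, 0, 1, 2, 3, 4)

# Unscaled MAD: median of the absolute deviations from the median
raw_mad <- median(abs(y - median(y)))

# 75th percentile, using the same quantile type as in the question
q75 <- quantile(y, probs = 0.75, names = FALSE, type = 1)

# mad() multiplies by `constant`, so this is raw_mad / q75
scaled <- mad(y, center = median(y), constant = 1 / q75)

print(c(raw_mad = raw_mad, q75 = q75, scaled = scaled))
# raw_mad and q75 are both 2, so scaled is 1
```

The same thing happens at any scale: multiplying y by 0.001 shrinks both raw_mad and q75 by the same factor, and the ratio stays at 1 (up to floating-point rounding).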
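If you want the pure, unadulterated MAD, you should put
constant=1
in the call to mad(). (The default, constant = 1.4826, rescales the MAD so that it estimates the standard deviation when the data are normally distributed.)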
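As a rough sketch of what that looks like in practice, the snippet below uses a short made-up vector with missing values (y is hypothetical, not the data from the question) and compares the unscaled MAD from mad() with the same quantity computed by hand:

```r
# Short made-up vector with missing values (hypothetical, for illustration)
y <- c(NA, -0.003, -0.009, 0.004, -0.001, 0.001, 0.006, NA)

# Unscaled MAD via mad(): constant = 1 turns off any rescaling
raw_mad <- mad(y, constant = 1, na.rm = TRUE)

# The same quantity by hand: median absolute deviation from the median
by_hand <- median(abs(y - median(y, na.rm = TRUE)), na.rm = TRUE)

# Default call, which multiplies the unscaled MAD by 1.4826
default_mad <- mad(y, na.rm = TRUE)

print(c(raw_mad = raw_mad, by_hand = by_hand, default_mad = default_mad))
```

raw_mad and by_hand should agree, and both are on the same scale as the data, while default_mad is 1.4826 times larger.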