I am using the modified Z-score to find outliers in daily time-series data of a website's exit rate.
N = 1131. Using the last 3 years of daily data (1096 values) as the reference, I am looking for outliers among the remaining 35 values.
The formula I used for the modified Z-score is Mi = 0.6745 * (Yi - Ymedian) / MAD, where:
Yi = actual value
Ymedian = median of the entire dataset
MAD = Median(Abs(Values - Median(Values))), the median absolute deviation
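As a quick check of the formula, here is a minimal Python sketch that computes modified Z-scores with the median and MAD taken from the same list. The function name `modified_z_scores` and the exit-rate numbers are hypothetical, chosen only for illustration.

```python
from statistics import median

def modified_z_scores(values):
    """Return 0.6745 * (Yi - Ymedian) / MAD for every value in the list."""
    y_median = median(values)
    # MAD: median of the absolute deviations from the median
    mad = median([abs(y - y_median) for y in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified Z-scores are undefined")
    return [0.6745 * (y - y_median) / mad for y in values]

# Hypothetical daily exit rates in percent, for illustration only
rates = [41.0, 43.5, 42.0, 44.0, 40.5, 42.5, 58.0]
scores = modified_z_scores(rates)
print([round(s, 2) for s in scores])
```

Here the median is 42.5 and the MAD is 1.5, so the last value scores 0.6745 * 15.5 / 1.5, about 6.97. When most values are identical, MAD can be 0 and the score is undefined, which the sketch reports as an error.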
Iglewicz & Hoaglin suggest labeling observations whose modified Z-score exceeds 3.5 in absolute value as potential outliers. When I apply that rule, none of my values are flagged.
My question: can we lower the cutoff from 3.5 to 2.5 or 2? If so, how do we determine what the cutoff should be?
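A minimal sketch of the screening step described in the question, assuming the median and MAD are computed from the 3-year baseline only and each newer value is scored against them (the question also mentions the median of the entire dataset, so adjust this if both windows are pooled). The rule is applied to the absolute score, so unusually low exit rates are flagged as well as high ones. `flag_outliers` and all of the data are hypothetical.

```python
from statistics import median

def flag_outliers(baseline, new_values, cutoff=3.5):
    """Return (index, value, score) for new values with |modified Z| > cutoff.

    Assumption: median and MAD come from the baseline window only.
    """
    y_median = median(baseline)
    mad = median([abs(y - y_median) for y in baseline])
    if mad == 0:
        raise ValueError("MAD of the baseline is zero")
    flagged = []
    for i, y in enumerate(new_values):
        score = 0.6745 * (y - y_median) / mad
        if abs(score) > cutoff:  # two-sided: unusually high or low
            flagged.append((i, y, score))
    return flagged

baseline = [41.0, 43.5, 42.0, 44.0, 40.5, 42.5, 43.0]  # hypothetical history
new = [42.8, 47.5, 36.0]                                 # hypothetical recent days
print(flag_outliers(baseline, new))              # Iglewicz & Hoaglin cutoff
print(flag_outliers(baseline, new, cutoff=2.5))  # looser cutoff
```

With these made-up numbers the baseline median is 42.5 and the MAD is 1.0: 36.0 (score about -4.38) is flagged at 3.5, while 47.5 (about 3.37) is flagged only once the cutoff drops to 2.5.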
Best Answer
Your dataset seems to be relatively small, so you may use the following:
You may lower the threshold, since sometimes the data simply contain no "extreme" outliers.
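One way to reason about the cutoff, assuming the exit rates are roughly normal: then 0.6745 * (Yi - Ymedian) / MAD behaves approximately like an ordinary z-score, so each cutoff implies an expected share of ordinary points flagged by chance. The sketch below uses Python's `statistics.NormalDist`; the helper names are hypothetical, and real exit-rate data (skew, weekly seasonality) will deviate from this, so treat the numbers as a rough guide rather than a rule.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def expected_flag_rate(cutoff):
    """Two-sided tail probability P(|Z| > cutoff) for a standard normal Z."""
    return 2 * (1 - STD_NORMAL.cdf(cutoff))

def cutoff_for_rate(rate):
    """Cutoff k such that P(|Z| > k) equals rate, under normality."""
    return STD_NORMAL.inv_cdf(1 - rate / 2)

n_days = 35  # values being screened (1131 - 1096)
for k in (3.5, 3.0, 2.5, 2.0):
    p = expected_flag_rate(k)
    print(f"cutoff {k}: {p:.4%} of normal points, ~{p * n_days:.2f} of {n_days} days")

print(cutoff_for_rate(0.01))  # about 2.58
```

Under normality, a cutoff of 3.5 flags about 0.05% of ordinary points, 2.5 about 1.2%, and 2 about 4.6%, i.e. roughly 1.6 of the 35 screened days even when nothing unusual happened. Choosing the false-alarm rate you can tolerate and converting it with `cutoff_for_rate` is one way to set the threshold.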
More information on MAD and standard deviation:
https://blog.arkieva.com/relationship-between-mad-standard-deviation/
https://blog.arkieva.com/mad-versus-standard-deviation/
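For background on the 0.6745 constant: for normally distributed data the median absolute deviation is about 0.6745 standard deviations (0.6745 is the 0.75 quantile of the standard normal), so multiplying by 0.6745 and dividing by MAD puts the score on a standard-deviation scale; equivalently, the standard deviation is about 1.4826 * MAD. The simulation below checks this on hypothetical normal data. In forecasting, MAD often denotes the mean absolute deviation instead, for which the normal-data ratio is about 0.80; the modified Z-score uses the median version.

```python
import random
from statistics import median, pstdev

random.seed(0)  # fixed seed so the run is repeatable
# Hypothetical normal data: mean 50, standard deviation 4
sample = [random.gauss(50.0, 4.0) for _ in range(100_000)]

m = median(sample)
mad = median([abs(x - m) for x in sample])  # median absolute deviation
sd = pstdev(sample)                         # population standard deviation

print(mad / sd)          # close to 0.6745
print(1.4826 * mad, sd)  # scaled MAD approximates the standard deviation
```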