Time-Series – Heteroskedastic Time Series Outlier Analysis Using Machine Learning

anomaly detectionheteroscedasticitymachine learningoutlierstime series

Is anyone aware of machine learning models that are able to deal with heteroskedasticity in time series, when trying to detect outliers? There are a lot of anomaly detection tools out there (like k-means, isolation forests, one-class SVMs,…) but they do not really have the ability to deal with changes in volatility. Ideally, I would like to have an algorithm that looks locally at the time series. Otherwise, if there are periods in the data where there is high volatility, all of these observations will be flagged as outliers. However, compared to their neighbours these high volatility point should not all count as outliers. This happens because the methods known to me look at the problem globally. Taking a local perspective, observations in high volatility should not necessarily count as outliers. Are there any neural network representations that are able to cope with that? I heard about convolutional approaches that have a sliding window that they use to take a local perspective.

I could also imagine having to transform the data so that some of the outliers appear much more strongly to the mentioned techniques. However, I'm also not aware of any such transformations.

Edit

Here is what I have been describing. The red ones should be outliers, while the green ones should not be. What I'm looking for is an ML approach that allows for volatility clustering and local outlier detection. Because of the multiplicative nature of the underlying process, large jumps are likely clustered.

enter image description here

Best Answer

You want to take into consideration some local patterns in time to define your anomalies. This can be done with a relatively simple feature engineering procedure, and then using the standard anomaly detection methods on top.

  1. Divide the time-series into sliding time-windows of a suitable length for the "context" you wish to have. In your example, this could be some months.
  2. For each time-window, compute your desired metric for volatility. Should preferably be robust to outliers. Median Absolute Deviation is a candidate.
  3. Scale the original data points by the volatility metric to "normalize" / "correct for" volatility.
  4. Do anomaly detection on the new time series. If the feature engineering is strong, this might be as simple as applying a threshold