I wanted to generate a very simple example of anomaly detection for time series. So I created sample data with one very obvious outlier. Here's a picture of the data:
The problem is that none of the methods I have tried so far detects the outlier reliably. I tried local outlier factor, isolation forests, k-nearest neighbors, and DBSCAN. From what I have read, at least one of those methods should be suitable. I also tried tweaking the parameters, but that didn't really help.
What mistake am I making here? Are these methods not appropriate?
Below is a code example.
Thanks in advance!
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1)
t=np.linspace(0,10,101).reshape(-1,1)
y_test=0.5+t+t**2+2*np.random.randn(len(t),1)
y_test[10]=y_test[10]*7
plt.figure(1)
plt.plot(t,y_test)
plt.show()
from sklearn.neighbors import LocalOutlierFactor
clf=LocalOutlierFactor(contamination=0.1)
pred=clf.fit_predict(y_test)
plt.figure(3)
plt.plot(t[pred==1],y_test[pred==1],'bx')
plt.plot(t[pred==-1],y_test[pred==-1],'ro')
plt.show()
from sklearn.ensemble import IsolationForest
# the `behaviour` argument was removed in scikit-learn 0.24; the former 'new' behaviour is now the default
clf=IsolationForest(contamination='auto')
pred=clf.fit_predict(y_test)
plt.figure(4)
plt.plot(t[pred==1],y_test[pred==1],'bx')
plt.plot(t[pred==-1],y_test[pred==-1],'ro')
plt.show()
from pyod.models.knn import KNN
clf = KNN()
clf.fit(y_test)
pred=clf.predict(y_test)
plt.figure(5)
plt.plot(t[pred==0],y_test[pred==0],'bx')
plt.plot(t[pred==1],y_test[pred==1],'ro')
plt.show()
from sklearn.cluster import DBSCAN
clf = DBSCAN(min_samples=10,eps=3)
pred=clf.fit_predict(y_test)
plt.figure(6)
plt.plot(t[pred!=-1],y_test[pred!=-1],'bx')  # points assigned to any cluster
plt.plot(t[pred==-1],y_test[pred==-1],'ro')  # DBSCAN labels noise points as -1
plt.show()
Best Answer
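For this type of outlier, a filter should work. For instance, a moving average is a filter, and it can be applied here in a trend/noise decomposition framework, where $x_i$ is the series, $T_i$ the trend over a window of $n$ points, and $N_i$ the noise: $$\begin{aligned}T_i&=\frac 1 n\sum_{k=0}^{n-1}x_{i-k}\\N_i&=x_i-T_i\end{aligned}$$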
When the noise component is "too large" it indicates an outlier.
Here's a Python implementation:
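The answer's original listing is not reproduced here, so the following is only a minimal sketch of the decomposition described above. The window length `n=5`, the helper name `trend_noise`, and the global 3-sigma cutoff are assumptions made for illustration, not values taken from the answer.

```python
import numpy as np

def trend_noise(x, n=5):
    """Trailing moving-average trend T_i and residual N_i = x_i - T_i.

    The first n - 1 entries are NaN because no full window exists there.
    """
    x = np.asarray(x, dtype=float).ravel()
    trend = np.full(x.shape, np.nan)
    # mode="valid" returns one mean per full window; entry j averages
    # x[j : j + n], i.e. it is the trailing average for index j + n - 1.
    trend[n - 1:] = np.convolve(x, np.ones(n) / n, mode="valid")
    return trend, x - trend

# Same data as in the question (1-D here instead of a column vector).
np.random.seed(1)
t = np.linspace(0, 10, 101)
y = 0.5 + t + t**2 + 2 * np.random.randn(len(t))
y[10] = y[10] * 7

n = 5  # assumed window length
trend, noise = trend_noise(y, n)

# Global mean and standard deviation of the noise, ignoring the NaN warm-up.
valid = ~np.isnan(noise)
center = noise[valid].mean()
sigma = noise[valid].std()

# Assumed rule: flag points whose noise is more than 3 sigma from its mean.
is_outlier = np.zeros(len(y), dtype=bool)
is_outlier[valid] = np.abs(noise[valid] - center) > 3 * sigma
outliers = np.flatnonzero(is_outlier)
print("flagged indices:", outliers)
```

Unlike the detectors in the question, which were fitted on the values of `y_test` alone, this compares each point with its recent neighbours in time, which is why a spike that is small relative to the overall range of the series can still stand out.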
The plot of your series with a trend:
The plot of the noise:
As you can see, the noise component jumps to about 30 at the outlier, while the standard deviation of the noise is about 4. You can also calculate the moving-window dispersion of the noise and flag an outlier when the noise sticks out of the running volatility.
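The running-volatility idea could be sketched as follows. This is one possible reading, not the answer's own code: the helper `rolling_outliers`, the window of `m=20` preceding noise values, the minimum of 5 values, and the factor `k=3` are all assumptions. Each noise value is compared only with values that came before it, so the point under test does not inflate its own threshold.

```python
import numpy as np

def rolling_outliers(noise, m=20, k=3.0, min_periods=5):
    """Flag noise[i] when it lies more than k standard deviations from the
    mean of the (up to) m preceding noise values."""
    noise = np.asarray(noise, dtype=float)
    flags = np.zeros(len(noise), dtype=bool)
    for i in range(len(noise)):
        window = noise[max(0, i - m):i]      # strictly earlier values
        window = window[~np.isnan(window)]   # drop the moving-average warm-up
        if len(window) < min_periods:
            continue                         # not enough history yet
        sigma = window.std()
        if sigma > 0 and abs(noise[i] - window.mean()) > k * sigma:
            flags[i] = True
    return flags

# Same data as in the question (1-D here instead of a column vector).
np.random.seed(1)
t = np.linspace(0, 10, 101)
y = 0.5 + t + t**2 + 2 * np.random.randn(len(t))
y[10] = y[10] * 7

# Trailing moving-average trend with an assumed window of n = 5 points.
n = 5
trend = np.full(len(y), np.nan)
trend[n - 1:] = np.convolve(y, np.ones(n) / n, mode="valid")
noise = y - trend

flags = rolling_outliers(noise)
print("flagged indices:", np.flatnonzero(flags))
```

With short windows the running standard deviation is itself noisy, so a few ordinary points may be flagged as well; a longer window or a larger `k` trades sensitivity for fewer false alarms.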
This filter is very similar to an outlier test known as Grubbs' test. Here, I'm using the moving window as the sample in Grubbs' test.
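For reference, the standard two-sided Grubbs' test can be written as below. The symbols $m$ (sample size), $\bar x$ (sample mean), $s$ (sample standard deviation), $\alpha$ (significance level), and $t$ are notation introduced here, not taken from the answer. The statistic is $$G=\frac{\max_j\left|x_j-\bar x\right|}{s}$$ and the hypothesis of no outlier is rejected at level $\alpha$ if $$G>\frac{m-1}{\sqrt m}\sqrt{\frac{t^2}{m-2+t^2}},$$ where $t$ is the upper $\alpha/(2m)$ critical value of the $t$-distribution with $m-2$ degrees of freedom. When the sample is the moving window ending at $i$, so that $m=n$ and $\bar x=T_i$, the deviation of the newest point is $\left|x_i-T_i\right|=\left|N_i\right|$, which is where the similarity to the filter comes from.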