Simple outlier detection for time series

anomaly detection, python

I wanted to generate a very simple example of anomaly detection for time series. So I created sample data with one very obvious outlier. Here's a picture of the data:

[Figure: plot of the sample data]

The problem is that, so far, none of the methods I tried detects the outlier reliably. I tried local outlier factor, isolation forests, k nearest neighbors and DBSCAN. From what I read, at least one of those methods should be suitable. I also tried tweaking the parameters, but that didn't really help.

What mistake am I making here? Are the methods not appropriate?

Below is a code example.

Thanks in advance!

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1)

t=np.linspace(0,10,101).reshape(-1,1)
y_test=0.5+t+t**2+2*np.random.randn(len(t),1)

y_test[10]=y_test[10]*7

plt.figure(1)
plt.plot(t,y_test)
plt.show()

from sklearn.neighbors import LocalOutlierFactor

clf=LocalOutlierFactor(contamination=0.1)
pred=clf.fit_predict(y_test)

plt.figure(3)
plt.plot(t[pred==1],y_test[pred==1],'bx')
plt.plot(t[pred==-1],y_test[pred==-1],'ro')
plt.show()

from sklearn.ensemble import IsolationForest

clf=IsolationForest(contamination='auto')  # the 'behaviour' argument was removed in scikit-learn 0.24
pred=clf.fit_predict(y_test)

plt.figure(4)
plt.plot(t[pred==1],y_test[pred==1],'bx')
plt.plot(t[pred==-1],y_test[pred==-1],'ro')
plt.show()

from pyod.models.knn import KNN

clf = KNN()
clf.fit(y_test)
pred=clf.predict(y_test)

plt.figure(5)
plt.plot(t[pred==0],y_test[pred==0],'bx')
plt.plot(t[pred==1],y_test[pred==1],'ro')
plt.show()

from sklearn.cluster import DBSCAN

clf = DBSCAN(min_samples=10,eps=3)
pred=clf.fit_predict(y_test)

# DBSCAN labels noise points -1; every other label is a cluster index
plt.figure(6)
plt.plot(t[pred!=-1],y_test[pred!=-1],'bx')
plt.plot(t[pred==-1],y_test[pred==-1],'ro')
plt.show()

Best Answer

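For this type of outlier, a point that stands out only relative to its neighbors in time, a filter should work. A moving average is one such filter, and it can be applied here in a trend/noise decomposition framework: $$T_i=\frac{1}{n}\sum_{k=0}^{n-1}x_{i-k}, \qquad N_i=x_i-T_i$$ Here $x_i$ is the series, $T_i$ is the trend (the mean of the last $n$ points) and $N_i$ is the noise.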

When the noise component is "too large" it indicates an outlier.

Here's a Python implementation:

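m = 5  # window parameter; the value used for the plots is not stated, so 5 is an assumption
T = np.zeros(len(y_test))  # trend (trailing moving average)

# each window holds the current point and up to m previous ones (n = m + 1 in the formula)
for i in range(len(T)):
    T[i] = np.mean(y_test[max(0, i - m):i + 1])

plt.plot(t, y_test)
plt.plot(t, T)
N = y_test[:, 0] - T  # noise = series minus trend
plt.figure()
plt.plot(t, N)
plt.show()

print(np.std(N))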

The plot of your series with the trend: [Figure: series and moving-average trend]

The plot of the noise: [Figure: noise component]

As you can see, the noise component jumps to about 30 at the outlier, while the standard deviation of the noise is ~4. You can also calculate the moving-window dispersion of the noise and flag an outlier when the noise sticks out of the running volatility.
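To make the "too large" rule concrete, here is a minimal sketch of both variants: a global threshold on the noise and a running-volatility threshold. The function names, the factor `k=3.0`, the window length and the toy series are all assumptions chosen for illustration; none of them come from the answer above, and they would need tuning on real data.

```python
import numpy as np

def trailing_mean(x, window):
    """Mean of the current point and the (window - 1) points before it."""
    return np.array([x[max(0, i - window + 1):i + 1].mean() for i in range(len(x))])

def flag_global(noise, k=3.0):
    """Flag points whose noise exceeds k times the overall standard deviation."""
    return np.abs(noise) > k * np.std(noise)

def flag_rolling(noise, window, k=3.0):
    """Flag noise[i] when it exceeds k times the sample standard deviation
    of the `window` noise values immediately before it."""
    flags = np.zeros(len(noise), dtype=bool)
    for i in range(window, len(noise)):
        scale = np.std(noise[i - window:i], ddof=1)
        flags[i] = scale > 0 and abs(noise[i]) > k * scale
    return flags

# Toy series (made up for illustration): alternating 0/1 with one spike.
x = np.array([0.0, 1.0] * 20)
x[20] = 50.0

noise = x - trailing_mean(x, window=5)
print(np.flatnonzero(flag_global(noise)))            # indices flagged globally
print(np.flatnonzero(flag_rolling(noise, window=5))) # indices flagged by running volatility
```

The rolling version estimates the volatility only from points before the candidate, so the outlier does not inflate its own threshold. With short windows that estimate is noisy, so expect some false positives on real data unless `k` or the window is increased.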

This filter is very similar to an outlier test known as Grubbs' test. Here, I'm using the moving window as the sample in Grubbs' test.
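For comparison, a rough sketch of Grubbs' test applied to a trailing window might look like the following. It uses the standard two-sided critical value based on the t distribution and, as a simplification, tests only the newest point in each window rather than the window's most extreme point. The function names, the window length, `alpha` and the toy series are assumptions for illustration, and SciPy is needed for the t quantile.

```python
import numpy as np
from scipy import stats

def grubbs_critical(n, alpha=0.05):
    """Two-sided Grubbs critical value for a sample of size n."""
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))

def grubbs_moving(x, window=10, alpha=0.05):
    """Flag x[i] when its Grubbs statistic, computed on the trailing
    window ending at i, exceeds the critical value."""
    x = np.asarray(x, dtype=float).ravel()
    flags = np.zeros(len(x), dtype=bool)
    g_crit = grubbs_critical(window, alpha)
    for i in range(window - 1, len(x)):
        sample = x[i - window + 1:i + 1]
        s = sample.std(ddof=1)
        if s > 0:
            flags[i] = abs(x[i] - sample.mean()) / s > g_crit
    return flags

# Toy series (made up for illustration): alternating 0/1 with one spike.
x = np.array([0.0, 1.0] * 20)
x[20] = 50.0
print(np.flatnonzero(grubbs_moving(x, window=10)))
```

Grubbs' test assumes the sample is roughly normal, so on a series with a strong trend it is probably safer to apply this to the noise component than to the raw values.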