Solved – Anomaly detection on time series

anomaly detection, python, scikit-learn, time series

I'm a beginner in machine learning (I finished Ng's course), and I'm using scikit-learn in Python.
I want to find the best way to detect anomalies in our system.

We have ongoing events that occur on a schedule (every few minutes or hours), and I want to detect when something abnormal happens.
Example data:

ID   | epoch-time | duration (sec) | status    | is_manual
0400 | 1488801454 | 500            | completed | 1
0401 | 1488805055 | 500            | completed | 1
0402 | 1488812254 | 40000          | failed    | 1
6831 | 1488805050 | 200            | failed    | 0
...  (millions of examples)
0014 | 1488805055 | 1200           | completed | 0

So, for example, event ID 0400 occurs once every hour, and I want to tell when it does not run.
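
To make the "tell when it does not run" goal concrete, here is a minimal sketch of one simple baseline, not a recommended final solution. It estimates each ID's typical interval from its history and flags IDs that have been silent for too long. The function name `find_overdue_ids`, the `tolerance` factor of 1.5, and the sample timestamps are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median


def find_overdue_ids(events, now, tolerance=1.5):
    """Return {event_id: seconds_since_last_run} for IDs that look overdue.

    events: iterable of (event_id, epoch_seconds) pairs.
    An ID is flagged when the time since its last run exceeds
    `tolerance` times the median gap between its past runs.
    """
    times_by_id = defaultdict(list)
    for event_id, epoch in events:
        times_by_id[event_id].append(epoch)

    overdue = {}
    for event_id, times in times_by_id.items():
        times.sort()
        if len(times) < 3:
            continue  # too little history to estimate a schedule
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        typical_gap = median(gaps)
        silence = now - times[-1]
        if silence > tolerance * typical_gap:
            overdue[event_id] = silence
    return overdue


# Hypothetical history: "0400" runs hourly and then stops,
# while "6831" keeps running every 10 minutes.
events = [("0400", 1488801454 + 3600 * k) for k in range(5)]
events += [("6831", 1488805050 + 600 * k) for k in range(31)]
now = 1488801454 + 3600 * 4 + 7200  # two hours after the last "0400" run

overdue = find_overdue_ids(events, now)
print(overdue)
```

Treating each ID as its own series like this sidesteps the question of how to encode the ID column, at the cost of ignoring correlations between different events.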

What I plan to do is feed the algorithm all the events from the last 10 minutes.
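
As a rough illustration of that plan, the sketch below collects the events from the last 10 minutes and reduces them to a small feature dictionary that could be handed to a detector. The function name `summarize_window`, the dictionary keys, and the choice of features are assumptions; the rows are taken from the example data above.

```python
def summarize_window(events, now, window_seconds=600):
    """Summarize the events whose epoch falls in (now - window_seconds, now]."""
    recent = [e for e in events if now - window_seconds < e["epoch"] <= now]
    n = len(recent)
    return {
        "n_events": n,
        "n_failed": sum(e["status"] == "failed" for e in recent),
        "n_manual": sum(e["is_manual"] for e in recent),
        "mean_duration": sum(e["duration"] for e in recent) / n if n else 0.0,
    }


# Rows from the example data, as dictionaries.
events = [
    {"id": "0400", "epoch": 1488801454, "duration": 500, "status": "completed", "is_manual": 1},
    {"id": "0401", "epoch": 1488805055, "duration": 500, "status": "completed", "is_manual": 1},
    {"id": "0402", "epoch": 1488812254, "duration": 40000, "status": "failed", "is_manual": 1},
    {"id": "6831", "epoch": 1488805050, "duration": 200, "status": "failed", "is_manual": 0},
    {"id": "0014", "epoch": 1488805055, "duration": 1200, "status": "completed", "is_manual": 0},
]

features = summarize_window(events, now=1488805100)
print(features)
```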

Main questions:
1. How should I treat the ID column?
2. What is the best approach to take?

Best Answer

I found this article to be very helpful in my case:

https://mapr.com/blog/deep-learning-tensorflow/

Using this basic RNN structure, I was able to predict the outcome of the next timestep. Once all events were aligned to the nearest minute, the network was able to recognize the recurring patterns in the timeline.
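
The preprocessing step described above can be sketched roughly as follows. This is only an assumed reconstruction of the idea, not the code actually used: events are rounded to the nearest minute, laid out on a per-minute grid with one column per ID, and cut into (window, next step) pairs that a sequence model such as an RNN could be trained on. All function names and the tiny sample data are hypothetical.

```python
def round_to_minute(epoch):
    """Round an epoch timestamp (in seconds) to the nearest whole minute."""
    return ((epoch + 30) // 60) * 60


def build_minute_grid(events):
    """Turn (event_id, epoch) pairs into a per-minute grid.

    Returns (ids, start_minute, grid), where grid[t][j] is 1 if ids[j]
    ran in minute t (counted from start_minute) and 0 otherwise.
    """
    events = list(events)
    ids = sorted({event_id for event_id, _ in events})
    column = {event_id: j for j, event_id in enumerate(ids)}
    minutes = [round_to_minute(epoch) for _, epoch in events]
    start, end = min(minutes), max(minutes)
    n_steps = (end - start) // 60 + 1
    grid = [[0] * len(ids) for _ in range(n_steps)]
    for (event_id, _), minute in zip(events, minutes):
        grid[(minute - start) // 60][column[event_id]] = 1
    return ids, start, grid


def make_training_pairs(grid, window):
    """Pair each run of `window` consecutive minutes with the minute after it."""
    return [(grid[t:t + window], grid[t + window])
            for t in range(len(grid) - window)]


# Tiny hypothetical example with two IDs over four minutes.
sample = [("a", 0), ("b", 29), ("a", 95), ("b", 170)]
ids, start, grid = build_minute_grid(sample)
pairs = make_training_pairs(grid, window=2)
print(ids, start, grid)
print(pairs)
```

With this layout the ID is no longer a numeric feature but a column index, and a missing run shows up as a minute where the model predicts a 1 and the grid holds a 0.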