Solved – Automatic feature selection for anomaly detection

Tags: feature-selection, outliers

What is the best way to automatically select features for anomaly detection?

I normally treat anomaly detection as a problem where the features are selected by human experts: what matters is the output range (as in "abnormal input – abnormal output"), so even with many features you can usually come up with a much smaller subset by combining them.

However, assuming that in the general case the feature list can be huge, automated feature selection is perhaps sometimes preferable. As far as I can see, there are some attempts:

  • "Automated feature selection for Anomaly Detection" (pdf) which generalizes Support Vector Data Description
  • "A Fast Host-Based Intrusion Detection System Using Rough Set Theory" (no pdf available?) which, I guess, uses Rough Set Theory
  • "Learning Rules for Anomaly Detection of Hostile Network Traffic" (pdf, video) which uses statistical approach

So now I wonder if anyone can tell me – assuming anomaly detection and a really big (hundreds?) feature set:

  1. Do those huge feature sets make sense at all? Shouldn't we just reduce the feature set to, say, a few dozen and leave it at that?
  2. If huge feature sets do make sense, which of the approaches above would give better predictions, and why? Is there anything not listed that is much better?
  3. Why should they give better results compared to, say, dimensionality reduction or feature construction via clustering/ranking/etc.?

Best Answer

One practical approach (in the case of supervised learning, at least) is to include all possibly relevant features and use a (generalized) linear model (logistic regression, linear SVM, etc.) with regularization (L1 and/or L2). There are open source tools (e.g. Vowpal Wabbit) that can deal with trillions of example/feature combinations for these types of models, so scalability is not an issue (besides, one can always use sub-sampling). The regularization takes care of the feature selection.
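For concreteness, here is a minimal sketch of that idea using scikit-learn rather than Vowpal Wabbit; the data, feature counts, and the regularization strength `C` below are made-up placeholders for illustration. The point is that an L1 penalty drives most coefficients to exactly zero, and the surviving nonzero coefficients are the implicitly selected features.

```python
# Sketch: L1-regularized logistic regression as an implicit feature selector.
# Assumes labeled (supervised) data; X, y and all parameters are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a wide feature matrix: 1000 samples, 300 features,
# where only the first 5 features actually carry the "anomaly" signal.
X = rng.normal(size=(1000, 300))
signal = X[:, :5].sum(axis=1)
y = (signal + rng.normal(scale=0.5, size=1000) > 2.0).astype(int)

# The L1 penalty pushes most coefficients to exactly zero;
# smaller C means stronger regularization and a sparser model.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.05),
)
model.fit(X, y)

coefs = model.named_steps["logisticregression"].coef_.ravel()
selected = np.flatnonzero(coefs)
print(f"{len(selected)} of {X.shape[1]} features kept:", selected[:20])
```

In practice you would tune `C` by cross-validation and inspect which features survive; the same idea carries over to linear SVMs or to elastic-net (L1 + L2) penalties.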
