Anomaly detection using a principal component classifier: cutoff selection

outliers, pca

I am trying to implement anomaly detection using the principal component classifier proposed in "A novel anomaly detection scheme based on principal component classifier" by Shyu et al.

The paper proposes using both the major and the minor principal components rather than the major ones alone. For example, it takes as major the components that together explain 50% of the total variance, and as minor the components with eigenvalues less than 0.2.
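To make the two component sets concrete, here is a minimal NumPy sketch. It assumes that each observation is scored by the sum of its squared principal component scores divided by the corresponding eigenvalues, once over the major components and once over the minor ones, with PCA run on the correlation matrix of the training data. The function name `pcc_scores` and its parameters are invented for this example, and any trimming of extreme training points is left out.

```python
import numpy as np

def pcc_scores(X_train, X, major_var=0.50, minor_eig=0.20):
    """Return (major_score, minor_score) for each row of X.

    Assumes no column of X_train is constant and no eigenvalue is zero.
    """
    # Standardize with training statistics so PCA acts on the correlation matrix.
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0, ddof=1)
    Z_train = (X_train - mu) / sigma
    Z = (X - mu) / sigma

    # Eigen-decomposition of the correlation matrix, largest eigenvalue first.
    corr = np.corrcoef(Z_train, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Major: smallest leading set whose cumulative variance reaches major_var.
    explained = np.cumsum(eigvals) / eigvals.sum()
    q = min(int(np.searchsorted(explained, major_var)) + 1, len(eigvals))
    major = np.arange(q)

    # Minor: eigenvalues below minor_eig, excluding anything already in major.
    minor = np.where(eigvals < minor_eig)[0]
    minor = minor[minor >= q]

    # Squared principal component scores, each weighted by 1 / eigenvalue.
    Y = Z @ eigvecs
    weighted = Y**2 / eigvals
    return weighted[:, major].sum(axis=1), weighted[:, minor].sum(axis=1)
```

An observation would then be flagged when either score exceeds its threshold. How those thresholds are set (for example, as high quantiles of the training scores) is a separate choice not shown here.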

What I am unclear about is how hard cutoffs such as 50% and 0.2 are selected. Is there any science behind them? Can anyone please explain?

Best Answer

The first cutoff, keeping the principal components that explain 50% of the total variance, was chosen based on the authors' experiments on the KDD CUP 99 dataset. Underneath Table 2 they explain that they tested cutoffs between 30% and 70%, and that 50% achieved the highest detection rate at the lowest false alarm rate.

As far as I can tell, they do not give any reasoning for choosing eigenvalues less than 0.2, but I suspect they used a similar method: testing various cutoffs within some range and choosing the one that gives the best results on this dataset.
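That kind of empirical selection can be sketched in a few lines of plain Python. This is only an illustration of the general idea, not the authors' procedure: `rates`, `pick_cutoff`, the `predict` callable, and the false alarm cap are all hypothetical names and choices made for this example, and labels are assumed to be 1 for attack and 0 for normal, with both classes present.

```python
def rates(y_true, y_pred):
    """Return (detection rate, false alarm rate) for 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, fp / n_neg

def pick_cutoff(candidates, predict, y_true, max_false_alarm=0.02):
    """Try each candidate cutoff and keep the one with the best detection rate
    among those whose false alarm rate stays within max_false_alarm.

    predict(cutoff) is a user-supplied callable returning 0/1 predictions.
    Returns (cutoff, detection_rate, false_alarm_rate), or None if no
    candidate satisfies the cap.
    """
    best = None
    for c in candidates:
        dr, far = rates(y_true, predict(c))
        if far <= max_false_alarm and (best is None or dr > best[1]):
            best = (c, dr, far)
    return best
```

A cutoff picked this way is tuned to the data it was evaluated on, so it is worth re-checking on data that played no part in the selection.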

Slightly unrelated, but very important if you are doing research in intrusion detection: be very careful with the DARPA 1998 and KDD CUP 99 datasets. It has been known for a long time that these datasets are inherently flawed and that techniques cannot be accurately evaluated using them [1][2]. The NSL-KDD dataset [2] may allow a more reliable evaluation but is still not ideal. Furthermore, there is some interesting debate about the overwhelming use of machine learning and other anomaly detection techniques in intrusion detection research [3]. You might want to read the papers in the reference list for more details.

References:

  1. McHugh, "Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Intrusion Detection System Evaluations as Performed by Lincoln Laboratory"
  2. Tavallaee et al., "Toward Credible Evaluation of Anomaly-Based Intrusion-Detection Methods"
  3. Sommer et al., "Outside the Closed World: On Using Machine Learning For Network Intrusion Detection"