Solved – How to avoid overfitting with genetic algorithm

genetic algorithms, overfitting

I am facing the following problem. I have a system that ranks operations by their anomaly score. To improve its performance, I implemented a genetic algorithm that performs feature selection, so that the most anomalous operations appear in the first positions. What I am doing is not exactly feature selection, because I am not using binary variables but float variables between 0 and 1 whose sum is equal to 1.

Currently, I run a population of 200 individuals for 50 generations. The evaluation function is the system itself: I measure the quality of a solution by its true positive rate, counting how many anomalous operations appear in the first N positions (where N is the number of anomalous operations).
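To make the evaluation concrete, here is a minimal sketch of that measure in Python. The function name `top_n_hit_rate` and the two input lists are assumptions for illustration; in the real setup the scores would come from the ranking system itself.

```python
def top_n_hit_rate(scores, is_anomalous):
    """Fraction of truly anomalous operations found in the first N
    positions of the ranking, where N is the number of anomalous operations.

    scores       -- anomaly score per operation (higher = more anomalous)
    is_anomalous -- ground-truth flag per operation (hypothetical labels)
    """
    n = sum(1 for flag in is_anomalous if flag)
    if n == 0:
        return 0.0  # no anomalies to find; define the rate as 0
    # Operation indices sorted by descending anomaly score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:n] if is_anomalous[i])
    return hits / n


# Three anomalies; two of them land in the top three positions.
example_scores = [0.9, 0.1, 0.8, 0.3, 0.7]
example_labels = [True, False, False, True, True]
rate = top_n_hit_rate(example_scores, example_labels)
```

Because N equals the number of anomalies, this value is both the precision and the recall at N.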

I observed that one feature gets a very high value. That feature is often important, but not always, and its high value forces very low values on the other features. I suspect that my GA is overfitting. Can you help me find a good stopping criterion?

Best Answer

I don't know if it can be adapted to your situation, but I really like this paper on overfitting and early stopping.

http://page.mi.fu-berlin.de/prechelt/Biblio/stop_tricks1997.pdf

It describes many methods and when they are appropriate. It's very easy to read.
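As a rough illustration of how early stopping might carry over to a GA, the sketch below uses a simple patience rule: keep a held-out validation set that the GA never uses for selection, score the best individual of each generation on it, and stop once that validation score has not improved for a few generations. This is a generic form of early stopping, not one of the specific criteria from the paper, and the names `should_stop`, `run_ga`, `step` and `validate` are assumptions made up for the example.

```python
def should_stop(val_history, patience=5, min_delta=0.0):
    """True when the best validation fitness of the last `patience`
    generations does not beat the best seen before them by more
    than `min_delta`."""
    if len(val_history) <= patience:
        return False  # not enough history yet
    best_before = max(val_history[:-patience])
    best_recent = max(val_history[-patience:])
    return best_recent <= best_before + min_delta


def run_ga(step, validate, max_generations=50, patience=5):
    """Drive a GA with early stopping.

    step()        -- hypothetical callable: advances one generation and
                     returns the best individual according to training data
    validate(ind) -- hypothetical callable: fitness of `ind` on held-out data
    Returns the individual with the best validation fitness, that fitness,
    and the number of generations actually run.
    """
    history = []
    best_ind, best_val = None, float("-inf")
    for _ in range(max_generations):
        ind = step()
        val = validate(ind)
        history.append(val)
        if val > best_val:
            best_ind, best_val = ind, val
        if should_stop(history, patience):
            break
    return best_ind, best_val, len(history)
```

Returning the individual with the best validation score, rather than the last one, matters here: by the time the rule fires, the population may already have drifted toward weights that only fit the training data.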