Solved – What language to use for genetic programming

genetic algorithms

As part of an assignment I'll have to write a genetic programming algorithm that predicts atmospheric pollutant levels. Since I have no experience, can anyone give me pointers to programming languages in which the evolved programs could be written?

Clarification: I'm not asking which language to write the genetic algorithm itself in (I can make that decision myself); I'm asking in what programming language the evolved programs should be expressed.

My instructor suggested Lisp, but I don't like this idea: first, I would have to work on some kind of Abstract Syntax Tree; second, reliably doing crossovers on a tree structure can be a hell of a mess.

I'd rather use something dedicated to genetic programming, like Slash/A. Slash/A does not require working on ASTs: programs in bytecode are just an array of ints that can be changed in any fashion necessary, since every int array represents some Slash/A program.
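To illustrate the "every int array is a valid program" idea, here is a minimal sketch of a linear genome in Python. The four-instruction accumulator machine, the function names and the gene range of 256 are all made up for illustration; this is not the Slash/A instruction set or API. Note that in this toy encoding each gene also picks its input feature by index, so features are not read sequentially.

```python
import random

# Hypothetical 4-instruction accumulator machine (NOT the Slash/A instruction set).
N_OPS = 4

def run(genome, inputs):
    """Interpret any list of ints as a program: every int decodes to a valid instruction."""
    acc = 0.0
    for gene in genome:
        op = gene % N_OPS                             # which operation to apply
        arg = inputs[(gene // N_OPS) % len(inputs)]   # which input feature to use
        if op == 0:
            acc = arg        # load
        elif op == 1:
            acc += arg       # add
        elif op == 2:
            acc -= arg       # subtract
        else:
            acc *= arg       # multiply
    return acc

def mutate(genome, rate, rng):
    """Replace each gene with a random int with probability `rate`."""
    return [rng.randrange(256) if rng.random() < rate else g for g in genome]

def one_point_crossover(a, b, rng):
    """Head of `a` joined to tail of `b`; both parents need at least 2 genes."""
    cut = rng.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]
```

Because decoding uses modulo arithmetic, mutation and crossover can never produce an invalid program, which is the property that makes linear representations easy to manipulate.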

Additional remarks:

  • I'd like to avoid manipulating ASTs!
  • This problem is hard (maybe not as hard as predicting stock values), most probably because we don't have enough input information (there are some hidden parameters). Beating a model that always returns the mean is already somewhat of a challenge: mean models have 35% MAPE, most models have about 25%, and the best have 20%.
  • I'd like a language that copes with datasets with many features, given that I'm not sure which ones are important. (Slash/A has a disadvantage here: input features are read sequentially, so some features will be used with higher probability.)
  • I'd like to be able to program this in Python, so Python libs would be great, but I can do bindings for C/C++ (no Java, no Matlab, etc).
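For reference, the MAPE figures quoted in the remarks above can be computed as in the sketch below. The function names are hypothetical, and the sketch assumes no actual value is zero (MAPE is undefined there).

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is 0."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

def mean_baseline(train_targets, n_predictions):
    """The 'mean model': always predict the mean of the training targets."""
    mean = sum(train_targets) / len(train_targets)
    return [mean] * n_predictions
```

Any evolved program can then be scored with `mape(test_targets, program_outputs)` and compared against `mape(test_targets, mean_baseline(train_targets, len(test_targets)))`.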

I'm conscious this is a survey question, so if it is too early for such a question please close it, but I feel it is specific enough.

Best Answer

Your pollutant problem probably doesn't need much of a language at all. It looks like a symbolic regression rather than a control problem, in which case you could just use standard tree GP, with features and a few useful constants as the terminal set and relevant operators in the function set. The GP system will weed out irrelevant features and there are techniques to handle very large datasets. Generally, specify the smallest function set that you estimate could solve the problem, and expand it with care if necessary.
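As a rough sketch of what "features and a few constants as terminals, operators as the function set" can look like in Python, the code below stores each expression as a nested tuple. The representation, the names `FUNCTIONS`, `CONSTANTS`, `random_tree` and `evaluate`, and the probabilities are all assumptions made for illustration, not taken from any particular GP library.

```python
import operator
import random

# Assumed minimal function set; extend with care (e.g. protected division).
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
# Assumed constants for the terminal set.
CONSTANTS = [0.5, 1.0, 2.0]

def random_tree(n_features, depth, rng):
    """Grow a random expression: leaves are ("x", i) or ("const", c)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_features))   # feature terminal
        return ("const", rng.choice(CONSTANTS))        # constant terminal
    name = rng.choice(sorted(FUNCTIONS))
    return (name,
            random_tree(n_features, depth - 1, rng),
            random_tree(n_features, depth - 1, rng))

def evaluate(tree, features):
    """Evaluate an expression tree on one row of feature values."""
    tag = tree[0]
    if tag == "x":
        return features[tree[1]]
    if tag == "const":
        return tree[1]
    return FUNCTIONS[tag](evaluate(tree[1], features),
                          evaluate(tree[2], features))
```

For example, `("add", ("x", 0), ("mul", ("const", 2.0), ("x", 1)))` encodes x0 + 2 * x1. A feature that never helps fitness simply stops appearing in surviving trees, which is how the weeding out of irrelevant features happens in practice.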

You'll need to choose between tree and linear GP early on. Lisp is tree-based, Slash/A is linear. Read up on both to understand the pros and cons, but from what you wrote I'd suggest a simple tree GP system. It's not too hard to write your own, but there are existing Python implementations. The list below covers evolutionary algorithms in Python in general; not all of them do GP, and some are inactive:

  1. PyGressionGP (GP for symbolic regression in Python) -- http://code.google.com/p/pygressiongp/
  2. PyGene -- https://github.com/blaa/PyGene
  3. A Simple Genetic Programming in Python -- http://zhanggw.wordpress.com/2009/11/08/a-simple-genetic-programming-in-python-4/
  4. Pyevolve -- https://github.com/perone/Pyevolve -- also see blog -- http://blog.christianperone.com -- and this post -- http://blog.christianperone.com/?p=549
  5. esec (Evolutionary Computation in Python) -- http://code.google.com/p/esec/
  6. Peach -- http://code.google.com/p/peach/
  7. PyBrain (does a lot, not just NN) -- http://pybrain.org/
  8. dione -- http://dione.sourceforge.net/
  9. PyGEP (Genetic Expression Programming) -- http://code.google.com/p/pygep/
  10. deap (Distributed Evolutionary Algorithms) -- http://code.google.com/p/deap/
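If you do write your own tree GP system instead of using one of the libraries above, subtree crossover is usually the fiddliest part. Here is a minimal sketch, assuming expressions are immutable nested tuples whose leaves are `("x", index)` or `("const", value)` and whose internal nodes are `(name, left, right)`. All names are hypothetical, and this variant returns a single child rather than swapping subtrees to produce two.

```python
import random

LEAF_TAGS = ("x", "const")

def paths(tree, prefix=()):
    """Yield the index path from the root to every node in the tree."""
    yield prefix
    if tree[0] not in LEAF_TAGS:
        for i in (1, 2):
            yield from paths(tree[i], prefix + (i,))

def get(tree, path):
    """Return the subtree found by following `path` from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    """Return a copy of `tree` with the node at `path` replaced by `subtree`."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], subtree),) + tree[i + 1:]

def subtree_crossover(mother, father, rng):
    """Copy a random subtree of `father` over a random node of `mother`."""
    target = rng.choice(list(paths(mother)))
    donor = get(father, rng.choice(list(paths(father))))
    return replace(mother, target, donor)
```

Because tuples are immutable, the parents are never modified, which avoids most of the aliasing bugs that make tree crossover messy. A real system would normally also cap tree depth to limit bloat.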

Also, see the (free) introductory book on GP by well-known GP authors Poli, Langdon and McPhee:

A Field Guide to Genetic Programming -- http://www.gp-field-guide.org.uk/
