Solved – Reinforcement learning with Neural Fitted Q-iteration

neural networks, reinforcement learning

I have recently read this article, Neural Fitted Q Iteration – Machine Learning, and I have tried to implement it in Python with PyBrain and NumPy on a simple task.

The task is a point in 2D that can move in 4 directions and has to navigate to the origin, within some tolerance distance. The point initially starts at a random position close enough to the origin.

I give rewards with this scheme: +1 when the point is in the target area and 0 when it is anywhere else. I have also tried giving a negative reward of -1 when the point is too far from the origin.
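For reference, here is a minimal sketch of that reward scheme in Python; the two radii are assumptions, since the exact tolerance and cutoff distances are not given above:

```python
import numpy as np

def reward(position, target_radius=0.1, fail_radius=2.0):
    # Hypothetical radii: the question does not state the exact distances.
    dist = np.linalg.norm(position)  # distance of the point from the origin
    if dist <= target_radius:
        return 1.0    # inside the target area
    if dist >= fail_radius:
        return -1.0   # too far from the origin
    return 0.0        # everywhere else
```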

In every learning iteration I collect samples from the environment and then train the neural network on them with the RProp algorithm. The network has 4 inputs, the position and action vectors, and one output, the Q(s, a) value. I also added a hidden layer with 5 neurons.
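A rough sketch of that supervised fitting step with PyBrain's tools; the sample-collection code is omitted and the dataset contents are placeholders:

```python
from pybrain.tools.shortcuts import buildNetwork
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers.rprop import RPropMinusTrainer

# 4 inputs (position + action encoding), 5 hidden neurons, 1 output Q(s, a)
net = buildNetwork(4, 5, 1)

# One fitted-Q iteration regresses the network onto (state, action) -> target
ds = SupervisedDataSet(4, 1)
# ds.addSample((x, y, ax, ay), (q_target,))  # filled from collected samples

trainer = RPropMinusTrainer(net, dataset=ds)
# trainer.train()  # one RProp pass per NFQ iteration
```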

My problem is that when I try to use this trained network, the point almost never reaches the target.

Does anyone have an idea what I might be doing wrong or missing? I am a beginner in reinforcement learning, so it is possible that I am forgetting something.

Best Answer

There are a few subtleties with the PyBrain library and NFQ. I don't have a lot of experience with NFQ, but it's part of the course I tutor at my university. We use the PyBrain library because it's a good introduction to a lot of these things. Generally, there are two things that help:

  1. Use exploration. Set learner.epsilon=x for some x in [0, 1], where 0 means rely only on the network's output and 1 means act completely randomly. A value of 0.05-0.2 can help learning enormously on most problems (see the sketch after this list).

  2. Use more learning episodes and more hidden neurons. NFQ only fits to the episodes you give it, with a capacity set by the number of hidden units. Running more independent episodes and/or running longer episodes gives the network more experience to train on.
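Putting both points together, a minimal PyBrain sketch could look like this; task stands for your own 2D point task, and the epsilon value and episode counts are arbitrary choices, not tuned values:

```python
from pybrain.rl.learners.valuebased import ActionValueNetwork, NFQ
from pybrain.rl.agents import LearningAgent
from pybrain.rl.experiments import EpisodicExperiment

controller = ActionValueNetwork(2, 4)  # 2D state, 4 discrete actions
learner = NFQ()
learner.explorer.epsilon = 0.1  # point 1: epsilon-greedy exploration
                                # (some versions expose this as learner.epsilon)

agent = LearningAgent(controller, learner)
experiment = EpisodicExperiment(task, agent)  # task is your 2D point task

for _ in range(200):              # point 2: many more learning episodes
    experiment.doEpisodes(5)      # collect experience with exploration on
    agent.learn()                 # fit the Q-network to the collected samples
    agent.reset()                 # clear the agent's stored episode history
```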

These approaches have been used to improve NFQ performance a lot on tasks such as the 2048 game, so I imagine it should be similar in your case. In general, though, for grid-world type problems I find table-based RL to be far superior; a sketch of that alternative follows below. RBF neural nets might also be good (disclaimer: I haven't tried this).
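Here is a hedged sketch of that table-based alternative in PyBrain, discretising the continuous position onto a grid; the resolution, bounds, and state_index helper are all hypothetical:

```python
from pybrain.rl.learners.valuebased import ActionValueTable, Q
from pybrain.rl.agents import LearningAgent

n = 11                 # hypothetical grid resolution
low, high = -2.0, 2.0  # hypothetical bounds on the point's coordinates

def state_index(position):
    # Map a continuous (x, y) position to one of the n*n discrete states.
    ix = min(n - 1, max(0, int((position[0] - low) / (high - low) * n)))
    iy = min(n - 1, max(0, int((position[1] - low) / (high - low) * n)))
    return ix * n + iy

table = ActionValueTable(n * n, 4)  # n*n states, 4 actions
table.initialize(0.0)
agent = LearningAgent(table, Q())   # tabular Q-learning in place of NFQ
```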

Another thing to check: make sure you give your agent enough information that it can reasonably figure out which direction to go from each position. It has no memory, so if it can't "see" any landmarks pointing it in the right direction, it will only learn random noise.
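For this particular task the raw (x, y) position is already enough, since it fully determines the direction to the origin; a minimal sketch making that explicit (PointTask and the environment's sensor format are assumptions):

```python
from pybrain.rl.environments import Task

class PointTask(Task):
    # Hypothetical wrapper around the 2D point environment.

    def getObservation(self):
        # The raw (x, y) position is a Markov observation here: it fully
        # determines which direction leads to the origin, so no memory
        # or landmark information is needed.
        return self.env.getSensors()  # assumed to return the (x, y) position
```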
