Solved – Overview of Reinforcement Learning Algorithms

q-learning, reinforcement learning

I'm currently searching for an overview of reinforcement learning algorithms, and perhaps a classification of them. But apart from SARSA, Q-learning, and deep Q-learning, I can't really find any popular algorithms.

Wikipedia gives me an overview of different general reinforcement learning methods, but it makes no reference to the different algorithms implementing these methods.

But maybe I'm confusing general approaches with algorithms, and there is no real classification in this field like there is in other fields of machine learning. Can somebody give me a short introduction, or just a reference where I could start reading about the different approaches, the differences between them, and example algorithms that implement these approaches?

Best Answer

There is a good survey paper here.

As a quick summary: in addition to Q-learning methods, there is also a class of policy-based methods where, instead of learning the Q function, you directly learn the best policy $\pi$ to use.
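To make the contrast concrete, here is a rough side-by-side in standard textbook notation. The symbols $\alpha$ (step size), $\gamma$ (discount factor), $\theta$ (policy parameters) and $J$ (expected return) are my own notation for this sketch. A value-based method such as Q-learning updates an action-value estimate and reads the policy off it:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right], \qquad \pi(s) = \arg\max_a Q(s,a)$$

A policy-based method instead parameterizes the policy as $\pi_\theta$ and adjusts $\theta$ directly to increase the expected return:

$$J(\theta) = \mathbb{E}_{\pi_\theta}\left[ \sum_{t} \gamma^{t} r_t \right], \qquad \theta \leftarrow \theta + \alpha \nabla_\theta J(\theta)$$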

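These methods include the popular REINFORCE algorithm, which is a policy gradient algorithm. TRPO is another policy gradient algorithm, and GAE (generalized advantage estimation) is a technique for estimating advantages that is commonly used with such algorithms.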
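As an illustration of the policy gradient idea, here is a minimal sketch of REINFORCE on about the simplest problem possible: a one-step, two-action bandit with a softmax policy. The reward probabilities, learning rate and function names are made up for the example. Real uses involve multi-step episodes and usually a baseline to reduce variance.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(reward_probs, episodes=5000, lr=0.1, seed=0):
    """REINFORCE on a one-step problem (a bandit) with a softmax policy.

    reward_probs[a] is the (hypothetical) chance that action a pays reward 1.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(reward_probs)  # policy parameters, one per action
    for _ in range(episodes):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        ret = 1.0 if rng.random() < reward_probs[action] else 0.0
        # For a softmax policy, d/d theta[a] of log pi(action)
        # equals 1[a == action] - probs[a].
        for a in range(len(theta)):
            grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += lr * ret * grad_log_pi
    return softmax(theta)


policy = reinforce_bandit([0.2, 0.8])
```

Note that no Q function is stored anywhere. The only learned quantity is `theta`, which is pushed in the direction of the log-probability gradient, scaled by the observed return.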

There are many other variants of policy gradients, and they can be combined with value-function learning (as in Q-learning) in the actor-critic framework. The A3C algorithm (asynchronous advantage actor-critic) is one such actor-critic algorithm, and a very strong baseline in reinforcement learning.
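To show what "actor-critic" means in its simplest form, here is a tabular one-step actor-critic sketch on a hypothetical five-state chain. This is not A3C: there are no asynchronous workers and no neural networks. The environment, hyperparameters and names are all assumptions for the example. The actor is a softmax policy per state, and the critic is a state-value table whose TD error is used in place of the raw return.

```python
import math
import random


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def step(state, action, n_states):
    """Hypothetical chain: action 1 moves right, action 0 moves left.

    Reaching the right end gives reward 1 and ends the episode.
    """
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def actor_critic(n_states=5, episodes=500, gamma=0.9,
                 actor_lr=0.1, critic_lr=0.2, max_steps=100, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(n_states)]  # actor: action preferences
    value = [0.0] * n_states                       # critic: state values V(s)
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            probs = softmax(theta[state])
            action = rng.choices([0, 1], weights=probs)[0]
            nxt, reward, done = step(state, action, n_states)
            # TD error: the critic's one-step estimate of the advantage.
            target = reward + (0.0 if done else gamma * value[nxt])
            td_error = target - value[state]
            value[state] += critic_lr * td_error       # critic update
            for a in (0, 1):                           # actor update
                grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
                theta[state][a] += actor_lr * td_error * grad_log_pi
            if done:
                break
            state = nxt
    return theta, value


theta, value = actor_critic()
right_probs = [softmax(theta[s])[1] for s in range(4)]
```

The actor update has the same shape as a plain policy gradient step. The only difference is that the learned `td_error` replaces the sampled return, which is the sense in which the two families are combined.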

You can also search for the best policy $\pi$ by mimicking the outputs of an optimal control algorithm; this is called guided policy search.

In addition to Q-learning and policy gradients, which are both applied in model-free settings (neither algorithm maintains a model of the world), there are also model-based methods, which do learn or estimate a model of the world. These methods are valuable because they can be vastly more sample efficient.

Model-based algorithms aren't mutually exclusive with policy gradients or Q-learning. A common approach is to perform state estimation or learn a dynamics model, and then train a policy on top of the estimated state.
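As a rough sketch of the model-based idea, the example below estimates a tabular dynamics model from random interaction with a hypothetical noisy four-state chain, then plans on that learned model. For brevity it swaps in plain value iteration for the "train a policy" step. The environment, sample count and names are assumptions for the example.

```python
import random
from collections import defaultdict


def true_step(state, action, rng):
    """Hypothetical 4-state chain; the intended move succeeds 80% of the time."""
    move = 1 if action == 1 else -1
    if rng.random() > 0.8:
        move = -move
    nxt = min(max(state + move, 0), 3)
    reward = 1.0 if nxt == 3 else 0.0
    return nxt, reward


def learn_model(n_samples=5000, seed=0):
    """Estimate transition counts and reward totals from random interaction."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': visits}
    reward_sum = defaultdict(float)                 # (s, a, s') -> total reward
    for _ in range(n_samples):
        s, a = rng.randrange(3), rng.randrange(2)   # state 3 is terminal
        nxt, r = true_step(s, a, rng)
        counts[(s, a)][nxt] += 1
        reward_sum[(s, a, nxt)] += r
    return counts, reward_sum


def plan(counts, reward_sum, gamma=0.9, sweeps=100):
    """Value iteration on the learned model; no further environment calls."""
    value = [0.0] * 4

    def q(s, a):
        total = sum(counts[(s, a)].values())
        return sum(
            (n / total) * (reward_sum[(s, a, nxt)] / n + gamma * value[nxt])
            for nxt, n in counts[(s, a)].items()
        )

    for _ in range(sweeps):
        value = [max(q(s, 0), q(s, 1)) for s in range(3)] + [0.0]
    policy = [max((0, 1), key=lambda a: q(s, a)) for s in range(3)]
    return policy, value


policy, value = plan(*learn_model())
```

Once the model is fitted, all of the planning happens without touching the environment again. This is where the sample efficiency comes from, at the cost of inheriting any errors in the learned model.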

So as for a classification, one breakdown would be:

  • Q or V function learning
  • Policy-based methods
  • Model-based methods

Policy-based methods can be further subdivided into:

  • Policy gradients
  • Actor-critic
  • Policy search