This has nothing to do with frequentism. When the policy $\pi$ creates a distribution over actions, it is called a stochastic policy.
Originally, policies were not stochastic since they were defined as mapping to the highest-value action. The actual policy that is followed in, say, an $\epsilon$-greedy approach is to disobey that deterministic policy and act randomly with probability $\epsilon$.
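In standard notation, a stochastic policy is a conditional distribution over actions given the state, and $\epsilon$-greedy is one simple way to construct such a distribution from action-value estimates $Q(s,a)$:

$$\pi(a \mid s) = \Pr(A_t = a \mid S_t = s), \qquad \pi_{\epsilon}(a \mid s) = \begin{cases} 1 - \epsilon + \frac{\epsilon}{|\mathcal{A}(s)|} & \text{if } a = \arg\max_{a'} Q(s, a') \\ \frac{\epsilon}{|\mathcal{A}(s)|} & \text{otherwise} \end{cases}$$

where $|\mathcal{A}(s)|$ is the number of available actions and the exploration probability $\epsilon$ is spread uniformly across all of them (including the greedy action).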
Reinforcement learning is formulated as a problem with states, actions, and rewards, with transitions between states affected by the current state, chosen action and the environment. That is part of its definition (formulated as a Markov Decision Process), so generally you won't find stateless variants of it that are still called reinforcement learning.
However, there are related stateless problems. Multi-armed bandits have just actions and rewards. Solutions to them learn the expected reward of each action, and the main optimisation problems are being sure you have identified the best action, and maximising the total reward you can accumulate whilst still testing which action is best. Your button-pushing example looks a lot like a multi-armed bandit problem. Another common example is online advert selection for anonymous visitors to a site: although there is often plenty of data available, there is also a huge amount that is hidden, and a practical approach is to treat the probability of a click-through as depending only on the choice of content, which is then the site's action.
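As a minimal sketch of solving such a problem (the arm count and reward probabilities here are made up for illustration), an $\epsilon$-greedy bandit solver for something like your button example could look like this:

```python
import random

def run_bandit(reward_probs, epsilon=0.1, steps=10000):
    """Epsilon-greedy solver for a Bernoulli multi-armed bandit.

    reward_probs: true (hidden) probability of reward for each arm.
    Returns the estimated value of each arm after `steps` pulls.
    """
    n_arms = len(reward_probs)
    counts = [0] * n_arms       # times each arm was pulled
    estimates = [0.0] * n_arms  # incremental mean reward per arm

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit best estimate
        if random.random() < epsilon:
            arm = random.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1.0 if random.random() < reward_probs[arm] else 0.0

        # Incremental mean update: Q <- Q + (r - Q) / n
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates

# Three "buttons" with hidden payout rates; the solver should
# converge on the last arm as the best choice.
print(run_bandit([0.2, 0.5, 0.7]))
```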
There is a "stateful" variant of multi-armed bandits called contextual bandits - when there is some kind of signal that can be associated with the correct action, but actions taken have no effect on what the next state will be. Contextual bandits have states, actions and rewards, but no transition rules, they can be treated as a set of entirely separate events.
A contextual bandit with added transition rules between states, but no influence from the selected action, is essentially a sub-class of the reinforcement learning problem, and you can use most of the same analysis to predict long term reward and learn optimal behaviour.
And just for completeness, there are Markov Reward Processes without agent interaction that have states, and rewards, but no actions. It is possible to use reinforcement learning algorithms on these to predict long term reward and/or the expected long term value of being in a specific state.
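As a sketch in the usual notation (with discount factor $\gamma$ and rewards $R_{t+1}, R_{t+2}, \dots$), the quantity being predicted is the state value

$$v(s) = \mathbb{E}\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s\right]$$

which is the same definition as in the full RL problem, just with no actions to choose between.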
The rock-paper-scissors game does not fit neatly into any of the above problems because there are two agents (although you could analyse it as a multi-armed bandit if the opponent's policy was fixed to always play with specific unchanging probabilities, or as a contextual bandit if there were multiple such opponents or a really easy "tell" to their style of play that never changed). Typically a game of rock-paper-scissors would be analysed using game theory; it has the interesting feature that the Nash equilibrium is achieved by using a stochastic policy with equal $\frac{1}{3}$ probability of each choice.
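You can check that equilibrium directly: if the opponent plays each move with probability $\frac{1}{3}$, then whatever you choose, you win, lose and draw with equal probability, so with payoffs $+1$, $-1$ and $0$ your expected payoff is

$$\mathbb{E}[\text{payoff}] = \tfrac{1}{3}(+1) + \tfrac{1}{3}(-1) + \tfrac{1}{3}(0) = 0$$

regardless of your own policy, so no deviation can improve on playing uniformly at random yourself.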
If you wrote a rock-paper-scissors agent to play against a human opponent, you might actually formulate it as a reinforcement learning problem, taking the last N plays as the state, because that could learn to take advantage of human players' poor judgement of randomness.
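A minimal sketch of such an agent, assuming a hypothetical biased opponent (for simplicity it uses one-step, bandit-style value updates keyed on the opponent's last N moves, rather than full bootstrapped Q-learning):

```python
import random
from collections import defaultdict

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    if mine == theirs:
        return 0.0
    return 1.0 if BEATS[mine] == theirs else -1.0

class HistoryAgent:
    """Treats the opponent's last N moves as the state and learns
    an action-value estimate for each (state, action) pair."""
    def __init__(self, n=2, epsilon=0.1, alpha=0.1):
        self.n, self.epsilon, self.alpha = n, epsilon, alpha
        self.q = defaultdict(float)   # (state, action) -> value estimate
        self.history = ()

    def act(self):
        if len(self.history) < self.n or random.random() < self.epsilon:
            return random.choice(MOVES)
        return max(MOVES, key=lambda a: self.q[(self.history, a)])

    def learn(self, my_move, their_move):
        if len(self.history) == self.n:
            key = (self.history, my_move)
            self.q[key] += self.alpha * (payoff(my_move, their_move) - self.q[key])
        self.history = (self.history + (their_move,))[-self.n:]

# A "human-like" opponent that repeats its previous move too often;
# the agent should learn to exploit that bias for a positive mean payoff.
agent, last = HistoryAgent(), random.choice(MOVES)
score = 0.0
for _ in range(20000):
    mine = agent.act()
    theirs = last if random.random() < 0.6 else random.choice(MOVES)
    score += payoff(mine, theirs)
    agent.learn(mine, theirs)
    last = theirs
print("mean payoff:", score / 20000)
```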
Best Answer
Yes. If such a dataset is available, this approach will learn an approximation of the policy function from it.
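As a sketch of what that looks like (this is "behavioural cloning"; the features and action labels below are made up for illustration), any standard classifier will do:

```python
# Learning a policy by supervised learning, assuming you already
# have states labelled with the correct action in each one.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical dataset: each row is a state's feature vector,
# each label is the expert's chosen action in that state.
states = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.9, 0.1]])
actions = np.array(["left", "right", "left", "right"])

policy = DecisionTreeClassifier().fit(states, actions)

# The fitted model is now an approximate policy function pi(s) -> a.
print(policy.predict([[0.8, 0.2]]))
```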
Reinforcement learning (RL) is for when you do not have such a complete and finished dataset, with the answers of how the agent should act in every circumstance. Instead you typically have the definition of an environment, such as the rules of a game, or the controls and sensor inputs from a robot, and the problem is to figure out immediate behaviours that lead to a desired goal. The best action to take in any given situation that leads to a longer-term goal is often not obvious.
RL provides a mechanism to learn from trial and error.
No, unless you already have the dataset that you suggest.
However, knowledge of supervised learning is applied within RL frameworks. Most "Deep RL", which combines RL with neural networks, can be thought of as an outer RL algorithm that generates training data (the observed outcomes of behaviour chosen to explore whilst improving performance towards an optimal policy), combined with an inner supervised learning mechanism (generalising from that observation data to help improve performance in as-yet unseen situations).
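A sketch of that outer/inner structure, in the style of DQN. To keep it self-contained it uses a small Q-table in place of a deep network and a hypothetical five-state corridor environment, but the split is the same: the outer loop generates experience by acting, and the inner update is a regression step toward a bootstrapped target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny environment: states 0..4 on a line; action 0 moves
# left, action 1 moves right. Reaching state 4 gives reward 1 and ends
# the episode.
def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(5 - 1, s + 1)
    return s2, float(s2 == 4), s2 == 4

# "Inner" learner: a Q-table standing in for the neural network.
Q = np.zeros((5, 2))
gamma, alpha, epsilon = 0.9, 0.1, 0.3

for episode in range(500):
    s, done = 0, False
    while not done:
        # OUTER (RL) part: generate experience by acting epsilon-greedily
        a = int(rng.integers(2)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Bootstrapped regression target, as in DQN
        target = r if done else r + gamma * np.max(Q[s2])
        # INNER (supervised) part: one gradient-style step toward the target
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

# The learned greedy policy should move right in states 0..3.
print(np.argmax(Q[:4], axis=1))
```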
In some simpler problems you could use RL techniques or searches to generate a whole dataset for supervised learning, as separate stages of a pipeline. For example, you could perform a tree search from every state in tic tac toe to determine the optimal actions, save the results to a dataset, and learn a policy function from it. It may help you understand the role of RL to think of an approach like that as one extreme of a continuum: at one end, the RL and supervised learning parts are entirely separate stages; at the other end, RL learns directly online from every observation with little or no supervised learning required. Deep RL fits somewhere in the middle.
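A minimal sketch of the search stage of that pipeline: exhaustive minimax labels every reachable tic tac toe position with an optimal move, and the resulting dictionary is exactly the kind of labelled dataset you could then fit a supervised policy to.

```python
from functools import lru_cache

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for i, j, k in LINES:
        if board[i] != " " and board[i] == board[j] == board[k]:
            return board[i]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Return (score from X's perspective, best move) for the position."""
    w = winner(board)
    if w:
        return (1 if w == "X" else -1), None
    moves = [i for i, c in enumerate(board) if c == " "]
    if not moves:
        return 0, None  # draw
    results = []
    for m in moves:
        nxt = board[:m] + player + board[m+1:]
        score, _ = minimax(nxt, "O" if player == "X" else "X")
        results.append((score, m))
    # X maximises the score, O minimises it
    return max(results) if player == "X" else min(results)

def build_dataset():
    """Enumerate reachable positions and record an optimal move for each."""
    dataset = {}
    def visit(board, player):
        if winner(board) or " " not in board or board in dataset:
            return
        _, move = minimax(board, player)
        dataset[board] = move
        for i, c in enumerate(board):
            if c == " ":
                visit(board[:i] + player + board[i+1:],
                      "O" if player == "X" else "X")
    visit(" " * 9, "X")
    return dataset

data = build_dataset()
print(len(data), "positions labelled with an optimal move")
print("an optimal opening move for X:", data[" " * 9])
```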
If you have a complete and accurate dataset that describes the optimal solution to a control problem, then using RL may be inefficient. However, in practice, that's a big if. To take a modern example of successful application of RL, where would you get this dataset for the game of Go?
In very many real-world problems with action choices, we do not have access to instructions on how to choose optimally. This is where RL fits into the broader machine learning toolkit: it provides a general mechanism for finding optimal solutions to control problems via trial and error.
There may be alternatives to RL in those cases.
However, RL has been demonstrated as a strong contender in many areas where it delivers state-of-the-art results, beating other approaches. A typical example would be learning to play Atari computer games.