Solved – the difference between “inverse reinforcement learning” and supervised learning

deep learning, machine learning, reinforcement learning, supervised learning, terminology

It would be great if you could provide an example too.

Best Answer

Disclaimer: I am an MSc student in control theory (with an engineering background) who is starting his thesis on reinforcement learning. I am just beginning to get a feel for the field, kinda like I am taking my first walk around the lake of machine learning, so my information may not be spot on. I am answering because I feel that I understand the subtle difference, and I get the feeling from your request that you would like an application-oriented example, not a mathematical abstraction of it.

Differences: IRL frames its problem as an MDP and uses the notion of an 'agent' that selects 'actions' to maximize the net reward. The key difference is that in IRL, supervised learning techniques (i.e. data fitting) are used to obtain the reward function itself. Supervised learning, by contrast, uses labeled data to approximate a mapping from inputs to outputs directly.
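To make the supervised half of that contrast concrete, here is a minimal sketch (my own toy example, every number invented for illustration): labeled pairs are used to approximate the input-to-output mapping directly, here with ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled data: inputs X and targets y generated from a known
# linear mapping (the weights below are made up for illustration).
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Supervised learning: approximate the mapping X -> y directly,
# here by ordinary least squares.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)  # close to [2.0, -1.0, 0.5]
```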

Example of learning ground distance from images

* Supervised learning: use image features together with labeled ground distances to train a neural network's weights, so that it can predict the ground distance in the general case.
* IRL: use the labeled data to derive a reward function, i.e. a mapping from features to rewards, then let an agent explore the feature space and come up with a policy that selects the best actions, which in this case would be estimates of the ground distance. (A minimal sketch of this route follows below.)
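Here is a minimal sketch of that IRL route for a toy version of the task, with every name and number made up for illustration: a supervised fit produces the reward function, and the "policy" then just picks the action the learned reward scores highest.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy version of the ground-distance task: a "state" is a
# 2-d image-feature vector, an "action" is a candidate distance guess.
true_w = np.array([1.5, -0.7])                    # unknown feature->distance map
X = rng.normal(size=(50, 2))                      # states: image features
d = X @ true_w + rng.normal(scale=0.05, size=50)  # labeled ground distances

# Step 1 (the supervised part inside IRL): fit reward parameters so that
# r(s, a) = -(a - s @ w_r)**2 peaks at the demonstrated distances.
w_r, *_ = np.linalg.lstsq(X, d, rcond=None)

def reward(state, action):
    """Learned reward: high when the guess matches the fitted distance."""
    return -(action - state @ w_r) ** 2

# Step 2 (the RL part): the agent searches the action space and follows
# the policy that maximizes the learned reward (a crude grid search here).
candidates = np.linspace(-5.0, 5.0, 1001)

def policy(state):
    return candidates[np.argmax([reward(state, a) for a in candidates])]

s_new = rng.normal(size=2)
print("estimated distance:", policy(s_new))
print("true distance     :", s_new @ true_w)
```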

For this specific task, the IRL route seems trivial: using RL to estimate image distance is redundant when simpler supervised learning suffices. However, in situations where RL is advantageous but the reward function is difficult to define by hand, IRL can prove very useful.
For example, imagine using RL to teach acrobatic maneuvers to helicopters (see the paper by Abbeel et al. below). Using IRL to obtain the reward functions for the maneuvers is very useful here: once the reward functions are obtained, they can be used to teach the same maneuvers to other helicopters (with different aerodynamic models but similar controls). Using supervised learning to learn a direct mapping from states to controls won't work, since different aircraft have different aerodynamic models, but the recovered reward transfers.
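To illustrate why the reward transfers while a state-to-control mapping does not, here is a deliberately tiny toy (nothing like the actual helicopter setup in the paper; all dynamics numbers are invented): two 1-d linear "aircraft" share one reward, and replanning under each one's own dynamics succeeds where copying the first one's controls fails.

```python
# Toy stand-in for the helicopter story (all numbers invented): two
# "aircraft" share one goal, hover at x = 0 with reward r(x) = -x**2,
# but have different linear dynamics x_next = a * x + b * u.
dyn_A = (1.2, 0.5)   # aerodynamic model of helicopter A
dyn_B = (0.8, 2.0)   # a different model, similar controls

def greedy_control(a, b, x):
    """One-step optimal control under reward -x**2: drive x_next to 0."""
    return -(a / b) * x

def step(dyn, x, u):
    a, b = dyn
    return a * x + b * u

x0 = 1.0

# Transferring the *reward* works: replan controls for B's own dynamics.
u_B = greedy_control(*dyn_B, x0)
print("B, replanned:", step(dyn_B, x0, u_B))        # -> 0.0 (hovers)

# Transferring A's *state-to-control mapping* fails on B's dynamics.
u_A = greedy_control(*dyn_A, x0)
print("B with A's policy:", step(dyn_B, x0, u_A))   # -> -4.0 (overshoots)
```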

Reference:
* Ng, A. Y., & Russell, S. J. (2000, June). Algorithms for inverse reinforcement learning. In ICML (pp. 663-670).
* Abbeel, P., Coates, A., & Ng, A. Y. (2010). Autonomous helicopter aerobatics through apprenticeship learning. The International Journal of Robotics Research.