Solved – the difference between “inverse reinforcement learning” and supervised learning

deep learning, machine learning, reinforcement learning, supervised learning, terminology

It would be great if you could provide an example too.

Best Answer

Disclaimer: I am an MSc student in control theory (with an engineering background) who is starting his thesis on reinforcement learning. I am just beginning to get a feel for the field, kinda like I am taking my first walk around the lake of machine learning, so my information may not be spot on. I am answering because I feel that I understand the subtle difference, and I get the feeling from your request that you would like an application-oriented example, not a mathematical abstraction of it.

Differences: IRL frames its problem as an MDP and uses the notion of an 'agent' that selects 'actions' to maximize the net reward. The key difference is that in IRL, supervised learning techniques (i.e. data fitting) are used to obtain the reward function itself. Supervised learning, by contrast, uses labeled data to approximate a mapping from inputs to outputs directly.
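To make the supervised half of that contrast concrete, here is a minimal sketch (my own toy example, every number invented for illustration): labeled pairs are used to approximate the input-to-output mapping directly, here with ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled data: inputs X and targets y generated from a known
# linear mapping (the weights below are made up for illustration).
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Supervised learning: approximate the mapping X -> y directly,
# here by ordinary least squares.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)  # close to [2.0, -1.0, 0.5]
```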

Example of learning ground distance from images

* Supervised learning: use image features together with labeled ground distances to train a neural network's weights, so that it can predict the ground distance in the general case.
* IRL: use the labeled data to derive a reward function, i.e. a mapping from features to rewards, then let an agent explore the feature space and come up with a policy that selects the best actions, which in this case would be estimates of the ground distance. (A minimal sketch of this route follows below.)
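Here is a minimal sketch of that IRL route for a toy version of the task, with every name and number made up for illustration: a supervised fit produces the reward function, and the "policy" then just picks the action the learned reward scores highest.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy version of the ground-distance task: a "state" is a
# 2-d image-feature vector, an "action" is a candidate distance guess.
true_w = np.array([1.5, -0.7])                    # unknown feature->distance map
X = rng.normal(size=(50, 2))                      # states: image features
d = X @ true_w + rng.normal(scale=0.05, size=50)  # labeled ground distances

# Step 1 (the supervised part inside IRL): fit reward parameters so that
# r(s, a) = -(a - s @ w_r)**2 peaks at the demonstrated distances.
w_r, *_ = np.linalg.lstsq(X, d, rcond=None)

def reward(state, action):
    """Learned reward: high when the guess matches the fitted distance."""
    return -(action - state @ w_r) ** 2

# Step 2 (the RL part): the agent searches the action space and follows
# the policy that maximizes the learned reward (a crude grid search here).
candidates = np.linspace(-5.0, 5.0, 1001)

def policy(state):
    return candidates[np.argmax([reward(state, a) for a in candidates])]

s_new = rng.normal(size=2)
print("estimated distance:", policy(s_new))
print("true distance     :", s_new @ true_w)
```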

For this specific task, the IRL route seems trivial: using RL to estimate image distance is redundant when simpler supervised learning suffices. However, in situations where RL is advantageous but the reward function is difficult to define by hand, IRL can prove very useful.
For example, imagine using RL to teach acrobatic maneuvers to helicopters (see the paper by Abbeel et al. below). Using IRL to obtain the reward functions for the maneuvers is very useful here: once the reward functions are obtained, they can be used to teach the same maneuvers to other helicopters (with different aerodynamic models but similar controls). Using supervised learning to learn a direct mapping from states to controls won't work, since different aircraft have different aerodynamic models, but the recovered reward transfers.
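To illustrate why the reward transfers while a state-to-control mapping does not, here is a deliberately tiny toy (nothing like the actual helicopter setup in the paper; all dynamics numbers are invented): two 1-d linear "aircraft" share one reward, and replanning under each one's own dynamics succeeds where copying the first one's controls fails.

```python
# Toy stand-in for the helicopter story (all numbers invented): two
# "aircraft" share one goal, hover at x = 0 with reward r(x) = -x**2,
# but have different linear dynamics x_next = a * x + b * u.
dyn_A = (1.2, 0.5)   # aerodynamic model of helicopter A
dyn_B = (0.8, 2.0)   # a different model, similar controls

def greedy_control(a, b, x):
    """One-step optimal control under reward -x**2: drive x_next to 0."""
    return -(a / b) * x

def step(dyn, x, u):
    a, b = dyn
    return a * x + b * u

x0 = 1.0

# Transferring the *reward* works: replan controls for B's own dynamics.
u_B = greedy_control(*dyn_B, x0)
print("B, replanned:", step(dyn_B, x0, u_B))        # -> 0.0 (hovers)

# Transferring A's *state-to-control mapping* fails on B's dynamics.
u_A = greedy_control(*dyn_A, x0)
print("B with A's policy:", step(dyn_B, x0, u_A))   # -> -4.0 (overshoots)
```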

Reference:
* Ng, A. Y., & Russell, S. J. (2000, June). Algorithms for inverse reinforcement learning. In ICML (pp. 663-670).
* Abbeel, P., Coates, A., & Ng, A. Y. (2010). Autonomous helicopter aerobatics through apprenticeship learning. The International Journal of Robotics Research.