Deep Learning is the branch of Machine Learning based on Deep Neural Networks (DNNs), meaning neural networks with at least 3 or 4 layers (including the input and output layers). But for some people (especially non-technical people), any neural net qualifies as Deep Learning, regardless of its depth, while others consider even a 10-layer neural net shallow.
Convolutional Neural Networks (CNNs) are one of the most popular neural network architectures. They are extremely successful at image processing, but also at many other tasks (such as speech recognition, natural language processing, and more). State-of-the-art CNNs are pretty deep (dozens of layers at least), so they are part of Deep Learning. But you can build a shallow CNN for a simple task, in which case it's not (really) Deep Learning.
But CNNs are not alone: there are many other neural network architectures out there, including Recurrent Neural Networks (RNNs), Autoencoders, Transformers, Deep Belief Nets (DBNs; a stack of Restricted Boltzmann Machines, or RBMs), and more. They can be shallow or deep. Note: even shallow RNNs can be considered part of Deep Learning, since training them requires unrolling them through time, resulting in a deep net.
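To see why unrolling makes even a one-layer RNN effectively deep, here is a minimal numpy sketch (the layer sizes and random weights are made up for illustration): running the same recurrent layer over T timesteps applies T stacked nonlinear transformations, just like a T-layer feedforward net.

```python
import numpy as np

rng = np.random.default_rng(0)

# A single recurrent layer: hidden size 4, input size 3 (arbitrary sizes).
W_h = rng.normal(size=(4, 4)) * 0.1   # hidden-to-hidden weights
W_x = rng.normal(size=(3, 4)) * 0.1   # input-to-hidden weights

def rnn_forward(inputs):
    """Unroll the same layer over time: T timesteps behave like T stacked layers."""
    h = np.zeros(4)
    for x_t in inputs:                 # one "layer" of computation per timestep
        h = np.tanh(h @ W_h + x_t @ W_x)
    return h

sequence = rng.normal(size=(10, 3))    # 10 timesteps -> effectively a 10-layer net
print(rnn_forward(sequence).shape)     # (4,)
```

Note that gradients flowing back through this unrolled computation pass through every timestep, which is exactly why training RNNs runs into the same vanishing/exploding-gradient issues as very deep feedforward nets.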
Disclaimer: I am an MSc student in Control Theory (with an engineering background) who is starting his thesis on Reinforcement Learning. I am just beginning to get a feel for the field, kind of like I am taking my first walk around the lake of machine learning, so my information may not be spot on. I am answering because I feel that I understand the subtle difference. I also get the feeling from your request for an example that you would like an application-oriented example, not a mathematical abstraction of it.
Differences
- IRL frames its problem as an MDP and uses the notion of an 'agent' that selects 'actions' to maximize the net reward. The key difference is that, in IRL, supervised learning techniques (i.e., data fitting) are used to obtain the reward function. Supervised learning uses labeled data in order to approximate a mapping.
Example of learning ground distance from images
Supervised learning: using features in images with labeled ground distances to train a neural network's weights, so it can find ground distances in the general case.
IRL: using labeled data to derive a reward function, which would be a mapping from features to rewards, then letting an agent explore the space of features and come up with a policy that selects the best actions, which in this case would be an estimate of the ground distance.
For the specific task I described, this seems trivial: using RL to classify image distances when simpler supervised learning suffices is redundant. However, in situations where defining a reward function is difficult but using RL is advantageous, IRL can prove very useful.
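The two-stage pipeline above can be sketched in a few lines of numpy. This is only an illustration of the idea, not a real IRL algorithm: the linear reward model, the toy 5-state MDP, and all the names are hypothetical. Stage 1 fits a reward function from labeled data (supervised); stage 2 plans with that learned reward via value iteration (RL).

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Stage 1 (supervised): fit a linear reward function from labeled data. ---
# Hypothetical dataset: feature vectors with known reward labels.
features = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])            # unknown in practice
rewards = features @ true_w + rng.normal(scale=0.01, size=100)
w_hat, *_ = np.linalg.lstsq(features, rewards, rcond=None)

# --- Stage 2 (RL): use the learned reward in value iteration on a toy MDP. ---
n_states, gamma = 5, 0.9
state_features = rng.normal(size=(n_states, 3))
R = state_features @ w_hat                      # learned reward per state
# Deterministic transitions: action 0 stays put, action 1 moves to the next state.
next_state = np.stack([np.arange(n_states),
                       (np.arange(n_states) + 1) % n_states])

V = np.zeros(n_states)
for _ in range(200):
    V = np.max(R + gamma * V[next_state], axis=0)   # Bellman backup
policy = np.argmax(R + gamma * V[next_state], axis=0)
print(policy)
```

The point of the separation is exactly the one made above: once `w_hat` is learned, stage 2 can be re-run on a different MDP (different dynamics, same features) without touching the labeled data again.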
For example, if one were to imagine using RL to teach acrobatic maneuvers to helicopters (see the paper by Abbeel et al.), using IRL to obtain the reward functions can be very useful. Once the reward functions for the maneuvers are obtained, they can be used to teach the same maneuvers to other helicopters (with different aerodynamic models but similar controls). Using supervised learning to come up with a mapping from states to controls won't work, since different aircraft have different aerodynamic models.
Reference:
* Ng, A. Y., & Russell, S. J. (2000). Algorithms for inverse reinforcement learning. In ICML (pp. 663–670).
* Abbeel, P., Coates, A., & Ng, A. Y. (2010). Autonomous helicopter aerobatics through apprenticeship learning. The International Journal of Robotics Research.
Best Answer
I agree with Neil G's answer, but perhaps this alternative phrasing also helps:
Consider the setting of a simple Gaussian mixture model. Here we can think of the model parameters as the set of Gaussian components of the mixture model (each component's mean and variance, and its weight in the mixture).
Given a set of model parameters, inference is the problem of identifying which component likely generated a single given example, usually in the form of a "responsibility" for each component. Here, the latent variable is just the identifier of the component that generated the given vector, and we are inferring which component it was likely to have been. (In this case inference is simple, though in more complex models it becomes quite complicated.)
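As a concrete sketch of this inference step, here are responsibilities computed by Bayes' rule for a hypothetical 1-D mixture with two components (all parameter values are made up):

```python
import numpy as np

# Hypothetical 1-D mixture: two components with known parameters.
means = np.array([0.0, 5.0])
stds = np.array([1.0, 1.0])
weights = np.array([0.6, 0.4])

def responsibilities(x):
    """Inference: posterior probability that each component generated x."""
    densities = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    joint = weights * densities            # prior weight times likelihood
    return joint / joint.sum()             # normalize to a posterior

r = responsibilities(4.8)
print(r)   # the second component is far more responsible for x = 4.8
```

Note that the responsibilities always sum to 1: they are a posterior distribution over the single latent variable, the component identity.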
Learning is the process of, given a set of samples from the model, identifying the model parameters (or a distribution over model parameters) that best fit the data given: choosing the Gaussians' means, variances, and weightings.
The Expectation-Maximization learning algorithm can be thought of as performing inference for the training set, then learning the best parameters given that inference, then repeating. Inference is often used in the learning process in this way, but it is also of independent interest, e.g. to choose which component generated a given data point in a Gaussian mixture model, to decide on the most likely hidden state in a hidden Markov model, to impute missing values in a more general graphical model, and so on.
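The inference-then-learning loop described above can be sketched for a 1-D, two-component mixture in plain numpy. This is a minimal illustration, not a production EM implementation (no log-space computation, no convergence check, synthetic data with made-up parameters):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic 1-D data from two well-separated Gaussians (the "ground truth").
data = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 1.0, 300)])

# Rough initial parameters for K = 2 components.
mu, sigma, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

for _ in range(50):
    # E-step (inference): responsibility of each component for each point.
    dens = np.exp(-0.5 * ((data[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    resp = pi * dens
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step (learning): refit means, variances, and mixture weights.
    n_k = resp.sum(axis=0)
    mu = (resp * data[:, None]).sum(axis=0) / n_k
    sigma = np.sqrt((resp * (data[:, None] - mu) ** 2).sum(axis=0) / n_k)
    pi = n_k / len(data)

print(np.round(mu, 1))   # close to the generating means (-2.0 and 3.0)
```

The E-step is exactly the inference problem from the previous paragraphs, just applied to every training point at once; the M-step is the learning problem, solved in closed form given the responsibilities.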