Solved – Supervised learning, unsupervised learning and reinforcement learning: Workflow basics

machine learning, reinforcement learning, supervised learning, unsupervised learning

Supervised learning

1) A human builds a classifier based on input and output data
2) That classifier is trained with a training set of data
3) That classifier is tested with a test set of data
4) Deployment if the output is satisfactory

To be used when, "I know how to classify this data, I just need you (the classifier) to sort it."

Point of method: To assign class labels (classification) or to produce real numbers (regression)
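The four steps above could be sketched roughly as follows. This is only a minimal illustration, assuming made-up 2-D data and a hand-rolled nearest-centroid classifier; the names `train_centroids` and `predict` are hypothetical helpers, not part of any library.

```python
import random

def train_centroids(X, y):
    """Training step: compute one mean vector (centroid) per class label."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Prediction: return the label of the closest centroid."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Made-up labelled data (an assumption): two well-separated 2-D blobs.
rng = random.Random(0)
data = [([rng.gauss(0, 1), rng.gauss(0, 1)], "a") for _ in range(50)]
data += [([rng.gauss(6, 1), rng.gauss(6, 1)], "b") for _ in range(50)]
rng.shuffle(data)

# Steps 2 and 3: train on one part of the labelled data, test on the rest.
train, test = data[:70], data[70:]
model = train_centroids([x for x, _ in train], [y for _, y in train])
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)

# Step 4: deploy only if the held-out accuracy is satisfactory.
print(f"test accuracy: {accuracy:.2f}")
```

The key point is that the test set is never seen during training, so the accuracy is an estimate of how the classifier behaves on new data.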

Unsupervised learning

1) A human builds an algorithm based on input data
2) That algorithm is tested with a test set of data (in which the algorithm creates the classifier)
3) Deployment if the classifier is satisfactory

To be used when, "I have no idea how to classify this data, can you (the algorithm) create a classifier for me?"

Point of method: To assign cluster labels or to estimate a probability density function (PDF)
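As a rough illustration of an algorithm creating the grouping by itself, here is a small k-means sketch in plain Python. The six points and the choice of k = 2 are assumptions made up for the example; note that no labels are given anywhere.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm): alternate assignment and update steps."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, clusters

# Unlabelled, made-up data: two visually obvious groups.
points = [[1.0, 1.2], [0.8, 1.0], [1.2, 0.9], [8.0, 8.1], [8.2, 7.9], [7.9, 8.3]]
centroids, clusters = kmeans(points, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, len(members))
```

Which cluster ends up with index 0 or 1 is arbitrary; the algorithm only discovers the groups, it does not name them.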

Reinforcement learning

1) A human builds an algorithm based on input data
2) That algorithm presents a state that depends on the input data; a user rewards or punishes the algorithm for the action it took, and this continues over time
3) That algorithm learns from the reward/punishment and updates itself, and this continues
4) It's always in production: it needs to learn from real data to be able to choose actions from states

To be used when, "I have no idea how to classify this data, can you classify this data and I'll give you a reward if it's correct or I'll punish you if it's not."

Is this roughly the flow of these practices? I hear a lot about what they do, but there is appallingly little practical, example-driven information!

Best Answer

This is a very nice compact introduction to the basic ideas!

Reinforcement Learning

I think your use case description of reinforcement learning is not exactly right. The term "classify" is not appropriate. A better description would be:

I don't know how to act in this environment, can you find a good behavior and meanwhile I'll give you feedback.

In other words, the goal is to control something well rather than to classify something well.

Input

The environment, which is defined by
- all possible states
- possible actions in the states
The reward function, which depends on the state and/or action

Algorithm

The agent
- is in a state
- takes an action to transfer to another state
- gets a reward for the action in the state

Output

The agent wants to find an optimal policy which maximizes the cumulative reward
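To make the input/algorithm/output split concrete, here is a small sketch using tabular Q-learning, which is one possible algorithm among many (the answer itself does not name one). The environment is a made-up five-state corridor with a reward of 1 at the right end; all names and constants are assumptions for illustration.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right

def step(state, action):
    """Environment: deterministic move, reward 1 only when the goal is reached."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy choice with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = choose_action(q[state], epsilon, rng)
            next_state, reward = step(state, action)
            # Move the estimate toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
# Output: the greedy policy, i.e. the best-looking action in each non-goal state.
policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

Here the feedback comes from the environment's reward function rather than from a human, but the loop is the same: be in a state, take an action, receive a reward, update.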

Related Solutions

Solved – Understanding the difference between Supervised and unsupervised learning

Let's look at a simple example of trying to predict housing prices. Assume we have a dataset that looks like

Cost | Sq Ft | N bedrooms
100K | 1,800 | 4
120K | 1,300 | 3
220K | 2,200 | 5

In the case of supervised learning we would know the cost (these are our y labels) and we would use our set of features (Sq ft and N bedrooms) to build a model to predict the housing cost. The formula would look like

Cost ~ Sq Ft + N bedrooms
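As a rough sketch of what fitting that formula means, the snippet below solves for an intercept and one weight per feature on the three rows above, with cost expressed in thousands. The helper `solve` and the name `predict_cost` are hypothetical. With three rows and three unknowns the fit is exact, so the resulting coefficients say nothing trustworthy about real housing prices; a real dataset would have many more rows and use least squares.

```python
def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# (cost in thousands, square feet, number of bedrooms) from the table above
houses = [(100.0, 1800.0, 4.0), (120.0, 1300.0, 3.0), (220.0, 2200.0, 5.0)]
X = [[1.0, sqft, beds] for _, sqft, beds in houses]   # leading 1.0 is the intercept column
y = [cost for cost, _, _ in houses]                   # the known labels

intercept, w_sqft, w_beds = solve(X, y)

def predict_cost(sqft, beds):
    """Apply the fitted model: Cost ~ Sq Ft + N bedrooms."""
    return intercept + w_sqft * sqft + w_beds * beds

for cost, sqft, beds in houses:
    print(cost, round(predict_cost(sqft, beds), 3))
```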

Now in unsupervised learning we would not know the cost of the house but we still would know the features. Therefore, we would train a model and try to group the types of houses together that are similar. For an example of this look at k-means clustering (http://scikit-learn.org/stable/modules/clustering.html#clustering)

This is a great free book that covers this very nicely (http://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLII_print4.pdf)

Each type of learning method may have a set of parameters, called model parameters. The training phase is used to find the set of parameters that generalizes best from your data. That book also gives very nice information on the different learning methods and their parameters.

For example, in the learning algorithm called SVM, the RBF kernel contains a term that looks like $\exp(-\gamma\|x-x'\|^{2})$. In this example, the $\gamma$ parameter is what we try to optimize using the training data (in practice, usually by cross-validation).
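To get a feel for what $\gamma$ does, here is a tiny sketch that evaluates the kernel term directly, assuming the term above is meant to be the standard RBF kernel $\exp(-\gamma\|x-x'\|^{2})$. The function name `rbf_kernel` and the sample points are made up for the example.

```python
import math

def rbf_kernel(x, x_prime, gamma):
    """RBF kernel value exp(-gamma * ||x - x'||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)

x, x_prime = [1.0, 2.0], [2.0, 4.0]   # squared distance is 1 + 4 = 5
for gamma in (0.01, 0.1, 1.0):
    # Larger gamma makes similarity fall off faster with distance.
    print(gamma, rbf_kernel(x, x_prime, gamma))
```

A small $\gamma$ treats distant points as still similar (smoother decision boundary), while a large $\gamma$ makes each training point influence only its close neighbourhood.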

Solved – Any difference between distant supervision, self-training, self-supervised learning, and weak supervision

There are two aspects to all the different terms you have given: 1) the process of obtaining training data, and 2) the algorithm that trains $f$, the classifier.

The algorithm that trains $f$ is supervised, regardless of how the training data is obtained. The difference between distant supervision, self-training, self-supervised learning, and weak supervision lies purely in how the training data is obtained.

Traditionally, a machine learning paper on supervised learning implicitly assumes that the training data is available and, for what it's worth, that the labels are precise, with no ambiguity in the labels given to the instances in the training data. With distant/weak supervision papers, however, people realized that their training data has imprecise labels. What they usually want to highlight is that they obtain good results despite the obvious drawback of using imprecise labels. They may also have other algorithmic ways to overcome the issue, such as an additional filtering process, and the papers usually highlight that these additional processes are important and useful. This gave rise to the terms "weak" or "distant" to indicate that the labels on the training data are imprecise. Note that this does not necessarily affect the learning aspect of the classifier. The classifier these authors use still implicitly assumes that the labels are precise, and the training algorithm is hardly ever changed.

Self-training, on the other hand, is somewhat special in that sense. As you have already observed, it obtains its labels from its own classifier and has a bit of a feedback loop for correction. Generally, we study supervised classifiers under the slightly larger purview of "inductive" algorithms, where the learnt classifier is an inductive inference made from the training data about the entire data. People have studied another form, called transductive inference, where a general inductive inference is not the output of the algorithm; instead, the algorithm takes both training data and test data as input and produces labels on the test data. However, people figured: why not use transductive inference within inductive learning to obtain a classifier with larger training data? This is simply referred to as induction with unlabeled data [1], and self-training comes under that.
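The feedback loop in self-training could look roughly like the sketch below. It assumes 1-D made-up data, a nearest-centroid classifier, and a distance-based confidence margin with an arbitrary threshold; all function names are hypothetical and real self-training setups differ in how they measure confidence.

```python
def fit_centroids(labelled):
    """Supervised step: the mean of the 1-D points for each label."""
    groups = {}
    for x, label in labelled:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}

def predict_with_margin(centroids, x):
    """Return (label, margin); margin is the distance gap between the two nearest centroids.
    Assumes at least two classes are present."""
    ranked = sorted(centroids, key=lambda label: abs(x - centroids[label]))
    margin = abs(x - centroids[ranked[1]]) - abs(x - centroids[ranked[0]])
    return ranked[0], margin

def self_train(labelled, unlabelled, min_margin=1.0, max_rounds=10):
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(max_rounds):
        centroids = fit_centroids(labelled)
        # Label the pool with the current classifier, keep only confident guesses.
        newly_labelled = []
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            if margin >= min_margin:
                newly_labelled.append((x, label))
        if not newly_labelled:
            break
        # Feed the pseudo-labels back in as if they were real training data.
        labelled.extend(newly_labelled)
        accepted = {x for x, _ in newly_labelled}
        pool = [x for x in pool if x not in accepted]
    return fit_centroids(labelled), labelled

seed_labels = [(1.0, "low"), (9.0, "high")]
unlabelled = [0.5, 1.5, 2.0, 4.8, 8.0, 8.5, 9.5]
final_centroids, final_labelled = self_train(seed_labels, unlabelled)
print(final_centroids, len(final_labelled))
```

Note that the training routine itself (`fit_centroids`) never changes; only the set of labelled examples it is given grows, and the ambiguous point near the middle stays unlabelled.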

Hopefully I have not confused you further; feel free to comment and ask for clarification if necessary.

[1] Might be useful - http://www.is.tuebingen.mpg.de/fileadmin/user_upload/files/publications/pdf2527.pdf