Supervised learning, unsupervised learning and reinforcement learning: Workflow basics

Supervised learning

  • 1) A human builds a classifier based on labeled data (inputs paired with known outputs)
  • 2) The classifier is trained on a training set
  • 3) The classifier is evaluated on a separate test set
  • 4) It is deployed if the test results are satisfactory
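The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the toy 1-D dataset, the nearest-centroid classifier, the hard-coded train/test split and the `DEPLOY_THRESHOLD` value are all hypothetical choices made for the example.

```python
from statistics import mean

# Labeled toy data: (feature, label) pairs; the correct labels are known up front.
data = [(1.0, "small"), (1.2, "small"), (0.8, "small"), (1.1, "small"),
        (5.0, "large"), (5.3, "large"), (4.8, "large"), (5.1, "large")]

# Hold out one example per class as the test set; train on the rest.
train = data[0:3] + data[4:7]
test = [data[3], data[7]]


def fit(samples):
    """Training: compute one centroid (mean feature value) per label."""
    labels = {label for _, label in samples}
    return {lab: mean(x for x, y in samples if y == lab) for lab in labels}


def predict(centroids, x):
    """Classify x with the label whose centroid is nearest."""
    return min(centroids, key=lambda lab: abs(x - centroids[lab]))


centroids = fit(train)                                                   # step 2: train
accuracy = sum(predict(centroids, x) == y for x, y in test) / len(test)  # step 3: test

DEPLOY_THRESHOLD = 0.9  # hypothetical acceptance criterion for step 4
print(f"test accuracy: {accuracy:.2f}, deploy: {accuracy >= DEPLOY_THRESHOLD}")
```

In practice the split would be random and the threshold would come from the application's requirements; the point here is only the order of the steps: labeled data, train, test, then decide on deployment.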

To be used when: "I know how to classify this data; I just need you (the classifier) to sort it."

Point of method: To predict class labels (classification) or real numbers (regression)
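To make the "real numbers" case concrete, here is a minimal regression sketch, assuming a toy dataset and an ordinary least-squares line fit chosen only for illustration (the name `fit_line` is hypothetical). The model is trained on labeled input/output pairs and then produces a real number for a new input.

```python
# Toy labeled data for regression: inputs with known real-valued outputs.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys):
    """Ordinary least squares for y ≈ a*x + b; returns (a, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var = sum((x - x_bar) ** 2 for x in xs)
    a = cov / var
    return a, y_bar - a * x_bar


a, b = fit_line(xs, ys)
print(a, b, a * 5.0 + b)  # 2.0 1.0 11.0
```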

Unsupervised learning

  • 1) A human builds an algorithm based on input data (no output labels are given)
  • 2) That algorithm is applied to a set of data, from which it creates the classifier itself (e.g. by grouping similar data points)
  • 3) It is deployed if the resulting classifier is satisfactory
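A minimal sketch of these steps, assuming k-means clustering as the algorithm (one common choice, not the only one) and toy, unlabeled 1-D data. No labels are supplied; the algorithm derives the groups, and hence the "classifier", on its own. The function name `kmeans_1d` and the naive initialization are hypothetical.

```python
from statistics import mean


def kmeans_1d(points, k, iters=20):
    """Plain k-means on 1-D data; returns (centers, cluster index per point)."""
    centers = list(points[:k])  # naive initialization: the first k points
    for _ in range(iters):
        # Assignment step: each point joins the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster (keep it if empty).
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: assignments no longer change
        centers = new_centers
    labels = [min(range(k), key=lambda i: abs(p - centers[i])) for p in points]
    return centers, labels


points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.8]  # unlabeled observations
centers, labels = kmeans_1d(points, k=2)
print(centers, labels)
```

The cluster indices 0 and 1 carry no meaning by themselves; a human still has to judge in step 3 whether the discovered groups are useful.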

To be used when: "I have no idea how to classify this data; can you (the algorithm) create a classifier for me?"

Point of method: To assign class (cluster) labels or to estimate a probability density function (PDF)
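Reading "(PDF)" as probability density function, density estimation can be sketched with a Gaussian kernel density estimate. This is an illustrative assumption: the samples, the bandwidth and the helper `kde_pdf` are made up for the example.

```python
import math


def kde_pdf(samples, bandwidth):
    """Return an estimated PDF: an average of Gaussian bumps centred on the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf


samples = [1.0, 1.2, 0.8, 5.0, 5.3, 4.8]  # unlabeled observations
pdf = kde_pdf(samples, bandwidth=0.5)

# The estimate is high near the data and low in the gap between the two groups.
print(round(pdf(1.0), 3), round(pdf(3.0), 3))
```

The bandwidth controls smoothness: smaller values follow the samples more closely, larger values blur the two groups together.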

Reinforcement learning

  • 1) A human builds an algorithm based on input data
  • 2) The algorithm is presented with a state that depends on the input data and takes an action; a user rewards or punishes it for that action, and this continues over time
  • 3) The algorithm learns from the reward or punishment and updates itself; this also continues
  • 4) It is always in production: it needs to learn from real data to be able to choose actions in each state
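The loop in steps 2 to 4 can be sketched as a simple action-value learner. Everything here is hypothetical: the two states, the two actions, the simulated user in `user_feedback` (+1 as reward, -1 as punishment) and the epsilon-greedy update rule are assumptions for illustration.

```python
import random

STATES = ("A", "B")
ACTIONS = ("left", "right")


def user_feedback(state, action):
    """Hypothetical user: +1 (reward) for the 'correct' action, -1 (punishment) otherwise."""
    correct = {"A": "right", "B": "left"}
    return 1.0 if action == correct[state] else -1.0


def learn(steps=2000, epsilon=0.1, alpha=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}  # estimated value of each action per state
    for _ in range(steps):                               # in production this loop would not end
        state = rng.choice(STATES)                       # step 2: a state is presented
        if rng.random() < epsilon:                       # occasionally explore at random
            action = rng.choice(ACTIONS)
        else:                                            # otherwise pick the best-looking action
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        r = user_feedback(state, action)                 # the user rewards or punishes
        q[(state, action)] += alpha * (r - q[(state, action)])  # step 3: update the estimate
    return q


q = learn()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
print(policy)  # {'A': 'right', 'B': 'left'}
```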

To be used when: "I have no idea how to classify this data; can you classify it? I'll reward you if you're correct and punish you if you're not."

Is this roughly the workflow of each of these approaches? I hear a lot about what they do, but practical examples are appallingly scarce!

Best Answer

This is a very nice compact introduction to the basic ideas!

Reinforcement Learning

I think your use-case description of reinforcement learning is not quite right. The term classify is not appropriate. A better description would be:

I don't know how to act in this environment; can you find a good behavior? Meanwhile, I'll give you feedback.

In other words, the goal is to control something well rather than to classify something well.

Input

  • The environment, which is defined by
    • all possible states
    • the possible actions in each state
  • The reward function, which depends on the state and/or action
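As an illustration of these inputs, here is a hypothetical corridor environment written out explicitly: four states, two actions, and a reward function that depends on the state and the action. The corridor and the names `transition` and `reward` are assumptions, not part of the answer.

```python
# Hypothetical corridor environment: cells 0..3, the agent's goal is cell 3.
STATES = [0, 1, 2, 3]
GOAL = 3
# Possible actions per state; the goal is terminal, so nothing can be done there.
ACTIONS = {s: [] if s == GOAL else ["left", "right"] for s in STATES}


def transition(state, action):
    """Deterministic dynamics: move one cell, never leaving the corridor."""
    if action == "right":
        return min(state + 1, GOAL)
    return max(state - 1, 0)


def reward(state, action):
    """Reward depends on state and action: +1 only for the move that reaches the goal."""
    return 1.0 if transition(state, action) == GOAL else 0.0


for s in STATES:
    for a in ACTIONS[s]:
        print(s, a, "->", transition(s, a), "reward", reward(s, a))
```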

Algorithm

  • The agent
    • is in a state
    • takes an action, which moves it to another state
    • gets a reward for that action in that state
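The agent's cycle (be in a state, act, move, collect a reward) can be sketched as an episode loop. The corridor environment and the functions `step` and `run_episode` are hypothetical; the policy passed in here simply always moves right.

```python
GOAL = 3  # hypothetical corridor: cells 0..3, goal at cell 3


def step(state, action):
    """Environment response to an action: (next_state, reward), +1 for reaching the goal."""
    next_state = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    return next_state, (1.0 if next_state == GOAL else 0.0)


def run_episode(policy, start=0, max_steps=100):
    """Let the agent act until it reaches the goal (or max_steps runs out)."""
    state, total, trajectory = start, 0.0, []
    for _ in range(max_steps):
        if state == GOAL:
            break
        action = policy(state)                 # the agent is in a state and takes an action
        next_state, r = step(state, action)    # it transfers to another state
        trajectory.append((state, action, r))  # and gets a reward for that action in that state
        total += r
        state = next_state
    return total, trajectory


total, trajectory = run_episode(lambda s: "right")
print(total, trajectory)  # 1.0 [(0, 'right', 0.0), (1, 'right', 0.0), (2, 'right', 1.0)]
```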

Output

  • The agent wants to find an optimal policy, i.e. a mapping from states to actions, that maximizes the expected cumulative reward
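One standard way to find such a policy from rewards alone is tabular Q-learning; the answer does not name a specific algorithm, so this is just one possible choice. The sketch assumes a hypothetical four-cell corridor with illustrative values for the learning rate `alpha`, discount factor `gamma` and exploration rate `epsilon`. Because `gamma` < 1, it maximizes the discounted cumulative reward, which makes the shortest path to the goal optimal.

```python
import random

GOAL = 3  # hypothetical corridor: cells 0..3, goal at cell 3
ACTIONS = ("left", "right")


def step(state, action):
    """Environment response: (next_state, reward), +1 for reaching the goal."""
    next_state = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    return next_state, (1.0 if next_state == GOAL else 0.0)


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; returns the greedy policy {state: action}."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            # epsilon-greedy: mostly exploit current estimates, sometimes explore
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, r = step(state, action)
            # value of the best follow-up action; the goal is terminal, so it is worth 0
            best_next = 0.0 if next_state == GOAL else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
            state = next_state
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}


policy = q_learning()
print(policy)  # {0: 'right', 1: 'right', 2: 'right'}
```

Unlike the supervised sketch, no correct action is ever given to the agent; it only sees rewards and has to discover by trial and error that moving right pays off.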