Solved – From Markov Decision Process (MDP) to Semi-MDP: What is it in a nutshell

Tags: definition, machine-learning, markov-process, reinforcement-learning

A Markov Decision Process (MDP) is a mathematical formulation of sequential decision making. The agent is the decision maker; in the reinforcement learning framework, it is also the learner. We need to give this agent enough information that it can learn to decide. Formally, an MDP is a tuple $\left< S, A, P, \gamma, R \right>$: states, actions, transition probabilities, discount factor, and reward.

Together, these components are essentially everything you need to know about the problem in order to make good decisions and perform the task optimally.
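To make the tuple concrete, here is a minimal Python sketch of an MDP as a plain data structure. All names and the toy two-state problem are illustrative assumptions, not taken from the paper:

```python
# A minimal sketch of the MDP tuple <S, A, P, gamma, R>.
# Everything here (class name, toy states/actions) is illustrative.
from dataclasses import dataclass
from typing import Dict, Set, Tuple

State = str
Action = str

@dataclass
class MDP:
    states: Set[State]
    actions: Set[Action]
    # transitions[(s, a)] maps each next state s' to Pr(s' | s, a)
    transitions: Dict[Tuple[State, Action], Dict[State, float]]
    gamma: float  # discount factor in [0, 1)
    # rewards[(s, a)] is the expected immediate reward for taking a in s
    rewards: Dict[Tuple[State, Action], float]

# A toy two-state MDP: "go" toggles the state, "stay" keeps it.
mdp = MDP(
    states={"s0", "s1"},
    actions={"stay", "go"},
    transitions={
        ("s0", "stay"): {"s0": 1.0},
        ("s0", "go"):   {"s1": 1.0},
        ("s1", "stay"): {"s1": 1.0},
        ("s1", "go"):   {"s0": 1.0},
    },
    gamma=0.9,
    rewards={
        ("s0", "stay"): 0.0, ("s0", "go"): 1.0,
        ("s1", "stay"): 0.0, ("s1", "go"): 0.0,
    },
)
print(mdp.gamma)
```

The Markov property lives in `transitions`: the distribution over the next state depends only on the current state and action, not on the history.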

I am reading the paper "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning" by Richard S. Sutton, Doina Precup, and Satinder Singh.

So far I do not understand how we get from an MDP to a semi-MDP (SMDP). I always thought the distinction was binary: either "yes", it is an MDP, or "no", one of the components of the tuple is not satisfied and it is not an MDP.

Why is there a middle ground like Semi-MDP? (Feels like an MDP?)

I ask for your insights and intuition.

Best Answer

It is a semi-MDP because the process is Markovian at the level of decision points/epochs (i.e., at the level of the decisions over options) but not at the "flat" level of primitive time steps. That is, if you do not observe the current choice of option along a trajectory and only see the state-action pairs, that process will not be Markovian. It is "semi" in that sense. See Puterman (1994), Chapter 8, for more background on SMDPs in the control literature.
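A toy simulation may help build the intuition. The sketch below (my own illustrative construction, not from the paper) runs a multi-step option to termination: at each decision epoch the agent sees a single SMDP transition $(s, o) \to (s', r, \tau)$ with a random duration $\tau$, while in between, the primitive state-action stream depends on which option is currently running:

```python
# Illustrative sketch of an SMDP transition generated by an option.
# An "option" here is a temporally extended action that takes one
# primitive step at a time and terminates stochastically.
import random

random.seed(0)

def run_option(state, option):
    """Execute an option to termination on a 1-D integer state space.

    Each primitive step moves the state one unit ("right" is +1,
    anything else is -1), costs reward -1, and the option terminates
    with probability 0.5 after each step, so the duration tau is
    geometrically distributed.

    Returns (next_state, cumulative_reward, tau).
    """
    tau, reward = 0, 0.0
    while True:
        state += 1 if option == "right" else -1  # one primitive step
        tau += 1
        reward += -1.0  # step cost accumulated inside the option
        if random.random() < 0.5:  # option's termination condition
            return state, reward, tau

# At decision epochs the agent only observes (s', r, tau) for the
# chosen option -- a Markovian transition at the option level, even
# though the flat step-by-step process hides which option is running.
state = 0
for epoch in range(3):
    option = random.choice(["left", "right"])
    state, r, tau = run_option(state, option)
    print(epoch, option, state, r, tau)
```

The point of the sketch: conditioned on the state at a decision epoch and the option chosen there, the next decision-epoch state is Markovian, but the within-option primitive steps are not, unless you also observe the running option.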
