[Math] Book on Markov Decision Processes with many worked examples

markov-chains, reference-request

I am looking for a book (or online article(s)) on Markov decision processes that contains lots of worked examples or problems with solutions. The purpose of the book is to give me problems to sink my teeth into during long commutes.

The book must…

  • have many examples using dynamic programming and the Bellman equation in discrete space and discrete time;
  • touch on policy and value iteration, and perhaps their computational complexity and implementation,

and ideally would…

  • use techniques from convex optimization or Lagrange multipliers, or combine MDPs with other computational techniques, such as sorting algorithms;
  • cover more modern examples beyond the usual queueing / inventory problems, e.g. from reinforcement learning;
  • contain lots of neat tricks and calculations.

An ideal book would not…

  • be a Theorem-Proof bible, aiming to identify the weakest conditions for optimality results
  • consider continuous-time controlled Markov processes and viscosity solutions

I own Sheldon Ross's Applied Probability Models with Optimization Applications, which has several worked examples and a fair number of good problems, but no solutions. I have been looking at Puterman's classic textbook Markov Decision Processes: Discrete Stochastic Dynamic Programming, but it is over 600 pages long and a bit on the "bible" side.

I'm looking for something more like Markov Chains and Mixing Times by Levin, Peres, and Wilmer, but for MDPs. It has bite-sized chapters and a fair bit of explicit calculation. I like Norris's Markov Chains, which has some nice introductory exposition on potential theory, as well as the Applications chapter in David Williams's Probability with Martingales. I do not mind if this "workbook" I am looking for is at an "advanced undergraduate" level, or directed at engineers or computer scientists.

Best Answer

I was asking myself the same question a while ago and came across the perfect solution. I'm so glad someone else is asking it now, as I am dying to share.

Say hello to one of the best books you could ever buy, Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning by Abhijit Gosavi: https://www.amazon.com/Simulation-Based-Optimization-Parametric-Techniques-Reinforcement/dp/1489974903/ref=sr_1_1?keywords=simulationbased+optimization&qid=1565092429&s=gateway&sr=8-1

If you look at the table of contents, MDPs only show up later, and the chapter is not long, which might make one suspicious. But that's why it's so good. It covers all the classical exact methods for solving MDPs. The explanations are brief, crisp, and clear. These short explanations carry a lot of weight, so don't rush through them, e.g. the discussion of what the natural process and the decision process are in an MDP and how that distinction can speed up your algorithm. Furthermore, the convergence analysis for MDPs is deferred to Chapter 13. Why is this good? Well, you get to code up the pseudocode and compare your results to the worked-out examples in Chapter 6. Yes, you heard me: actual worked-out examples with numerical results. Once examples and code are out of your system, you can enjoy the analysis in Chapter 13. One is always very excited to solve and code problems and doesn't always want to work through proofs first. The author understands this: practical fun first, serious analysis afterwards.
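
To give a flavour of the kind of worked example Chapter 6 contains, here is a minimal value-iteration sketch in Python on a hypothetical two-state, two-action MDP (the numbers are made up by me, not taken from the book):

    import numpy as np

    # Hypothetical two-state, two-action MDP (made-up numbers, not from the book).
    # P[a, s, s'] = transition probability, R[a, s] = expected one-step reward.
    P = np.array([[[0.7, 0.3],    # action 0, from state 0
                   [0.4, 0.6]],   # action 0, from state 1
                  [[0.9, 0.1],    # action 1, from state 0
                   [0.2, 0.8]]])  # action 1, from state 1
    R = np.array([[6.0, -3.0],    # action 0 rewards in states 0, 1
                  [4.0,  5.0]])   # action 1 rewards in states 0, 1
    gamma = 0.8                   # discount factor

    V = np.zeros(2)
    while True:
        # Bellman optimality backup: Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * P @ V
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < 1e-8:
            break
        V = V_new

    print("V* ~", np.round(V_new, 3), "greedy policy:", Q.argmax(axis=0))

Coding up something this small and checking it against the printed numbers in the book is exactly the kind of commute-sized exercise the question asks for.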

MDPs are great but limited by the curse of dimensionality: the problem size grows in a pseudo-polynomial manner (not bad on paper, but it hits you when you work on real-life problems). The book then introduces you to reinforcement learning, which uses MDPs as its framework. Reinforcement learning is really approximate dynamic programming. Same story in this chapter: pseudocode, small examples, numerical results, and analysis later. What I appreciate about this book is that it introduces reinforcement learning the right way, via the Robbins-Monro algorithm; stochastic approximation is basically the father of RL. This is unlike the Sutton and Barto book, which really doesn't explain RL very well. It's called the reinforcement learning bible, but it is very far from that. Terrible book, that one.
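
As a rough sketch of what "RL as approximate dynamic programming driven by Robbins-Monro" means in code, here is tabular Q-learning on the same made-up two-state MDP, with a 1/n step size satisfying the Robbins-Monro conditions (again my own illustration, not code from the book):

    import numpy as np

    rng = np.random.default_rng(0)

    # Same made-up two-state MDP as above, but now we only sample transitions
    # and update Q with a Robbins-Monro step size alpha_k = 1/k per state-action
    # pair, so sum(alpha_k) = inf and sum(alpha_k^2) < inf.
    P = np.array([[[0.7, 0.3], [0.4, 0.6]],
                  [[0.9, 0.1], [0.2, 0.8]]])
    R = np.array([[6.0, -3.0], [4.0, 5.0]])
    gamma = 0.8

    Q = np.zeros((2, 2))        # Q[s, a]
    visits = np.zeros((2, 2))   # visit counts per state-action pair
    s = 0
    for _ in range(100_000):
        # epsilon-greedy exploration
        a = int(rng.integers(2)) if rng.random() < 0.1 else int(Q[s].argmax())
        s_next = rng.choice(2, p=P[a, s])
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]                    # Robbins-Monro step size
        target = R[a, s] + gamma * Q[s_next].max()    # sampled Bellman target
        Q[s, a] += alpha * (target - Q[s, a])         # stochastic approximation update
        s = s_next

    print(np.round(Q, 2), "greedy policy:", Q.argmax(axis=1))

Comparing the greedy policy here with the exact value-iteration answer above is a nice self-check.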

Now, to the worked-out examples. They are two-state MDPs; you probably won't ever get a worked-out example larger than that, because of the growing-dimensionality problem I mentioned earlier. I also recommend the Lazy Programmer's Reinforcement Learning course on Udemy. He solves grid world using asynchronous value iteration and modified (optimistic) policy iteration, which are cool examples, and he uses Python code. However, if you have read Gosavi's book you will notice that his algorithms use only basic termination criteria and could terminate much earlier by checking whether the span seminorm of the Bellman residual has dropped below the tolerance. Also, his algorithms are only for discounted value functions.
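
To make the stopping-rule remark concrete, here is a sketch (my own, not from any of these books) of discounted value iteration that stops once the span seminorm of the Bellman residual drops below eps*(1-gamma)/gamma, which certifies an eps-optimal greedy policy and typically fires much earlier than a plain sup-norm test:

    import numpy as np

    def span(x):
        # Span seminorm: sp(x) = max(x) - min(x)
        return np.max(x) - np.min(x)

    def value_iteration_span(P, R, gamma, eps=1e-6):
        # Value iteration with the span-seminorm stopping criterion.
        # P[a, s, s'] and R[a, s] are laid out as in the earlier snippets.
        V = np.zeros(P.shape[1])
        while True:
            Q = R + gamma * P @ V          # Bellman optimality backup
            V_new = Q.max(axis=0)
            if span(V_new - V) < eps * (1 - gamma) / gamma:
                return Q.argmax(axis=0), V_new
            V = V_new

    # Usage with the made-up P, R, gamma from the first snippet:
    # policy, V = value_iteration_span(P, R, gamma)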

Dynamic Programming and Optimal Control, Vol. 2 by Bertsekas is beautiful. He has some worked-out examples. The style is much more mathematical, though, and focuses on structural results for MDPs. What are structural results? MDP algorithms are usually generic, meaning you can apply them to most MDP problems without change and they will work. However, if you know something about your problem, for example that you are looking for a threshold policy, then you can modify the algorithm for significant performance gains. His book does that. It does not contain pseudocode; he assumes one can code the algorithms from the math. I find that Decision Making Under Uncertainty by Mykel J. Kochenderfer bases its pseudocode on the Bertsekas book, so the two go well together.
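
As a toy illustration of what exploiting structure can look like (my own sketch, not Bertsekas's): if you know in advance that an optimal policy is a threshold policy on a one-dimensional state space, you only need to evaluate n + 1 candidate policies exactly instead of searching over all 2^n deterministic policies:

    import numpy as np

    def evaluate_policy(P, R, gamma, policy):
        # Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi
        n = P.shape[1]
        P_pi = np.array([P[policy[s], s] for s in range(n)])
        R_pi = np.array([R[policy[s], s] for s in range(n)])
        return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)

    def best_threshold_policy(P, R, gamma):
        # Assumes (as a structural result would tell you) that an optimal
        # policy takes action 1 iff the state index is >= some threshold t,
        # so only n + 1 candidate policies need to be evaluated.
        n = P.shape[1]
        best_policy, best_V = None, None
        for t in range(n + 1):
            policy = np.array([1 if s >= t else 0 for s in range(n)])
            V = evaluate_policy(P, R, gamma, policy)
            if best_V is None or V.sum() > best_V.sum():
                best_policy, best_V = policy, V
        return best_policy, best_V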

Also, don't forget Artificial Intelligence: A Modern Approach by Russell and Norvig. It has a brilliant chapter on MDPs and RL. Google "AIMA Python" to get Jupyter notebooks with Python code corresponding to the problems and pseudocode in the book.

Books not to get: Sutton and Barto's Reinforcement Learning, and Warren Powell's dynamic programming book (terrible notation). Also, don't buy Mykel J. Kochenderfer's book alone; it is not a stand-alone book.
