> I was always under the impression that the state formed the agent's observable world, and that actions taken by the agent would cause a state transition.
You are right that the agent's action $a$ causes the state transition, as defined by the transition function:
$$P(s'|s,a)$$
The state should contain enough (but not too much) information for the problem to be solved. Furthermore, this information should be fully observable (i.e. you should always be able to deduce the current state); otherwise you need a Partially Observable Markov Decision Process (POMDP).
When you have defined the MDP, i.e. the states, actions, transition function, and rewards, you can find the policy $\pi(s)$ (through Value Iteration or Policy Iteration) that maximizes the expected reward. The policy is a function:
$$\pi: S \rightarrow A$$
Thus, after having learnt the policy, you can apply it by passing the current state $s$ as an argument, and it gives the best action $a$ to take according to the policy.
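To make the loop from definition to policy concrete, here is a minimal Value Iteration sketch in Python. The data layout is an assumption for illustration: `P[s][a]` is a list of (probability, next-state) pairs and `R[s][a]` is the immediate reward.

```python
# Minimal value-iteration sketch for a finite MDP.
# Assumed (hypothetical) layout: P[s][a] = [(prob, next_state), ...],
# R[s][a] = immediate reward, gamma = discount factor.
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    V = {s: 0.0 for s in states}          # state values, initially 0
    while True:
        delta = 0.0
        for s in states:
            # Expected value of each action in state s.
            q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in actions]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:                   # stop once values have converged
            break
    # Extract the policy: the best action per state under the final values.
    pi = {s: max(actions, key=lambda a: R[s][a] + gamma *
                 sum(p * V[s2] for p, s2 in P[s][a]))
          for s in states}
    return pi
```

After `pi = value_iteration(states, actions, P, R)`, the lookup `pi[s]` is exactly the $\pi(s)$ described above.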
In the case of controlling traffic lights you could use the following state variables:
- $S_\text{time}$: the time; could be hours {0..23} or {morning, afternoon, night}, for example.
- $S_\text{color}$: red, orange, green.
- $S_\text{traffic}$: light, dense.
The total set of states would then be the Cartesian product:
$$S = S_\text{time} \times S_\text{color} \times S_\text{traffic}$$
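As a quick sketch (using just the variable names from this example), the full state space can be enumerated directly:

```python
from itertools import product

# State variables from the traffic-light example above.
S_time = ["morning", "afternoon", "night"]
S_color = ["red", "orange", "green"]
S_traffic = ["light", "dense"]

# The state space is the Cartesian product: 3 * 3 * 2 = 18 states.
S = list(product(S_time, S_color, S_traffic))
print(len(S))   # 18
print(S[0])     # ('morning', 'red', 'light')
```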
You could now define different probabilities based on different situations:
$$P(S'=\{\text{morning, red, dense}\} \mid S=\{\text{morning, red, light}\}, a) = \ldots$$
$$P(S'=\{\text{morning, green, dense}\} \mid S=\{\text{morning, red, light}\}, a) = \ldots$$
etc.
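In code, such a transition function is often just a nested lookup table. A minimal sketch, where the action names (`keep_red`, `switch_green`) and the probabilities are made-up placeholders:

```python
# Hypothetical transition table: T[(state, action)] maps each successor
# state to its probability; the entries per (state, action) sum to 1.
s = ("morning", "red", "light")
T = {
    (s, "keep_red"): {
        ("morning", "red", "dense"): 0.3,
        ("morning", "red", "light"): 0.7,
    },
    (s, "switch_green"): {
        ("morning", "green", "dense"): 0.3,
        ("morning", "green", "light"): 0.7,
    },
}

def transition_prob(s, a, s_next):
    """P(s'|s, a); unlisted transitions have probability 0."""
    return T.get((s, a), {}).get(s_next, 0.0)
```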
> In addition some authors have included things like real-time electricity pricing in the state space for planning demand response activities in the home. No action the user takes can possibly cause a change in the price of the electrical unit, but obviously the decision the agent makes is dependent upon it.
Here you could include pricing in the state, so when you execute the policy, the state variable $S_\text{price}$ will be set to the current electricity price. Note that you have to discretize these values, or look into continuous-state MDPs.
To learn a good policy, however, you also need to define the probabilities of going from one state to another: the transition function. So you need to estimate beforehand the probabilities of going from one price state to another, e.g. from historical data (see the sketch below).
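A minimal sketch of that estimation step, assuming you have a series of historical prices (the prices and bin edges below are made-up placeholders): discretize the prices into bins, then count observed transitions and normalize.

```python
from collections import defaultdict
import numpy as np

# Hypothetical historical prices and bin edges for {low, medium, high}.
price_history = [0.11, 0.13, 0.25, 0.24, 0.12, 0.31, 0.29, 0.14]
bin_edges = [0.10, 0.20, 0.30]
states = np.digitize(price_history, bin_edges)  # discretized price states

# Count observed transitions between consecutive price states.
counts = defaultdict(lambda: defaultdict(int))
for s, s_next in zip(states, states[1:]):
    counts[s][s_next] += 1

# Normalize the counts into estimated probabilities P(s'|s).
P_price = {s: {s2: n / sum(nexts.values()) for s2, n in nexts.items()}
           for s, nexts in counts.items()}
```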
> In short, can I have the following state space {price, currentTime} in my MDP or does it need to be modelled differently?
So the answer is yes, but you have to make sure that you do not have too many states, since the complexity of finding a policy grows exponentially with the number of states. Furthermore, to end up with a good policy, you need good approximations of the state transition probabilities $P(s'|s,a)$.
Best Answer
A Markov Decision Process (MDP) indeed has to do with going from one state to another and is mainly used for planning and decision making.
The theory
Just repeating the theory quickly, an MDP is:
$$\text{MDP} = \langle S,A,T,R,\gamma \rangle$$
where $S$ are the states, $A$ the actions, $T$ the transition probabilities (i.e. the probabilities $P(s'|s,a)$ to go from one state to another given an action), $R$ the rewards (given a certain state, and possibly action), and $\gamma$ a discount factor that is used to reduce the importance of future rewards.
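As a sketch, this tuple maps directly onto a small container type; the field names and data layouts here are illustrative assumptions, not a fixed convention:

```python
from typing import NamedTuple

class MDP(NamedTuple):
    states: list    # S: the set of states
    actions: list   # A: the set of actions
    T: dict         # T[(s, a)] -> {s': P(s'|s, a)}
    R: dict         # R[(s, a)] -> immediate reward
    gamma: float    # discount factor in [0, 1)
```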
So in order to use it, you need to have predefined:
- the states $S$,
- the actions $A$,
- the transition probabilities $T$, and
- the rewards $R$ (plus a discount factor $\gamma$).
Once the MDP is defined, a policy can be learned with Value Iteration or Policy Iteration, which calculate the expected reward for each of the states. The policy then gives, per state, the best action to take (given the MDP model).
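Concretely, Value Iteration repeatedly applies the Bellman optimality update until the values converge:
$$V_{k+1}(s) = \max_{a} \left[ R(s,a) + \gamma \sum_{s'} P(s'|s,a)\, V_k(s') \right]$$
and the policy then picks the maximizing action in each state: $\pi(s) = \arg\max_a \left[ R(s,a) + \gamma \sum_{s'} P(s'|s,a)\, V(s') \right]$.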
In summary, an MDP is useful when you want to plan an efficient sequence of actions in which your actions are not always 100% effective.
Your questions
I would call it planning, not prediction (like regression, for example).
See examples.
MDPs are used to do Reinforcement Learning; to find patterns you need Unsupervised Learning. And no, you cannot handle an infinite amount of data: the complexity of finding a policy grows exponentially with the number of states $|S|$.
See examples.
Examples of Applications of MDPs
There are also quite a few more models. An even more interesting one is the Partially Observable Markov Decision Process (POMDP), in which states are not completely visible; instead, observations are used to get an idea of the current state, but this is out of the scope of this question.
Additional Information
A stochastic process is Markovian (or has the Markov property) if the conditional probability distribution of future states depends only on the current state, and not on previous ones (i.e. not on a list of previous states).
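Formally:
$$P(S_{t+1} = s' \mid S_t, S_{t-1}, \ldots, S_0) = P(S_{t+1} = s' \mid S_t)$$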