I like your approach of starting by defining the items of an MDP (States, Actions, Transition probabilities, and Rewards):
$$\text{MDP} = \langle S,A,T,R,\gamma \rangle$$
However, I am not sure your example is entirely clear:
For the transition probabilities $T$ you indicate:
I think these are too high: I don't expect the DVR to stop recording unless there is a bug, so I would put something like 0.99 for each of these transitions.
I do not expect a high error rate either, so the transitions will be practically deterministic (i.e. a 'probability' of 1.0 of going to the 'next' state).
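As a sketch of what such near-deterministic transitions could look like, here is a minimal single-recorder model; the state and action names and the 0.99/0.01 split are my own illustrative assumptions, not values from your model:

```python
# Near-deterministic transition model for a single recorder, written as a
# nested dict: (state, action) -> {next_state: probability}.
# State/action names and the 0.99/0.01 split are illustrative assumptions.
T = {
    ("recording", "keep_recording"): {"recording": 0.99, "failed": 0.01},
    ("idle", "start_recording"):     {"recording": 0.99, "failed": 0.01},
    ("failed", "reset"):             {"idle": 1.0},
}

# Sanity check: every outgoing distribution must sum to 1.
for dist in T.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```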
Your reward seems correct: the number of minutes recorded, since you want to maximize the total recording time.
The states you suggest do not contain enough information to 'solve' the problem. For example, the reward should be defined based on the states and potentially the action, $R(s,a,s')$, but with the states you defined it cannot be calculated.
In order to plan based on the scheduled start times of the programs on your recorders, I think you need the whole schedule in the state.
Scheduling with MDPs
Finite-horizon MDPs are commonly used for scheduling; in most works, time is part of the state.
M. Dirks wrote an essay (Survey of Dynamic Medical Resource Allocation Methods) about different patient-scheduling methods using MDPs. In all the papers, the curse of dimensionality (the exponential growth of the state space with the number of state variables) has to be overcome by simplifying the states. Most methods use counts of patients by type: waiting, in treatment hall 1, in treatment hall 2, emergency case, etc.
The actions are defined as the number of patients to choose for the next time slot. See M. Dirks’ presentation for more details.
L. Zhu et al. (2013) do patient scheduling using the number of reserved slots per day as states, and the action consists of the number of patients (per type) accepted for the next day.
Schneider, J.G. et al. (1998) give a solution to factory scheduling using MDPs. The states contain the time, product inventory, and factory configuration. The actions are factory configurations.
Recording Scheduler MDP
Assuming there are three time slices in a day (to simplify; later you can extend it to 48), I came up with the following model:
States
- $S_\text{time} = \{t_1, t_2, t_3\}$, the time steps in a day
- $S_\text{rec} = \{\text{Rec}_1, \text{Rec}_2\}$, the recorder.
- $S_\text{record_time_left} = \{0,1,2,3\}$ (time slices).
- $S_\text{schema}$, all possible schemas; each schema is a subset of $\{(t_1: 1), (t_1: 2), (t_1: 3), (t_2: 1), (t_2: 2), (t_3: 1) \}$, where each tuple $(t_i: n)$ is a requested recording starting at $t_i$ with a duration of $n$.
The full state space would then be:
$$S = S_\text{time} \times S_\text{rec} \times S_\text{record_time_left} \times S_\text{schema}$$
This gives me, for this problem, $|S|=3 \cdot 2 \cdot 4 \cdot 64 = 1536$ states.
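The state count above can be verified by enumerating the factored state space directly; the concrete encodings (strings for time steps and recorders, tuples for requests) are my own assumptions for illustration:

```python
from itertools import chain, combinations, product

# Enumerate the factored state space from the model above.
times = ["t1", "t2", "t3"]        # S_time
recorders = ["Rec1", "Rec2"]      # S_rec
time_left = [0, 1, 2, 3]          # S_record_time_left

# All subsets of the six possible requested recordings (t_i, n).
requests = [("t1", 1), ("t1", 2), ("t1", 3), ("t2", 1), ("t2", 2), ("t3", 1)]
schemas = list(chain.from_iterable(
    combinations(requests, k) for k in range(len(requests) + 1)))

states = list(product(times, recorders, time_left, schemas))
print(len(schemas))  # 2^6 = 64
print(len(states))   # 3 * 2 * 4 * 64 = 1536
```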
Note that the channel to record from is also important, but for simplicity I leave it out.
Actions
Actions are $a=( (\text{Rec}_1,t_{r1},n_1), (\text{Rec}_2,t_{r2},n_2) )$, where $\text{Rec}_k$ refers to recorder $k$, recording the program that starts at $t_{rk}$ with a duration of $n_k$.
Reward
$R(s)$: the sum of time recorded by all recorders in the current state (with timestamp $t_\text{time}$). If Rec1 is recording nothing and Rec2 has record_time_left = 2, then the reward is $0+1=1$, since in the current state only 1 time step is recorded.
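This per-step reward can be sketched as follows; the function name and the per-recorder dict encoding are my assumptions:

```python
def reward(record_time_left):
    """Sum over recorders of the time recorded during the *current* step.

    record_time_left maps each recorder to its remaining recording time;
    a recorder contributes 1 if it records during this step, else 0.
    """
    return sum(1 if left > 0 else 0 for left in record_time_left.values())

# Rec1 records nothing, Rec2 still has 2 slices left -> reward 0 + 1 = 1.
print(reward({"Rec1": 0, "Rec2": 2}))  # -> 1
```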
Transition function
The transition function $P(s'|s,a)$ should now be defined with the following specifications:
- The next state should have a higher time step (i.e. always from $t_i$ to $t_{i+1}$).
- The record_time_left should be reduced by 1 time unit for both recorders.
- The schema should also be adapted so that it only contains whole programmed recordings and nothing from the past (i.e. nothing before $t_i$).
- And you will probably have to put some probability on upcoming schedule changes, since the scheduled programming might change in the future.
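The deterministic part of these specifications can be sketched as one step function; the state layout is my assumption, and the schedule-change randomness from the last point is left out:

```python
def step(state):
    """Deterministic core of the transition: advance time, count down the
    recorders, and drop schedule entries that now lie in the past.

    Assumed state layout: (t_index, {recorder: time_left}, schema), where
    schema is a set of (start_index, duration) requests.
    """
    t, time_left, schema = state
    next_t = t + 1                                           # t_i -> t_{i+1}
    next_left = {rec: max(0, left - 1) for rec, left in time_left.items()}
    next_schema = {(start, dur) for start, dur in schema if start >= next_t}
    return (next_t, next_left, next_schema)

nxt = step((1, {"Rec1": 0, "Rec2": 2}, {(1, 2), (2, 1), (3, 1)}))
# Time advances to 2, Rec2 counts down to 1, and the (1, 2) request is dropped.
```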
Policy
The resulting policy should then give you the best action based on the time, the state of the recorders (recording or not, and for how long), and the requested recording schedule.
Other examples
There are quite a few good lectures available online with nice examples; here I name just a few:
MDP Diagram
Finally, there are different ways of drawing an MDP diagram:
But the main thing is that you indicate the states as nodes, and the transitions as edges annotated with their probabilities, actions, and rewards.
Best Answer
Modeling the system as a Markov chain, the maximum likelihood estimate for the $A \rightarrow B$ transition probability is simply the number of times you saw $B$ following $A$, divided by the number of times you saw any state following $A$. The transition matrix is a matrix $P$ where $P_{ij}$ contains the $i \rightarrow j$ transition probability. A simple way to calculate this is to construct the transition count matrix $N$, where $N_{ij}$ contains the number of times you observed an $i \rightarrow j$ transition. Then, calculate $P$ by dividing each row of $N$ by its sum.
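This count-and-normalize procedure is a few lines of NumPy; the state names and the observed trajectory are made-up example data:

```python
import numpy as np

# Maximum-likelihood transition matrix from an observed state sequence:
# count transitions into N, then divide each row of N by its sum.
states = ["A", "B", "C"]
observed = ["A", "B", "A", "B", "C", "A", "A"]  # example trajectory (assumed)

idx = {s: i for i, s in enumerate(states)}
N = np.zeros((len(states), len(states)))
for prev, cur in zip(observed, observed[1:]):
    N[idx[prev], idx[cur]] += 1

# Guard against all-zero rows (states never seen as a predecessor).
row_sums = N.sum(axis=1, keepdims=True)
P = np.divide(N, row_sums, out=np.zeros_like(N), where=row_sums > 0)
print(P)
# Row A: one A->A and two A->B out of three -> [1/3, 2/3, 0], and so on.
```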