Solved – Creating a transition matrix for a Markov chain

markov-process, r

I have a dataset with the monthly frequency of observations that fall in each category, Cat. I would like to construct a transition matrix from this, i.e., from Cat1 to Cat2, Cat1 to Cat3, and so on for all combinations of Cat. The goal is to use the markovchain package in R to predict the frequencies for future months based on the original data. My questions are:

  1. Is it possible to construct a transition matrix based on this data?
  2. Is it the correct approach?

I looked at the examples in library(markovchain) / library(etm), but I am confused. For example, the sir.cont data in etm looks like this:

library(etm)
head(sir.cont)
   id from to time      age sex
1   41    0  2    4 75.34153   F
2  395    0  2   24 19.17380   M
3  710    1  0   33 61.56568   M
4  710    0  2   37 61.56568   M
5 3138    0  2    8 57.88038   F
6 3154    0  2    3 39.00639   M

It has 'from' and 'to' columns, whereas my dataset has only monthly counts per category:

df1 <- structure(list(Cat = 1:10, JUL_2013 = c(19L, 10L, 1L, 18L, 2L, 
    15L, 3L, 5L, 4L, 12L), AUG_2013 = c(1L, 16L, 18L, 17L, 11L, 9L, 
    NA, 2L, 4L, 19L), SEP_2013 = c(8L, 7L, 2L, 1L, 5L, 18L, 19L, 
    15L, NA, 4L), OCT_2013 = c(16L, NA, 3L, 18L, 10L, 17L, 2L, 15L, 
    19L, 5L), NOV_2013 = c(8L, 5L, 12L, 3L, 13L, 9L, 16L, 18L, 14L, 
    2L), DEC_2013 = c(NA, 18L, 5L, 20L, 1L, 11L, 9L, 16L, 2L, 3L), 
    JAN_2014 = c(19L, 16L, 6L, 4L, 20L, 2L, 18L, 7L, 5L, 8L), 
    FEB_2014 = c(2L, 8L, 14L, NA, 17L, 15L, 5L, 3L, 4L, 13L), 
    MAR_2014 = c(16L, 8L, 5L, 2L, 7L, 17L, 14L, 11L, 3L, 1L), 
    APR_2014 = c(15L, 10L, 18L, 11L, NA, 1L, 4L, 7L, 12L, 13L
    ), MAY_2014 = c(10L, 8L, 17L, 5L, 1L, 19L, 11L, 16L, 7L, 
    NA), JUN_2014 = c(10L, 17L, 15L, 18L, 11L, 12L, 1L, 8L, 19L, 
    NA), JUL_2014 = c(9L, 20L, 17L, 1L, 3L, 6L, 18L, 14L, 11L, 
    7L), AUG_2014 = c(16L, 19L, NA, 3L, 8L, 14L, 12L, 9L, 13L, 
    4L), SEP_2014 = c(19L, 5L, 16L, 15L, NA, 10L, 13L, 11L, 9L, 
    18L), OCT_2014 = c(NA, 11L, 7L, 17L, 18L, 14L, 3L, 13L, 8L, 
    1L), NOV_2014 = c(18L, 17L, 10L, 5L, 14L, 6L, 20L, 19L, 11L, 
    9L), DEC_2014 = c(14L, 19L, 2L, 18L, 15L, 7L, 5L, 10L, 16L, 
    20L), JAN_2015 = c(4L, 7L, 19L, 18L, 6L, 13L, 9L, 10L, 14L, 
    2L), FEB_2015 = c(4L, 17L, 7L, 18L, 2L, 3L, 5L, 14L, 11L, 
    6L), MAR_2015 = c(18L, 19L, NA, 12L, 11L, 6L, 20L, 15L, 8L, 
    1L), APR_2015 = c(1L, 11L, 16L, 17L, 9L, 10L, 18L, 20L, 6L, 
    2L), MAY_2015 = c(9L, 13L, 4L, 16L, 20L, 17L, 6L, NA, 2L, 
    5L), JUN_2015 = c(7L, 3L, 10L, 19L, NA, 2L, 20L, 16L, 1L, 
    14L), JUL_2015 = c(5L, 6L, 18L, 1L, 20L, 9L, 2L, 4L, 16L, 
    11L)), .Names = c("Cat", "JUL_2013", "AUG_2013", "SEP_2013", 
    "OCT_2013", "NOV_2013", "DEC_2013", "JAN_2014", "FEB_2014", "MAR_2014", 
    "APR_2014", "MAY_2014", "JUN_2014", "JUL_2014", "AUG_2014", "SEP_2014", 
    "OCT_2014", "NOV_2014", "DEC_2014", "JAN_2015", "FEB_2015", "MAR_2015", 
    "APR_2015", "MAY_2015", "JUN_2015", "JUL_2015"), row.names = c(NA, 
    -10L), class = "data.frame")

Best Answer

For a transition matrix, you need to know how many persons moved from state A to state B, from state A to state C, from state B to state A, and so on. Knowing how many were in state A, B, or C at each point in time is not enough; you need to know the movements between states. So no, your data does not contain the information necessary to compute a transition matrix.
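
To illustrate the point, here is a minimal base R sketch. All of the data in it are made up for illustration (the objects stay, swap and moves are hypothetical and are not derived from df1). The first part shows two sets of movements that produce identical monthly totals but completely different transition matrices. The second part shows how the matrix could be estimated if from/to pairs, like those in sir.cont, were available.

```r
# Part 1: monthly totals alone do not identify the transitions.
# Two hypothetical count tables (rows = state this month, columns = state next month).
states <- c("A", "B")
stay <- matrix(c(5, 0,
                 0, 5), nrow = 2, byrow = TRUE,
               dimnames = list(from = states, to = states))
swap <- matrix(c(0, 5,
                 5, 0), nrow = 2, byrow = TRUE,
               dimnames = list(from = states, to = states))

# Both tables put 5 subjects in A and 5 in B in each month ...
rowSums(stay); rowSums(swap)   # totals in the first month
colSums(stay); colSums(swap)   # totals in the second month

# ... yet the row-normalised transition matrices differ completely.
prop.table(stay, margin = 1)
prop.table(swap, margin = 1)

# Part 2: with observed moves, the estimate is a row-normalised cross-tabulation.
# 'moves' is invented example data with one row per observed transition.
moves <- data.frame(
  from = c("A", "A", "B", "B", "C", "A", "C", "B"),
  to   = c("B", "C", "A", "B", "A", "B", "C", "A")
)
lv <- c("A", "B", "C")

# Using a common set of factor levels keeps the table square,
# even if some state never appears in one of the columns.
counts <- table(from = factor(moves$from, levels = lv),
                to   = factor(moves$to,   levels = lv))

# Divide each row by its total so that every row sums to 1.
P <- prop.table(counts, margin = 1)
P
```

With only the per-month totals, as in df1, there is no way to tell the stay case from the swap case, which is why the from/to information is needed.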
