Solved – Fit and evaluate a second order transition matrix (Markov Process) in R

markov chainmarkov-processrstochastic-processes

I already built 1 first order discrete state Markov Chain model. It was built with R using the function 'markovchainFit()' in 'markovchain' package.

dat<-data.frame(replicate(20,sample(c("A", "B", "C","D"), size = 100, replace=TRUE)))

MC1 <- markovchainFit(dat) # MC1 is the fitted MC Model 

Now I am going to build a second order Markov Chain model.

I am not sure how to find the transition matrix, and if the second order Markov Chain model is better than the first order MC model? How to evaluate the them?

Should I choose the first order model or second order model? Is there any standard or test for doing this?

Thank you!

Best Answer

This function should produce a Markov chain transition matrix to any lag order that you wish.

dat<-data.frame(replicate(20,sample(c("A", "B", "C","D"), size = 100, replace=TRUE)))

Markovmatrix <- function(X,l=1){
  tt <- table(X[,-c((ncol(X)-l+1):ncol(X))] , c(X[,-c(1:l)]))
  tt <- tt / rowSums(tt)
  return(tt)
}


Markovmatrix(as.matrix(dat),1)
Markovmatrix(as.matrix(dat),2)

where l is the lag.

e.g. 2nd order matrix, the output is:

         A         B         C         D
  A 0.2422803 0.2185273 0.2446556 0.2945368
  B 0.2426304 0.2108844 0.2766440 0.2698413
  C 0.2146119 0.2716895 0.2123288 0.3013699
  D 0.2480000 0.2560000 0.2320000 0.2640000

As for how to test what order model. There are several suggestions. One put forward by Gottman and Roy (1990) in their introductory book to Sequential Analysis is to use information value. There is a chapter on that - most of the chapter is available online.

You can also perform a likelihood-ratio chi-Square test. This is very similar to a chi square test in that you are comparing observed to expected frequencies of transitions. However, the formula is as follows:

$$G^2 = 2\sum_i\sum_j n_{ij} \log\left(\frac{n_{ij}}{e_{ij}}\right).$$

The degrees of freedom are the square of the number of codes minus one. In your case you have 4 codes, so (4-1)^2 = 9. You can then look up the associated p-value.

I hope this helps.