Markov Chain – How to Estimate the Infinitesimal Generator of a Markov Chain?


I am working on an 8-state multi-state Markov model (the 8th state is absorbing). What I am having difficulty understanding is: how sensitive are the results to the initial Q matrix, and what is the best way to go about creating that initial matrix?

For example, in http://leg.ufpr.br/~silvia/papers/mmm.pdf, the matrix is:

$$Q = \begin{pmatrix}
-(q_{12}+q_{14}) & q_{12} & 0 & q_{14} \\
q_{21} & -(q_{21}+q_{23}+q_{24}) & q_{23} & q_{24} \\
\vdots & \vdots & \vdots & \vdots
\end{pmatrix}$$

But when the initial Q matrix is created immediately below that, using the following code,

Q <- rbind(c(0,     0.25, 0,     0.25),
           c(0.166, 0,    0.166, 0.166),
           c(0,     0.25, 0,     0.25),
           c(0,     0,    0,     0))

there are no negative values. In fact, the value of Q[1,1] is zero, not the negative of the sum q12 + q14. Also, the fourth row, which corresponds to the absorbing state, is entered as all zeros. Why wouldn't Q[4,4] be 1?

How would I go about creating that initial matrix, given the observed frequencies with which subjects move between states? (For a concrete example, please see the state table provided below.)

On a more general note, I would be grateful if someone could recommend a good primer (aimed at a general audience) for understanding and using Markov models and interpreting their results.


Here is the state table for my data (in R):

 dput(mm2myTrajectoryStateTable)
structure(c(13295L, 946L, 126L, 6L, 1L, 1L, 0L, 3461L, 15035L, 
937L, 100L, 19L, 17L, 2L, 636L, 2014L, 3977L, 146L, 4L, 16L, 
4L, 46L, 153L, 518L, 1141L, 0L, 10L, 19L, 84L, 109L, 16L, 3L, 
257L, 25L, 0L, 18L, 117L, 62L, 13L, 72L, 446L, 11L, 0L, 4L, 12L, 
31L, 1L, 28L, 139L, 12909L, 15089L, 4377L, 680L, 146L, 193L, 
38L), .Dim = 7:8, .Dimnames = structure(list(from = c("1", "2", 
"3", "4", "5", "6", "7"), to = c("1", "2", "3", "4", "5", "6", 
"7", "8")), .Names = c("from", "to")), class = "table")

Best Answer

Synopsis

The construction of the transition matrix $P(t)$ from $Q$ and $t$ is explained (with a simple $2\times 2$ example) at Solving the Kolmogorov forward equation for transition probabilities. Obtaining $Q$ from $P=P(t)$ is effected by reversing the process.

Methods

To summarize, that thread shows that $P(t) = \exp(tQ)$ (the matrix exponential) can be computed by diagonalizing $Q$ and exponentiating the diagonal terms (all multiplied by the time $t$). Thus, $Q$ is a matrix logarithm of $P$ (scaled by $1/t$), which can be computed by diagonalizing $P$ and taking the logarithms of its diagonal terms.

Specifically, upon finding a matrix $V$ for which

$$P(t) = V\ \Lambda\ V^{-1}$$

with $\Lambda$ having the values $\lambda_1, \lambda_2, \ldots, \lambda_n$ on the diagonal (for $n$ states, including all absorbing states) and zeros elsewhere, compute that

$$Q = V\ \frac{1}{t}\log(\Lambda)\ V^{-1}$$

where $\log(\Lambda)$ has the values $\log(\lambda_1), \log(\lambda_2), \ldots, \log(\lambda_n)$ on its diagonal and zeros elsewhere.
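
To make this concrete, here is a minimal sketch in R, using a made-up $2\times 2$ matrix $P$ (not taken from the question) whose eigenvalues happen to be real and positive, so the ordinary logarithm suffices:

P <- rbind(c(0.9, 0.1),
           c(0.2, 0.8))
tt <- 1                           # Observation time
s <- eigen(P)                     # P = V Lambda V^{-1}
Q <- Re(s$vectors %*% diag(log(s$values) / tt) %*% solve(s$vectors))
Q                                 # Rows sum to zero; off-diagonal entries are nonnegative
s.Q <- eigen(Q)                   # Double-check by exponentiating Q the same way...
P.check <- Re(s.Q$vectors %*% diag(exp(s.Q$values)) %*% solve(s.Q$vectors))
all.equal(P.check, P)             # ...which recovers P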

There is a technical complication: the diagonalization can require matrices with complex coefficients. However, after the calculation is complete, all the results will be real numbers. Dealing with this, in practice, merely requires taking the real part of the answer (and perhaps checking that the imaginary parts are all negligibly small, as a sanity check).
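
For instance, here is what such a complex logarithm does to a negative number (this is the same clog defined in the full code below); exponentiating recovers the original value, up to floating-point error in the imaginary part:

clog <- function(z) log(Mod(z)) + Arg(z) * 1i
clog(-0.5)       # log(0.5) + pi*i
exp(clog(-0.5))  # -0.5 (imaginary part ~ 1e-16)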

The transition matrix estimated from the question's data (with the absorbing state's row appended) has this $8\times 8$ matrix as its infinitesimal generator.

[Figure: the estimated $8\times 8$ infinitesimal generator, with entries rounded and color-coded.]

As expected, the rows sum to zero. I have rounded the values to three decimal places, colored the backgrounds according to the rounded values, and desaturated those colors to reflect the magnitudes of the entries (so that grayish entries are small). The patterns clearly stand out.

Assessing uncertainty

When $P$ is estimated from data, its entries, and therefore the entries of $Q$, are subject to sampling variation. Perhaps the most straightforward way to assess that uncertainty is through simulation (a parametric bootstrap). For the matrix $P$ given in the question, here are simple summaries of the $7\times 8$ deviations between $9999$ bootstrapped values of $Q$ and the computed value of $Q$:

[Figure: histograms of the mean deviations (left) and standard deviations (right) of the bootstrapped values of $Q$.]

The full $56$-variate distribution can be estimated from the simulation data if desired.
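
For instance, here is a minimal sketch, assuming the sim array produced by the simulation code below: it unrolls each bootstrapped $Q$ into a vector of its $56$ transient-row entries and estimates their joint covariance.

q.flat <- apply(sim[1:7, , ], 3, c)  # 56 x n.iter; each column is one bootstrapped Q
sigma <- cov(t(q.flat))              # 56 x 56 estimated sampling covariance
dim(sigma)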

Code

The following R code performs the calculations and simulations. The essential part is performed by the function q.matrix, which computes the matrix logarithm of its argument (presumed to be a square matrix) after normalizing its rows to sum to unity. The key lines obtain the eigenvectors and eigenvalues, compute the logarithms of the eigenvalues, and reassemble the result into $Q$:

s <- eigen(p)
q <- Re(s$vectors %*% diag(clog(s$values)) %*% solve(s$vectors))

(Edit November 2019: clog is a complex logarithm; for positive real $x,$ $\operatorname{clog}(-x) = \log(x) + \pi i$. It is needed in order to handle negative eigenvalues.)

The full code follows. The simulation requires about six seconds. A simulation with just $999$ iterations would produce helpful results and take less than one second.

x <- structure(c(13295L, 946L, 126L, 6L, 1L, 1L, 0L, 3461L, 15035L, 
                 937L, 100L, 19L, 17L, 2L, 636L, 2014L, 3977L, 146L, 4L, 16L, 
                 4L, 46L, 153L, 518L, 1141L, 0L, 10L, 19L, 84L, 109L, 16L, 3L, 
                 257L, 25L, 0L, 18L, 117L, 62L, 13L, 72L, 446L, 11L, 0L, 4L, 12L, 
                 31L, 1L, 28L, 139L, 12909L, 15089L, 4377L, 680L, 146L, 193L, 
                 38L), .Dim = 7:8, 
           .Dimnames = structure(list(from = c("1", "2", "3", "4", "5", "6", "7"), 
                                     to = c("1", "2", "3", "4", "5", "6", "7", "8")), 
                                     .Names = c("from", "to")), class = "table") 
clog <- function(z) log(Mod(z)) + Arg(z) * 1i # Complex logarithm
q.matrix <- function(x, tt=1) {
  p <- x / rowSums(x)  # Row-normalize the counts into transition probabilities
  s <- eigen(p)        # Diagonalize: p = V Lambda V^{-1}
  q <- s$vectors %*% diag(clog(s$values)) %*% solve(s$vectors)
  return(Re(q / tt))   # Scale by 1/t; drop the (negligible) imaginary parts
}
#
# Simulation
#
counts <- rowSums(x)  # Total observed transitions out of each transient state
probs <- t(x/counts)  # MLE of the transition probabilities (not used below)
set.seed(17)
n.iter <- 9999
system.time(
  sim <- array(replicate(n.iter, {
    # Resample each row of the state table from its fitted multinomial
    y <- t(sapply(1:dim(x)[1], function(i) {
      rmultinom(1, counts[i], x[i, ])
    }))
    # Append the absorbing state's row and recompute the generator
    q.matrix(rbind(y, c(rep(0,7), 1)))
  }), dim=c(8, 8, n.iter))
  )
#
# Compute the infinitesimal generator.
#
q <- q.matrix(p <- rbind(x+1, c(rep(0,7), 1)))  # x+1: add-one smoothing; last row is the absorbing state
library(Matrix)
sum((expm(q) - p/rowSums(p))^2) # Check that exp(q) = p (this should be ~0)
#
# Compute and plot the bootstrap statistics.
#
q.mean <- apply(sim, c(1,2), mean)  # The mean bootstrapped values
q.mu2 <- apply(sim^2, c(1,2), mean)
q.sd <- sqrt(q.mu2 - q.mean^2)      # Standard deviations of the bootstrapped values
par(mfrow=c(1,2))
hist((q.mean - q)[1:7, ], freq=FALSE, main="Mean Deviation", xlab="Value")
hist(q.sd[1:7, ], freq=FALSE, main="SD", xlab="Value")
#
# Plot Q
#
par(mfrow=c(1,1))
j <- matrix(rep(1:8, 8), 8)            # 8 x 8 grid of state indices
i <- as.vector(t(j))                   # x coordinates ("To" states)
j <- 9 - as.vector(j)                  # y coordinates ("From" states), flipped so state 1 is at the top
q.scale <- (q-min(q))/diff(range(q))   # Rescale the entries of q to [0,1] for the hues
plot(c(0,9), c(0,9), type="n", 
     main="Infinitesimal Generator",
     xlab="To", ylab="From", asp=1,
     bty="n", xaxt="n", yaxt="n")
points(i, j, col="#e0e0e0")
points(i, j, pch=22, cex=6, col="#d0d0d0", bg=hsv(q.scale, abs(q), 0.9))
text(i, j, paste(round(q, 3)), cex=0.75)
text(0, 1:8, paste(8:1))
text(1:8, 0, paste(1:8))