Solved – Estimating Standard Errors for Markov Transition Probability with Multiple Observations (in R)

bootstrapmarkov chainmarkov-processr

I was trying to estimate a Markov transition table from paired transition data, which look something like this:

 t t+1 
 A B 
 A C
 B C 
 C D
 B C
 C D
 E A
 E E
 A D
 C B
 B A
 A C
 C D
 C B

It is quite easy to estimate the transition probabilities, just making using table()command in R. However, I'm not exactly sure how to estimate the standard error associated with each probability. My current approach is to use a bootstrap method that proceeds as follows:

  1. Take a random sample of the pairs, $S^{i}$.

  2. Estimate a Markov table based on the sample $S^{i}$, calculating all transition probabilities as usual.

  3. Repeat 1 and 2 for $N$ times. Combine the results from $N$ iterations to calculate the variance (for each cell, calculate the variances based on the $N$ estimated transition probabilities)

I would like to know two things:

A. Does this approach make sense statistically?

B. For some rare states (e.g. state E in the example), bootstrap may miss them in some random samples, hence the constructed Markov table for those iterations would be incomplete. In that case, how should I calculate the standard error for those rare states?

Thanks!

Best Answer

Before you start thinking about the standard errors of your estimates, it would be good to know, or to check, if your Markov chain has a stationary distribution.

There is no way to say anything from the small sample you provided as it seems to be just randomly shuffled pairs of states, rather than the actual history of the process (the pair on the next line doesn't start in the state where the previous line ended). Going forward I will presume that the chain has a stationary distribution.

Regarding the bootstrap for the state pairs, it seems you are trying to obtain a confidence interval for your maximum likelihood estimates of the transition probabilities. The bootstrap can be used for this, but I would consider bootstrapping each row of the transition matrix at a time - basically simulating a multinomial random variable. (see also this paper)

A row-wise approach also guarantees that you will not obtain invalid matrices from your bootstrap.

I think you could also be able to calculate the 95% confidence intervals for your estimates directly (for example using the MultinomialCI package):

> library(MultinomialCI)
> t1 <- c("A","A","B","C","B","C","E","E","A","C","B","A","C","C")
> t2 <- c("B","C","C","D","C","D","A","E","D","B","A","C","D","B")
> 
> transition_probs <- data.frame(t1,t2)
> 
> # illustration for departure state A
> est_trans_prob_A <- table(transition_probs$t2[transition_probs$t1=="A"])/
+                     length(transition_probs$t2[transition_probs$t1=="A"])
> ci <- multinomialCI(table(transition_probs$t2[transition_probs$t1=="A"]),
+                     alpha=0.05)
> 
> result <- data.frame(est_p = est_trans_prob_A, lo_ci=ci[,1], up_ci=ci[,2])
> result
  est_p.Var1 est_p.Freq lo_ci     up_ci
1          A       0.00  0.00 0.6083073
2          B       0.25  0.00 0.8583073
3          C       0.50  0.25 1.0000000
4          D       0.25  0.00 0.8583073
5          E       0.00  0.00 0.6083073

On the other hand, in case you have some previous knowledge about the transition probabilities, you might prefer the CoinMinD package which allows to specify a Dirichlet prior and run a Bayesian analysis using MCMC.