Significance testing for Markov chain transition probabilities


The question

How can I calculate p values for individual transitions in a Markov chain? I want to test the null hypothesis that the probability of entering state $B$ from previous state $A$ is less than or equal to the overall probability of being in state $B$.

Details

I have an observed sequence of discrete states and am fitting a first-order, stationary Markov chain. I'd like to calculate a matrix of p values (one for each possible transition). Say the state at time $t$ is $S_t$. Given states $A$ and $B$, I want to test the null hypothesis that $P(S_t=B \mid S_{t-1}=A) \le P(S_t=B)$ against the alternative hypothesis that $P(S_t=B \mid S_{t-1}=A) > P(S_t=B)$.
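Both quantities in these hypotheses can be estimated from simple counts. The following is a minimal Python sketch, assuming the states are stored as a list of hashable labels; `estimate_probabilities` is a hypothetical helper name, and estimating $P(S_t=B)$ as the fraction of all time points spent in $B$ is one simple choice under stationarity, not the only possible one.

```python
from collections import Counter

def estimate_probabilities(states):
    """Return (transition_probs, marginal_probs) estimated from an observed sequence.

    transition_probs[(a, b)] estimates P(S_t = b | S_{t-1} = a);
    marginal_probs[b] estimates P(S_t = b) as the fraction of time points in b.
    """
    pair_counts = Counter(zip(states[:-1], states[1:]))  # counts of each observed a -> b
    from_counts = Counter(states[:-1])  # transitions leaving each state
    transition_probs = {
        (a, b): n / from_counts[a] for (a, b), n in pair_counts.items()
    }
    state_counts = Counter(states)
    marginal_probs = {s: n / len(states) for s, n in state_counts.items()}
    return transition_probs, marginal_probs

seq = list("AABABBBAAB")
trans, marg = estimate_probabilities(seq)
print(trans[("A", "B")], marg["B"])  # prints 0.6 0.5
```

In this toy sequence the estimated $A \rightarrow B$ probability exceeds the estimated marginal probability of $B$; the question is whether a gap like this is larger than chance.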

There are a couple hundred states. The 'true' transition matrix of the data-generating process is very sparse, meaning that each state can transition to only a few other states. All states transition to themselves with high probability, but there are no absorbing states. The data contain ~10,000 time points, but some transitions may be observed only a few times, so I'm probably not in the asymptotic regime.

The literature

The closest method I've found is described in:

Vautard et al. (1990). Statistical significance test for transition matrices of atmospheric Markov chains.

They randomly permute the sequence of observed states and estimate transition probabilities from the shuffled data. For each pair of states $(A, B)$, they calculate a p value as the fraction of shuffles for which the estimated $A \rightarrow B$ transition probability exceeds that of the original data. I have two hesitations about this method: 1) they don't clearly state a null hypothesis; 2) the permutation destroys the temporal dependence between all states at once. But I can imagine many other models in which some states exhibit temporal dependence and others don't. The method doesn't seem to account for these cases, but I don't know whether that matters.
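A minimal Python sketch of the shuffle test as described above. Assumptions: `transition_probs` and `permutation_pvalues` are hypothetical names; this version counts shuffles whose estimate is *at least* as large as the observed one and applies the common add-one correction so that no p value is exactly 0, which may differ in detail from Vautard et al.'s procedure. Shuffling keeps the number of visits to each state fixed, so the null it implicitly tests is that the sequence is exchangeable, i.e. has no temporal dependence at all.

```python
import random
from collections import Counter

def transition_probs(states):
    """Estimate P(S_t = b | S_{t-1} = a) for every observed pair (a, b)."""
    pair_counts = Counter(zip(states[:-1], states[1:]))
    from_counts = Counter(states[:-1])
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}

def permutation_pvalues(states, n_shuffles=1000, seed=0):
    """Shuffle-based p value for each observed transition (hypothetical helper).

    Counts shuffles whose estimated a -> b probability is at least the observed
    one, with an add-one correction so no p value is exactly 0.
    """
    rng = random.Random(seed)
    observed = transition_probs(states)
    at_least = Counter()
    shuffled = list(states)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # keeps visit counts, destroys temporal order
        probs = transition_probs(shuffled)
        for pair, p_obs in observed.items():
            if probs.get(pair, 0.0) >= p_obs:
                at_least[pair] += 1
    return {pair: (at_least[pair] + 1) / (n_shuffles + 1) for pair in observed}
```

Only observed transitions receive a p value here: an unobserved transition has an estimated probability of 0, so it could never be significant under this scheme.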

Another paper:

Anderson and Goodman (1957). Statistical inference about Markov chains.

They give a $\chi^2$ test for the hypothesis that particular transition probabilities have particular values. Maybe I could use this to compare $P(S_t=B \mid S_{t-1}=A)$ to the estimated marginal probability of state $B$, but this doesn't seem to take into account the uncertainty of estimating $P(S_t=B)$ from the data.
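For a single transition, the row-wise statistic can be collapsed to the two outcomes "$B$" and "not $B$", which gives a test of $P(S_t=B \mid S_{t-1}=A) = p_0$ with 1 degree of freedom. The Python sketch below is a hypothetical illustration in the spirit of Anderson and Goodman's test, not a transcription of their paper; `goodman_anderson_binary` is an invented name, and it also reports a one-sided p value for the alternative $P(S_t=B \mid S_{t-1}=A) > p_0$. Plugging in $p_0 = \hat P(S_t=B)$ is exactly the shortcut questioned above: $p_0$ is treated as known, so its estimation error is ignored.

```python
import math

def goodman_anderson_binary(n_ab, n_a, p0):
    """Chi-square test of H0: P(S_t=B | S_{t-1}=A) = p0, row A collapsed to {B, not B}.

    n_ab: observed A -> B transitions; n_a: transitions leaving A; 0 < p0 < 1.
    Returns (chi2_statistic, two_sided_p, one_sided_p), 1 degree of freedom.
    """
    p_hat = n_ab / n_a
    # Signed z statistic; its square is the collapsed chi-square statistic.
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n_a)
    chi2 = z * z
    two_sided = math.erfc(math.sqrt(chi2 / 2))     # upper tail of chi-square(1)
    one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail for alternative p > p0
    return chi2, two_sided, one_sided
```

For example, 30 transitions into $B$ out of 50 leaving $A$, tested against $p_0 = 0.5$, gives $\chi^2 = 2$ and a one-sided p value of about 0.079.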

Best Answer

I had the same problem. You can take into account the uncertainty of estimating $P(S_t=B)$ from the data by using a two-sample $\chi^2$ test. That is, instead of treating $P(S_t=B)$ as a fixed value, treat it as a second sample: the number of time points spent in state $B$ out of the total number of time points (events) in your chain.
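A minimal Python sketch of this suggestion, assuming the two samples are (i) the $n_{AB}$ transitions into $B$ out of the $n_A$ transitions leaving $A$, and (ii) the $n_B$ time points spent in $B$ out of all $n$ time points. The function name `two_sample_transition_test` is hypothetical. For a 2×2 table, Pearson's $\chi^2$ statistic (without continuity correction) equals the square of the pooled two-proportion $z$ statistic, and the sign of $z$ gives the one-sided p value the question asks for. Caveat: the test assumes two independent samples of independent observations, whereas here both samples come from the same autocorrelated chain (with strong self-transitions, the effective sample size is much smaller than the number of time points), so the p value should be read as approximate.

```python
import math

def two_sample_transition_test(n_ab, n_a, n_b, n_total):
    """Two-sample test of P(S_t=B | S_{t-1}=A) against P(S_t=B) (hypothetical helper).

    Sample 1: n_ab transitions into B out of n_a transitions leaving A.
    Sample 2: n_b time points in state B out of n_total time points.
    Returns (chi2_statistic, one_sided_p) for the alternative P(B | A) > P(B).
    """
    p1 = n_ab / n_a
    p2 = n_b / n_total
    pooled = (n_ab + n_b) / (n_a + n_total)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_total))
    z = (p1 - p2) / se
    # For a 2x2 table, Pearson's chi-square (no continuity correction) equals z**2.
    return z * z, 0.5 * math.erfc(z / math.sqrt(2))
```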
