Limit theorem of Markov chains applied to higher order Markov chains

I have a second order Markov chain with 4 states {A,T,C,G} (the 4 DNA nucleotides).

The transition matrix looks like this:

             A    T    C    G
    AA [0.1, 0.6, 0.2, 0.1]
    AT [0.3, 0.1, 0.5, 0.1]
    AC [0.5, 0.3, 0.0, 0.2]
    AG [..., ..., ..., ...]
    TA [..., ..., ..., ...]
    TT [..., ..., ..., ...]
    TC [..., ..., ..., ...]
    TG [..., ..., ..., ...]
    CA [..., ..., ..., ...]
    CT [..., ..., ..., ...]
    CG [..., ..., ..., ...]
    GA [..., ..., ..., ...]
    GT [..., ..., ..., ...]
    GC [..., ..., ..., ...]
    GG [..., ..., ..., ...]

I want to calculate the stationary probability vector over the 4 states, i.e. the distribution to which this chain converges. The Markov chain is regular.

In the case of first order Markov chains this is easily done by calculating the limit of $P^n$ as $n\rightarrow \infty$.
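
For concreteness, a minimal numpy sketch of that first order computation is below; the $4\times4$ matrix is made up purely for illustration (the question's table is second order, so its values are not used here):

    import numpy as np

    # Hypothetical first order transition matrix over {A, T, C, G};
    # each row is a conditional distribution and sums to 1.
    P = np.array([
        [0.1, 0.6, 0.2, 0.1],
        [0.3, 0.1, 0.5, 0.1],
        [0.5, 0.3, 0.0, 0.2],
        [0.2, 0.2, 0.3, 0.3],
    ])

    # For a regular chain, every row of P^n converges to the stationary vector.
    Pn = np.linalg.matrix_power(P, 100)
    print(Pn[0])  # approximately the stationary distribution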

I do not know how to approach the problem in the case of second order Markov chains.

Also, since I only have a limited dataset from which to determine the transition matrix, can I consider the stationary distribution of the 4 nucleotides as the theoretical distribution I would obtain with a much larger pool to draw from (with the same transition matrix)?

In other words, can I consider the stationary distribution as an estimate of the theoretical nucleotide frequencies, given the transition matrix obtained from limited data?

Best Answer

A second order Markov chain is a random process $(X_n)_n$ on an alphabet $A$, whose distribution is specified by its transition probabilities $Q(x\mid y,z)=P[X_n=x\mid X_{n-1}=y,X_{n-2}=z]$, for every $(x,y,z)$ in $A\times A\times A$ (and by an initial distribution on $A\times A$). A stationary distribution of $(X_n)$ is a probability measure $\pi$ on $A\times A$ such that, if $\pi(x,y)=P[X_n=x,X_{n-1}=y]$ for every $(x,y)$ in $A\times A$ and some $n$, then $\pi(x,y)=P[X_{n+1}=x,X_{n}=y]$ for every $(x,y)$ in $A\times A$.
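
Concretely, this says that a second order chain on $A$ is a first order chain on the pair alphabet $A\times A$: the pair $(X_{n-2},X_{n-1})=(z,y)$ can only move to a pair of the form $(y,x)$, and it does so with probability $Q(x\mid y,z)$. Here is a minimal numpy sketch of this lifting, with a randomly generated $Q$ standing in for the question's $16\times4$ table (all names are placeholders of mine):

    import numpy as np

    n = 4  # alphabet {A, T, C, G}, encoded 0..3

    # Placeholder probabilities: Q[z, y, x] stands for
    # Q(x | y, z) = P[X_n = x | X_{n-1} = y, X_{n-2} = z].
    rng = np.random.default_rng(0)
    Q = rng.random((n, n, n))
    Q /= Q.sum(axis=2, keepdims=True)  # each conditional law sums to 1

    # Lift to a 16x16 first order matrix on pair states: the pair (z, y),
    # stored at index n*z + y, moves to (y, x) with probability Q[z, y, x].
    P = np.zeros((n * n, n * n))
    for z in range(n):
        for y in range(n):
            P[n * z + y, n * y : n * y + n] = Q[z, y]

    assert np.allclose(P.sum(axis=1), 1.0)  # P is a stochastic matrix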

Thus, one asks that, for every $(x,y)$ in $A\times A$,
$$ \pi(x,y)=\sum_{z\in A}Q(x\mid y,z)\pi(y,z). $$
As in the first order case, this linear system, together with the normalizing condition
$$ \sum_{(x,y)\in A\times A}\pi(x,y)=1, $$
fully determines $\pi$ as soon as $(X_n)_n$ is irreducible.

A new feature, absent from the first order case, is that every stationary distribution $\pi$ has identical marginals, that is, for every $x$ in $A$,
$$ \varrho(x)=\sum_{y\in A}\pi(x,y)=\sum_{y\in A}\pi(y,x). $$

Finally, the MLE of $\pi$ based on $(X_k)_{0\leqslant k\leqslant n}$ is $\hat\pi_n$ defined by
$$ \hat\pi_n(x,y)=\frac1n\sum_{k=1}^n\mathbf 1_{X_k=x,X_{k-1}=y}. $$
The MLE is consistent, that is, $\hat\pi_n(x,y)\to\pi(x,y)$ almost surely, for every $(x,y)$ in $A\times A$, when $n\to\infty$. In particular, the frequency of $x$ in $A$ stabilizes, since
$$ \frac1n\sum_{k=1}^n\mathbf 1_{X_k=x}=\sum_{y\in A}\hat\pi_n(x,y)\to\varrho(x). $$
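
A numerical sketch of all of this, continuing the lifting above (rebuilt here so the snippet runs on its own): the linear system plus the normalizing condition is solved by least squares, the two marginals are compared, and a simulated trajectory illustrates the consistency of $\hat\pi_n$. The least-squares formulation is my choice, not part of the answer:

    import numpy as np

    n = 4  # alphabet {A, T, C, G}, encoded 0..3
    rng = np.random.default_rng(0)

    # Placeholder Q[z, y, x] = Q(x | y, z) and its lifted 16x16 matrix P,
    # with index n*z + y for the pair (z, y), as in the previous sketch.
    Q = rng.random((n, n, n))
    Q /= Q.sum(axis=2, keepdims=True)
    P = np.zeros((n * n, n * n))
    for z in range(n):
        for y in range(n):
            P[n * z + y, n * y : n * y + n] = Q[z, y]

    # Solve pi = pi P together with sum(pi) = 1 as one least-squares system.
    M = np.vstack([P.T - np.eye(n * n), np.ones(n * n)])
    b = np.zeros(n * n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(M, b, rcond=None)[0].reshape(n, n)
    # pi[a, b] approximates P[X_{n-1} = a, X_n = b] at stationarity.

    # The two marginals coincide: their common value is rho.
    print(pi.sum(axis=0))  # law of the newer symbol
    print(pi.sum(axis=1))  # law of the older symbol, should agree

    # Consistency of the MLE: empirical pair frequencies approach pi.
    x = [0, 0]
    for _ in range(200_000):
        x.append(rng.choice(n, p=Q[x[-2], x[-1]]))
    counts = np.zeros((n, n))
    for a, b in zip(x, x[1:]):
        counts[a, b] += 1.0
    print(counts / counts.sum())  # close to pi for long trajectories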
