Markov Process – Checking Memoryless Property in Markov Chains

markov-process

I suspect that a series of observed sequences form a Markov chain…

$$X=\left(\begin{array}{c c c c c c c}
A& C& D&D & B & A &C\\
B& A& A&C & A&D &A\\
\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\
B& C& A&D & A & B & E\\
\end{array}\right)$$

However, how could I check that they indeed respect the memoryless property, $$P(X_{n+1}=x_{n+1}\mid X_n=x_n,\dots,X_1=x_1)=P(X_{n+1}=x_{n+1}\mid X_n=x_n)?$$

Or at the very least show that they are Markov in nature? Note that these are empirically observed sequences. Any thoughts?

EDIT

Just to add, the aim is to compare a predicted set of sequences with the observed ones, so comments on how best to compare these would also be appreciated.

First-order transition matrix $$M_{ij}=\frac{x_{ij}}{\sum_{k=1}^{m}x_{ik}},$$ where $x_{ij}$ counts observed transitions from state $i$ to state $j$ and $m=5$ is the number of states ($A,\dots,E$).

$$
M=\left(\begin{array}{c c c c c}
0.1834 & 0.3077 & 0.0769 & 0.1479 & 0.2840\\
0.4697 & 0.1136 & 0.0076 & 0.2500 & 0.1591\\
0.1827 & 0.2404 & 0.2212 & 0.1923 & 0.1635\\
0.2378 & 0.1818 & 0.0629 & 0.3357 & 0.1818\\
0.2458 & 0.1788 & 0.1173 & 0.1788 & 0.2793\\
\end{array}\right)$$
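The first-order transition matrix above can be estimated by counting transitions row by row and normalizing. A minimal sketch in Python (the function name and the toy sequences are illustrative, not from the question):

```python
import numpy as np

def transition_matrix(sequences, states="ABCDE"):
    """Estimate M[i, j] = P(next state = j | current state = i)
    by counting one-step transitions across all observed sequences."""
    idx = {s: k for k, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[idx[a], idx[b]] += 1
    # Normalize each row into probabilities; rows with no
    # observed transitions are left as zeros.
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

# Toy sequences shaped like the rows of X in the question.
M = transition_matrix(["ACDDBAC", "BAACADA", "BCADABE"])
```

Each row of the returned matrix sums to 1 (or to 0 if that state was never observed as a predecessor, which you would want to check for before taking powers of M).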

Eigenvalues of M
$$E =\left(\begin{array}{c c c c c}
1.0000 & 0 & 0 & 0 & 0 \\
0 & -0.2283 & 0 & 0 & 0 \\
0 & 0 & 0.1344 & 0 & 0\\
0 & 0 & 0 & 0.1136 - 0.0430i & 0 \\
0 & 0 & 0 & 0 & 0.1136 + 0.0430i\\
\end{array}\right)$$

Eigenvectors of M
$$V =\left(\begin{array}{c c c c c}
0.4472 & -0.5852 & -0.4219 & -0.2343 - 0.0421i & -0.2343 + 0.0421i\\
0.4472 & 0.7838 & -0.4211 & -0.4479 - 0.2723i & -0.4479 + 0.2723i\\
0.4472 & -0.2006 & 0.3725 & 0.6323 & 0.6323 \\
0.4472 & -0.0010 & 0.7089 & 0.2123 - 0.0908i & 0.2123 + 0.0908i\\
0.4472 & 0.0540 & 0.0589 & 0.2546 + 0.3881i & 0.2546 - 0.3881i\\
\end{array}\right)$$
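The eigendecomposition can be reproduced with NumPy. Note that for a row-stochastic matrix the eigenvalue $1$ has a constant *right* eigenvector ($1/\sqrt{5}\approx 0.4472$ in each entry, matching the first column of $V$ above), while the stationary distribution is the corresponding *left* eigenvector. A sketch, with the matrix values copied from the question:

```python
import numpy as np

# First-order transition matrix from the question (row-stochastic).
M = np.array([
    [0.1834, 0.3077, 0.0769, 0.1479, 0.2840],
    [0.4697, 0.1136, 0.0076, 0.2500, 0.1591],
    [0.1827, 0.2404, 0.2212, 0.1923, 0.1635],
    [0.2378, 0.1818, 0.0629, 0.3357, 0.1818],
    [0.2458, 0.1788, 0.1173, 0.1788, 0.2793],
])

eigvals, eigvecs = np.linalg.eig(M)
k = np.argmax(np.real(eigvals))  # index of the eigenvalue closest to 1

# Stationary distribution: left eigenvector of M for eigenvalue 1,
# i.e. right eigenvector of M.T, normalized to sum to 1.
w, vl = np.linalg.eig(M.T)
pi = np.real(vl[:, np.argmax(np.real(w))])
pi /= pi.sum()
```

The second-largest eigenvalue modulus (here about 0.23) governs how quickly the chain mixes toward the stationary distribution.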

Best Answer

I wonder if the following would give a valid Pearson $\chi^2$ test for proportions.

  1. Estimate the one-step transition probabilities -- you've done that.
  2. Obtain the two-step model probabilities: $$ \hat p_{U,V} = {\rm Prob}[X_{i+2}=U|X_i=V] = \sum_{W\in\{A,\dots,E\}} {\rm Prob}[X_{i+2}=U|X_{i+1}=W]\,{\rm Prob}[X_{i+1}=W|X_i=V] $$
  3. Obtain the two-step empirical probabilities $$\tilde p_{U,V} = \frac{\#\{i : X_i = V,\, X_{i+2} = U\}}{\#\{i : X_i = V\}}$$
  4. Form the Pearson test statistic $$T_V = \#\{i : X_i = V\} \sum_U \frac{(\hat p_{U,V} - \tilde p_{U,V})^2}{\hat p_{U,V}}, \quad T=T_A + T_B + T_C + T_D + T_E$$

It is tempting for me to think that with five states each $T_V \sim \chi^2_4$, so that the total $T\sim \chi^2_{20}$. However, I am not entirely sure of that, and would appreciate your thoughts on this. I am likewise not certain about whether one needs to be paranoid about independence and split the sample in halves, using one half to estimate $\hat p$ and the other to estimate $\tilde p$.
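The steps above can be sketched in Python. This is only an illustration, assuming every state is observed as a predecessor (guard against empty rows in practice), and it uses the tentative $\chi^2_{20}$ reference distribution discussed above for five states; the function name and toy data are mine:

```python
import numpy as np
from scipy.stats import chi2

def two_step_chi2(seqs, states="ABCDE"):
    """Pearson chi-square comparison of model two-step probabilities
    (M @ M) against empirical two-step frequencies."""
    idx = {s: k for k, s in enumerate(states)}
    n = len(states)
    one = np.zeros((n, n))  # one-step counts
    two = np.zeros((n, n))  # two[v, u] = #{i : X_i = v, X_{i+2} = u}
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            one[idx[a], idx[b]] += 1
        for a, b in zip(seq, seq[2:]):
            two[idx[a], idx[b]] += 1
    M = one / one.sum(axis=1, keepdims=True)        # step 1
    p_hat = M @ M                                   # step 2: model two-step probs
    p_tilde = two / two.sum(axis=1, keepdims=True)  # step 3: empirical two-step probs
    n_v = two.sum(axis=1)                           # #{X_i = v} with a lag-2 successor
    T = (n_v[:, None] * (p_hat - p_tilde) ** 2 / p_hat).sum()  # step 4
    df = n * (n - 1)  # tentative: (n-1) free proportions per conditioning state
    return T, chi2.sf(T, df)

# Toy usage: i.i.d. uniform sequences are trivially Markov,
# so T should not be extreme here.
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list("ABCDE"), size=200)) for _ in range(20)]
T, p = two_step_chi2(seqs)
```

If one splits the sample in halves as suggested, the first half would feed the `one` counts (hence $\hat p$) and the second half the `two` counts (hence $\tilde p$).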