[Math] Markov Chains: Example to show that the past and the future are not independent given arbitrary information about the present.

markov-chains, probability, stochastic-processes

The Markov property does not imply that the past and the future are independent given any information concerning the present. Find an example of a homogeneous Markov chain $\left\{X_n\right\}_{n\ge0}$ with state space $E=\left\{1,2,3,4,5,6\right\}$ such that

$P(X_2=6\mid X_1\in\{3,4\},X_0=2)\neq P(X_2=6\mid X_1\in\{3,4\})$.

I don't understand the phrasing of this question. How should I approach this?

Could this work? [transition diagram]

$P(X_2=6\mid X_1\in\{3,4\}) = 1$

But $P(X_2=6\mid X_1\in\{3,4\},X_0=2) =0$
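Numerically, a chain along these lines can be checked as follows. Since the diagram isn't reproduced here, the transition structure in the sketch below is only a guess at it (states $1$ and $2$ equally likely at time $0$, $1\to3$, $2\to4$, $3\to6$, $4\to5$, with $5$ and $6$ absorbing); with this particular chain the two conditional probabilities come out to $1/2$ and $0$, which already differ.

```python
import numpy as np

# One possible chain (states 1..6, indexed 0..5); the actual diagram may differ.
# 1 -> 3, 2 -> 4, 3 -> 6, 4 -> 5, and 5, 6 are absorbing.
P = np.zeros((6, 6))
P[0, 2] = 1.0   # 1 -> 3
P[1, 3] = 1.0   # 2 -> 4
P[2, 5] = 1.0   # 3 -> 6
P[3, 4] = 1.0   # 4 -> 5
P[4, 4] = 1.0   # 5 -> 5
P[5, 5] = 1.0   # 6 -> 6

mu0 = np.array([0.5, 0.5, 0, 0, 0, 0])   # X_0 uniform on {1, 2}

# Joint probabilities P(X_0=i, X_1=j, X_2=k) = mu0[i] * P[i, j] * P[j, k]
joint = mu0[:, None, None] * P[:, :, None] * P[None, :, :]

mid = [2, 3]   # X_1 in {3, 4}
six = 5        # X_2 = 6

# P(X_2=6 | X_1 in {3,4})
p_uncond = joint[:, mid, six].sum() / joint[:, mid, :].sum()

# P(X_2=6 | X_1 in {3,4}, X_0=2)
p_cond = joint[1, mid, six].sum() / joint[1, mid, :].sum()

print(p_uncond, p_cond)   # 0.5 and 0.0 -- the two conditionals differ
```

The point of such a chain is that, given only the partial information $X_1\in\{3,4\}$, additionally knowing $X_0=2$ pins down which of the two states $X_1$ actually is, and hence changes the prediction for $X_2$.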

Best Answer

Introduction: Natural language represented using a Markov model

A very simple and practical example would be that of natural language.

First of all, natural language is just a term for any language you speak and have acquired or learned, such as English. When you communicate, you have a set of words $W$ and a set of operations you can perform on those words. You then arrange the words you know in a certain order to produce meaningful units of communication: "sentences".

When you use words in sentences, they are obviously not placed arbitrarily; they follow some order. In fact, sentence formation can be modeled quite well as a Markov chain of order $k=1$ (reference).


Sentence formation and conditional probability

Now identify your six possible states $\{1,\dots,6\}$ with six particular words. It can be determined experimentally that $P(C \mid B) \geq P(C \mid A, B)$, where "$A, B$" denotes an ordered occurrence of the words $A$ and $B$ in a sentence. This relation holds for a natural language because $P(C \mid B)$ already aggregates over all trigrams of the form "$X\,B\,C$", where $X$ is an arbitrary word of the language. In computational linguistics, $P(C \mid B)$ is also called the transition probability.
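As a concrete sketch of how both quantities could be estimated, here is a minimal count-based example; the corpus, the word choices, and the resulting numbers below are all made up purely for illustration.

```python
from collections import Counter

# Made-up toy corpus, purely for illustration.
corpus = [
    "she baked a chocolate chip cookie",
    "he made chocolate chip muffin batter",
    "i want a banana chip cookie",
]

unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))
    trigrams.update(zip(words, words[1:], words[2:]))

# P(C | B): how often C follows B
p_c_given_b = bigrams[("chip", "cookie")] / unigrams["chip"]
# P(C | A, B): how often C follows the ordered pair A B
p_c_given_ab = trigrams[("chocolate", "chip", "cookie")] / bigrams[("chocolate", "chip")]

print(p_c_given_b)   # 2/3: "cookie" after "chip"
print(p_c_given_ab)  # 1/2: "cookie" after "chocolate chip"
```

With this toy corpus the two estimates come out to $2/3$ and $1/2$; the important point for the original question is simply that conditioning on more of the past changes the conditional probability.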

I'll give you an example of that as well:
Let us, however, also consider the backward probability $P(A \mid B)$, i.e., the probability that $A$ is the word preceding $B$. Let $A$ be "chocolate" and $B$ be "chip". In a large corpus of the language there will be some value $P(A \mid B) = p_{k=1}$. Now consider a third word $C$, say "cookie", and compute the new probability $P(A \mid$ "$B, C$"$) = p_{k=2}$, i.e., the probability that "chocolate" precedes "chip" given that "chip" is followed by "cookie". You will find that $p_{k=2} \leq p_{k=1}$, because "cookie" is only one of the many words that can follow "chocolate chip". I write "$\leq$" because theoretically it is possible that only one such meaningful triplet exists in the language, although that is rarely the case; for all practical purposes you could replace it with a strict inequality.


I think this provides a sufficient and intuitive example for the problem that you have stated. Let me know if anything in the answer is unclear.