[Math] Markov Chains: Example to show that the past and the future are not independent given arbitrary information about the present.

markov-chains, probability, stochastic-processes

The Markov property does not imply that the past and the future are independent given any information concerning the present. Find an example of a homogeneous Markov chain $\left\{X_n\right\}_{n\ge0}$ with state space $E=\left\{1,2,3,4,5,6\right\}$ such that

$P(X_2=6\mid X_1\in\{3,4\},X_0=2)\neq P(X_2=6\mid X_1\in\{3,4\})$.

I don't understand the phrasing of this question. How should I approach this?

Could this work? [transition diagram]

$P(X_2=6\mid X_1\in\{3,4\}) = 1$

But $P(X_2=6\mid X_1\in\{3,4\},X_0=2) =0$
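Numerically, a chain along these lines can be checked as follows. Since the diagram isn't reproduced here, the transition structure in the sketch below is only a guess at it (states $1$ and $2$ equally likely at time $0$, $1\to3$, $2\to4$, $3\to6$, $4\to5$, with $5$ and $6$ absorbing); with this particular chain the two conditional probabilities come out to $1/2$ and $0$, which already differ.

```python
import numpy as np

# One possible chain (states 1..6, indexed 0..5); the actual diagram may differ.
# 1 -> 3, 2 -> 4, 3 -> 6, 4 -> 5, and 5, 6 are absorbing.
P = np.zeros((6, 6))
P[0, 2] = 1.0   # 1 -> 3
P[1, 3] = 1.0   # 2 -> 4
P[2, 5] = 1.0   # 3 -> 6
P[3, 4] = 1.0   # 4 -> 5
P[4, 4] = 1.0   # 5 -> 5
P[5, 5] = 1.0   # 6 -> 6

mu0 = np.array([0.5, 0.5, 0, 0, 0, 0])   # X_0 uniform on {1, 2}

# Joint probabilities P(X_0=i, X_1=j, X_2=k) = mu0[i] * P[i, j] * P[j, k]
joint = mu0[:, None, None] * P[:, :, None] * P[None, :, :]

mid = [2, 3]   # X_1 in {3, 4}
six = 5        # X_2 = 6

# P(X_2=6 | X_1 in {3,4})
p_uncond = joint[:, mid, six].sum() / joint[:, mid, :].sum()

# P(X_2=6 | X_1 in {3,4}, X_0=2)
p_cond = joint[1, mid, six].sum() / joint[1, mid, :].sum()

print(p_uncond, p_cond)   # 0.5 and 0.0 -- the two conditionals differ
```

The point of such a chain is that, given only the partial information $X_1\in\{3,4\}$, additionally knowing $X_0=2$ pins down which of the two states $X_1$ actually is, and hence changes the prediction for $X_2$.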

Best Answer

Introduction: Natural language represented using a Markov model

A very simple and practical example would be that of natural language.

First of all, natural language is just a term for any language you speak and have acquired or learned, such as English. When you communicate, you have a set of words $W$ and a set of operations you can perform on those words. You then arrange the words you know in a certain order to produce meaningful units of communication: "sentences".

When you use words in sentences, they are obviously not placed arbitrarily; they follow some order. In fact, sentence formation can be modeled quite well as a Markov chain of order $k=1$ (reference).


Sentence formation and conditional probability

Now identify your six possible states $\{1,\dots,6\}$ with six particular words. It can be determined experimentally that $P(C \mid B) \geq P(C \mid A, B)$, where "$A, B$" denotes an ordered occurrence of the words $A$ and $B$ in a sentence. This relation holds for a natural language because $P(C \mid B)$ already aggregates over all trigrams of the form "$X\,B\,C$", where $X$ is an arbitrary word of the language. In computational linguistics, $P(C \mid B)$ is also called the transition probability.
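As a concrete sketch of how both quantities could be estimated, here is a minimal count-based example; the corpus, the word choices, and the resulting numbers below are all made up purely for illustration.

```python
from collections import Counter

# Made-up toy corpus, purely for illustration.
corpus = [
    "she baked a chocolate chip cookie",
    "he made chocolate chip muffin batter",
    "i want a banana chip cookie",
]

unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))
    trigrams.update(zip(words, words[1:], words[2:]))

# P(C | B): how often C follows B
p_c_given_b = bigrams[("chip", "cookie")] / unigrams["chip"]
# P(C | A, B): how often C follows the ordered pair A B
p_c_given_ab = trigrams[("chocolate", "chip", "cookie")] / bigrams[("chocolate", "chip")]

print(p_c_given_b)   # 2/3: "cookie" after "chip"
print(p_c_given_ab)  # 1/2: "cookie" after "chocolate chip"
```

With this toy corpus the two estimates come out to $2/3$ and $1/2$; the important point for the original question is simply that conditioning on more of the past changes the conditional probability.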

I'll give you an example of that as well:
Let us, however, also consider the backward probability $P(A \mid B)$, i.e., the probability that $A$ is the word preceding $B$. Let $A$ be "chocolate" and $B$ be "chip". In a large corpus of the language there will be some value $P(A \mid B) = p_{k=1}$. Now consider a third word $C$, say "cookie", and compute the new probability $P(A \mid$ "$B, C$"$) = p_{k=2}$, i.e., the probability that "chocolate" precedes "chip" given that "chip" is followed by "cookie". You will find that $p_{k=2} \leq p_{k=1}$, because "cookie" is only one of the many words that can follow "chocolate chip". I write "$\leq$" because theoretically it is possible that only one such meaningful triplet exists in the language, although that is rarely the case; for all practical purposes you could replace it with a strict inequality.


I think this provides a sufficient and intuitive example for the problem that you have stated. Let me know if anything in the answer is unclear.