Probability – Merging States in Markov Chain

markov-chains, matrices, probability

Consider a Markov process on the state space $S=\{1,2,3\}$ where transitions between states are described by the transition matrix
$$P = \begin{pmatrix} p_{1,1} & p_{1,2} & 0 \\ p_{2,1} & p_{2,2} & p_{2,3} \\ 0 & p_{3,2} & p_{3,3} \end{pmatrix}$$
where $P$ is a column-stochastic matrix. I want to describe the Markov process on the state space $S^\prime = \{1 \ \& \ 2,\ 3\}$, i.e. I am only interested in the transition probabilities between the merged state $1 \ \& \ 2$ and the state $3$, and vice versa.

In one step, due to how $P$ is defined, the chain cannot move from state $1$ to state $3$. However, by looking at the matrix
$$P^2= \begin{pmatrix}p_{1,1}^2+p_{1,2}p_{2,1}&p_{1,1}p_{1,2}+p_{1,2}p_{2,2}&p_{1,2}p_{2,3}\\ p_{1,1}p_{2,1}+p_{2,1}p_{2,2}&p_{1,2}p_{2,1}+p_{2,2}^2+p_{2,3}p_{3,2}&p_{2,2}p_{2,3}+p_{2,3}p_{3,3}\\ p_{2,1}p_{3,2}&p_{2,2}p_{3,2}+p_{3,2}p_{3,3}&p_{2,3}p_{3,2}+p_{3,3}^2\end{pmatrix}$$
we can read off the probability that a chain started in state $1$ is in state $3$ after two steps. I was wondering whether, given the above, one could rewrite this process on the state space $S^\prime$, where the new matrix $P^\prime$ would be a $2\times 2$ matrix expressed in terms of the entries of $P$.
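For concreteness, here is a small numerical sketch (the entries of $P$ below are arbitrary made-up values, kept column stochastic as above) that squares $P$ and reads off the two-step $1\to 3$ probability, which under the column-stochastic convention is the $(3,1)$ entry $p_{2,1}p_{3,2}$ of $P^2$:

```python
import numpy as np

# An arbitrary column-stochastic example of the matrix P from the question
# (each COLUMN sums to 1; P[i, j] = probability of moving to state i+1
#  given the chain is currently in state j+1; the numbers are made up).
P = np.array([
    [0.7, 0.3, 0.0],
    [0.3, 0.5, 0.4],
    [0.0, 0.2, 0.6],
])
assert np.allclose(P.sum(axis=0), 1.0)  # column stochastic

P2 = P @ P  # two-step transition probabilities

# Two-step probability of going from state 1 to state 3:
# the (3,1) entry of P^2, which symbolically equals p_{3,2} * p_{2,1}.
print(P2[2, 0])            # ≈ 0.06
print(P[2, 1] * P[1, 0])   # p_{3,2} * p_{2,1} = 0.2 * 0.3 ≈ 0.06
```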

Best Answer

Define the process $\left(\widetilde{X}(n)\right)_{n\in\mathbb{N}}$ by $\widetilde{X}(n)=1$ if $X(n)=1$, and $\widetilde{X}(n)=2$ if $X(n)\in\{2,3\}$. This is a "quotient" process of the kind described in the question (here states $2$ and $3$ are merged; by the symmetric roles of states $1$ and $3$ in $P$, the argument for merging $1$ and $2$ instead is identical).

Claim. $\left(\widetilde{X}(n)\right)_{n\in\mathbb{N}}$ is not Markov.

Proof. Indeed (writing $P_{ij}=\mathbb{P}\left(X(n)=j\left|X(n-1)=i\right.\right)$), on one hand $$\mathbb{P}\left(\widetilde{X}(n)=1\left|\widetilde{X}(n-1)=2,\widetilde{X}(n-2)=1\right.\right)=\mathbb{P}\left(\widetilde{X}(n)=1\left|X(n-1)=2,\widetilde{X}(n-1)=2,\widetilde{X}(n-2)=1\right.\right)$$ $$=\mathbb{P}\left(\widetilde{X}(n)=1\left|X(n-1)=2,\widetilde{X}(n-2)=1\right.\right)$$ $$=\mathbb{P}\left(X(n)=1\left|X(n-1)=2,\widetilde{X}(n-2)=1\right.\right)$$ $$= \mathbb{P}\left(X(n)=1\left|X(n-1)=2\right.\right)= P_{21},$$ where the first equality uses that $\widetilde{X}(n-2)=1$ (i.e. $X(n-2)=1$) together with $P_{13}=0$ forces $X(n-1)=2$.

On the other hand, $$\mathbb{P}\left(\widetilde{X}(n)=1\left|\widetilde{X}(n-1)=2,\widetilde{X}(n-2)=2\right.\right)=\mathbb{P}\left(X(n)=1\left|\widetilde{X}(n-1)=2,\widetilde{X}(n-2)=2\right.\right)$$ $$= \frac{\mathbb{P}\left(X(n)=1,\widetilde{X}(n-1)=2,\widetilde{X}(n-2)=2\right)}{\mathbb{P}\left(\widetilde{X}(n-1)=2,\widetilde{X}(n-2)=2\right)}$$ $$= \frac{\mathbb{P}\left(X(n)=1, X(n-1)=2,\widetilde{X}(n-2)=2\right)}{\mathbb{P}\left(\widetilde{X}(n-1)=2,\widetilde{X}(n-2)=2\right)}$$ $$= \frac{\mathbb{P}\left(X(n)=1, X(n-1)=2,\widetilde{X}(n-2)=2\right)}{\mathbb{P}\left(X(n-1)=2,\widetilde{X}(n-2)=2\right)}\times \frac{\mathbb{P}\left(X(n-1)=2,\widetilde{X}(n-2)=2\right)}{\mathbb{P}\left(\widetilde{X}(n-1)=2,\widetilde{X}(n-2)=2\right)}$$ $$=\mathbb{P}\left(X(n)=1\left|X(n-1)=2,\widetilde{X}(n-2)=2\right.\right)\times \frac{\mathbb{P}\left(X(n-1)=2,\widetilde{X}(n-2)=2\right)}{\mathbb{P}\left(\widetilde{X}(n-1)=2,\widetilde{X}(n-2)=2\right)}$$ $$=P_{21}\times \frac{\mathbb{P}\left(X(n-1)=2,\widetilde{X}(n-2)=2\right)}{\mathbb{P}\left(\widetilde{X}(n-1)=2,\widetilde{X}(n-2)=2\right)}<P_{21},$$ where the final inequality is strict provided $P_{21}>0$ and $\mathbb{P}\left(X(n-1)=3,\,\widetilde{X}(n-2)=2\right)>0$.

In other words, $$\mathbb{P}\left(\widetilde{X}(n)=1\left|\widetilde{X}(n-1)=2,\widetilde{X}(n-2)=1\right.\right)\neq \mathbb{P}\left(\widetilde{X}(n)=1\left|\widetilde{X}(n-1)=2,\widetilde{X}(n-2)=2\right.\right)$$ and $\left(\widetilde{X}(n)\right)_{n\in\mathbb{N}}$ is not Markov.
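The claim can also be checked numerically. Below is a minimal sketch (the transition matrix and initial distribution are arbitrary made-up values, and $P$ is indexed as in the proof, $P_{ij}=\mathbb{P}\left(X(n)=j\,\middle|\,X(n-1)=i\right)$) that computes the two conditional probabilities by exact enumeration of all two-step paths and shows they differ:

```python
import numpy as np
from itertools import product

# Arbitrary row-wise-indexed P with the zero pattern from the question
# (P[i, j] = probability of jumping from state i+1 to state j+1).
P = np.array([
    [0.7, 0.3, 0.0],
    [0.3, 0.5, 0.2],
    [0.0, 0.4, 0.6],
])
p0 = np.array([0.2, 0.5, 0.3])   # arbitrary initial distribution of X(0)

lump = {0: 0, 1: 1, 2: 1}        # tilde-X: state 1 -> 1, states 2 and 3 -> "2"

def cond_prob(past):
    """P( tilde-X(2)=1 | tilde-X(1)=2, tilde-X(0)=past ), by exact enumeration.
    past: 0 means lumped state 1, 1 means lumped state 2."""
    num = den = 0.0
    for x0, x1, x2 in product(range(3), repeat=3):
        joint = p0[x0] * P[x0, x1] * P[x1, x2]
        if lump[x0] == past and lump[x1] == 1:
            den += joint
            if lump[x2] == 0:
                num += joint
    return num / den

print(cond_prob(0))   # conditioned on tilde-X(0)=1: equals P[1, 0] = 0.3
print(cond_prob(1))   # conditioned on tilde-X(0)=2: ≈ 0.17, strictly smaller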

Remark 1. Since $\left(\widetilde{X}(n)\right)_{n\in\mathbb{N}}$ is not Markov, you should not expect $\widetilde{p}^{\top}\widetilde{P}^n$ to represent the distribution of $\widetilde{X}(n)$, where $\widetilde{P}$ is a would-be "transition" matrix of $\left(\widetilde{X}(n)\right)_{n\in\mathbb{N}}$ and $\widetilde{p}$ is its initial distribution.
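To illustrate Remark 1 numerically, the sketch below (same made-up numbers and indexing as above) builds a candidate $\widetilde{P}$ from the one-step lumped conditionals at time $0$ (just one natural choice, since the remark leaves $\widetilde{P}$ unspecified) and shows that $\widetilde{p}^{\top}\widetilde{P}^{\,n}$ does not reproduce the true distribution of $\widetilde{X}(n)$:

```python
import numpy as np

# Same arbitrary P and initial distribution as in the previous sketch.
P = np.array([
    [0.7, 0.3, 0.0],
    [0.3, 0.5, 0.2],
    [0.0, 0.4, 0.6],
])
p0 = np.array([0.2, 0.5, 0.3])

# Lumping matrix: column k collects the original states in lumped state k+1.
L = np.array([
    [1.0, 0.0],   # state 1 -> lumped state 1
    [0.0, 1.0],   # state 2 -> lumped state 2
    [0.0, 1.0],   # state 3 -> lumped state 2
])

# Candidate "transition" matrix built from the one-step lumped conditionals at time 0.
p_tilde0 = p0 @ L
P_tilde = np.array([
    [P[0, 0], P[0, 1] + P[0, 2]],
    [(p0[1] * P[1, 0] + p0[2] * P[2, 0]) / (p0[1] + p0[2]),
     (p0[1] * (P[1, 1] + P[1, 2]) + p0[2] * (P[2, 1] + P[2, 2])) / (p0[1] + p0[2])],
])

n = 2
true_lumped = (p0 @ np.linalg.matrix_power(P, n)) @ L   # distribution of tilde-X(n)
naive = p_tilde0 @ np.linalg.matrix_power(P_tilde, n)   # p_tilde^T * P_tilde^n
print(true_lumped)   # ≈ [0.332  0.668]
print(naive)         # ≈ [0.3361 0.6639] -- not the same
```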

You can still characterize each of the conditionals. Define $p(n)=p^{\top}P^n$, where $p$ is the initial distribution of the process $\left(X(n)\right)_{n\in\mathbb{N}}$. We have,

$i)\,\, 2\rightarrow 1$ $$\mathbb{P}\left(\widetilde{X}(n)=1\left|\widetilde{X}(n-1)=2\right.\right)=\mathbb{P}\left(X(n)=1\left|\widetilde{X}(n-1)=2\right.\right)=\frac{\mathbb{P}\left(X(n)=1,\widetilde{X}(n-1)=2\right)}{\mathbb{P}\left(\widetilde{X}(n-1)=2\right)}=\frac{\mathbb{P}\left(X(n)=1,X(n-1)=2\right)}{\mathbb{P}\left(X(n-1)=2\right)+\mathbb{P}\left(X(n-1)=3\right)}$$ $$=\frac{\mathbb{P}\left(X(n)=1\left| X(n-1)=2\right.\right)\mathbb{P}\left(X(n-1)=2\right)}{\mathbb{P}\left(X(n-1)=2\right)+\mathbb{P}\left(X(n-1)=3\right)}$$ $$=\frac{P_{21}\left[p(n-1)\right]_2}{\left[p(n-1)\right]_2+\left[p(n-1)\right]_3},$$ where $\left[x\right]_i$ is the $i$th entry of the vector $x$ (the term $\mathbb{P}\left(X(n)=1,X(n-1)=3\right)$ drops out of the numerator because $P_{31}=0$). The other conditionals can be computed in a similar vein (a numerical check of all four appears after item $iv)$ below), namely,

$ii)\,\, 1\rightarrow 2$

$$\mathbb{P}\left(\widetilde{X}(n)=2\left|\widetilde{X}(n-1)=1\right.\right)=\mathbb{P}\left(\widetilde{X}(n)=2\left|X(n-1)=1\right.\right)$$ $$= \mathbb{P}\left(X(n)=2\vee 3\left|X(n-1)=1\right.\right)$$ $$=\mathbb{P}\left(X(n)=2\left|X(n-1)=1\right.\right)+\underbrace{\mathbb{P}\left(X(n)=3\left|X(n-1)=1\right.\right)}_{=0}$$ $$=P_{12}.$$

$iii) \,\, 1\rightarrow 1$

$$\mathbb{P}\left(\widetilde{X}(n)=1\left|\widetilde{X}(n-1)=1\right.\right)=\mathbb{P}\left(\widetilde{X}(n)=1\left|X(n-1)=1\right.\right)$$ $$= \mathbb{P}\left(X(n)=1\left|X(n-1)=1\right.\right)=P_{11}.$$

$iv) \,\, 2\rightarrow 2$

$$\mathbb{P}\left(\widetilde{X}(n)=2\left|\widetilde{X}(n-1)=2\right.\right)=\frac{\mathbb{P}\left(\widetilde{X}(n)=2,\widetilde{X}(n-1)=2\right)}{\mathbb{P}\left(\widetilde{X}(n-1)=2\right)}$$ $$=\frac{\mathbb{P}\left(X(n)=2\vee 3,X(n-1)=2\vee 3\right)}{\mathbb{P}\left(X(n-1)=2\vee 3\right)}$$ $$=\frac{\mathbb{P}\left(X(n)=2,X(n-1)=2\right)+\mathbb{P}\left(X(n)=2,X(n-1)=3\right)}{\mathbb{P}\left(X(n-1)=2\right)+\mathbb{P}\left(X(n-1)=3\right)}+\frac{\mathbb{P}\left(X(n)=3,X(n-1)=2\right)+\mathbb{P}\left(X(n)=3,X(n-1)=3\right)}{\mathbb{P}\left(X(n-1)=2\right)+\mathbb{P}\left(X(n-1)=3\right)}$$

$$=\frac{P_{22}\left[p(n-1)\right]_2+P_{32}\left[p(n-1)\right]_3+P_{23}\left[p(n-1)\right]_2+P_{33}\left[p(n-1)\right]_3}{\left[p(n-1)\right]_2+\left[p(n-1)\right]_3}.$$
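All four conditionals can be sanity-checked against a direct computation from the joint law of $\left(X(n-1),X(n)\right)$. A minimal sketch (again with arbitrary made-up numbers and the indexing $P_{ij}=\mathbb{P}\left(X(n)=j\,\middle|\,X(n-1)=i\right)$ used above):

```python
import numpy as np

# Same arbitrary P and initial distribution as in the previous sketches.
P = np.array([
    [0.7, 0.3, 0.0],
    [0.3, 0.5, 0.2],
    [0.0, 0.4, 0.6],
])
p0 = np.array([0.2, 0.5, 0.3])

n = 4
p_prev = p0 @ np.linalg.matrix_power(P, n - 1)   # p(n-1)
J = np.diag(p_prev) @ P                          # J[i, j] = P(X(n-1)=i+1, X(n)=j+1)

A = [0]        # original states inside lumped state 1
B = [1, 2]     # original states inside lumped state 2

def lumped_cond(src, dst):
    """P( tilde-X(n)=dst | tilde-X(n-1)=src ), computed from the joint J."""
    return J[np.ix_(src, dst)].sum() / J[np.ix_(src, [0, 1, 2])].sum()

# i)-iv) from the answer, in the same order; each pair of numbers should match.
print(lumped_cond(B, A), P[1, 0] * p_prev[1] / (p_prev[1] + p_prev[2]))   # 2 -> 1
print(lumped_cond(A, B), P[0, 1])                                         # 1 -> 2
print(lumped_cond(A, A), P[0, 0])                                         # 1 -> 1
print(lumped_cond(B, B),                                                  # 2 -> 2
      (P[1, 1] * p_prev[1] + P[2, 1] * p_prev[2]
       + P[1, 2] * p_prev[1] + P[2, 2] * p_prev[2]) / (p_prev[1] + p_prev[2]))
```

Agreement of each printed pair (up to floating point) confirms the closed forms above for this particular choice of $P$ and $p$.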

Remark 2. As a general rule, lumping states together yields a lower-dimensional system (quite convenient), but at the expense of losing the Markov property. In general, this zooming out to the macroscopic description produces a non-Markov macro-process: good for scalability and dimensionality, but at the cost of losing some of the dynamical structure.
