Solved – Markov property in simple random walk

conditional-probability, markov-process

There is something about the random walk I don't understand, specifically about the first time a state in a Markov chain is reached.

So in the simple random walk, the states are all the integers, from negative infinity to positive infinity: $S = \{\dots, -4, -3, -2, -1, 0, 1, 2, 3, \dots\}$.

Let $T_s = k$ denote that, when we start at zero (i.e. $X_0 = 0$), $k$ is the time at which we first reach state $s$ (so it is the very first time we reach the state, or in this case more like the position, $s$ when starting from position $0$).

Let $f_r(n) = P(T_r = n | X_0 = 0)$ denote the probability that we are at state $r$ for the very first time at time $n$, given that we start at position $0$.

So I would like to find $f_r(n) = P(T_r = n | X_0 = 0)$ for $r > 1$.
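
For concreteness, here is a minimal Monte Carlo sketch that estimates $f_r(n)$ for the simple symmetric walk by simulating paths from $0$ and recording the first hitting time of $r$. The function name and parameters are my own choices, purely for illustration:

```python
import random
from collections import Counter

def estimate_first_passage(r, n_max, trials=200_000, seed=0):
    """Estimate f_r(n) = P(T_r = n | X_0 = 0) for the simple
    symmetric random walk by simulating paths started at 0 and
    recording the first time each one hits r.  Paths that have
    not reached r by time n_max are censored."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(trials):
        x = 0
        for t in range(1, n_max + 1):
            x += 1 if rng.random() < 0.5 else -1
            if x == r:
                hits[t] += 1
                break
    return {n: hits[n] / trials for n in sorted(hits)}

# f_2(2) should be about 1/4 (two up-steps in a row), and f_2(n) = 0 for odd n.
print(estimate_first_passage(r=2, n_max=20))
```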

I was reading something that says we can condition on $T_1$ and make use of the Markov property to obtain:

$f_r(n)=\sum_{k=0}^{\infty}P(T_r=n|T_1=k)f_1(k)$


So my question is: how did the author make use of the Markov property in the above equation when conditioning on $T_1 = k$?

I know that $T_1 = k$ implies $X_k = 1$, since it means that at time $k$ we are at position 1. And what the Markov property says is this:

$P(X_{n+m} = s | X_0 = i_0, \dots, X_{n-1} = i_{n-1}) = P(X_{n+m} = s | X_{n-1} = i_{n-1})$

(the $i_0, i_1, \dots, i_{n-1}$ are just some states in $S$).

So the Markov property says that the distribution of the state at a future time point, in this case $n+m$, depends only on the most recent past observation (i.e. only on the state at time $n-1$ in this case).
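
As a quick empirical sanity check of this (the setup below is my own, purely illustrative), one can verify that for the simple symmetric walk, conditioning on an extra step of history does not change the one-step transition probabilities:

```python
import random

def check_markov(trials=400_000, seed=1):
    """Empirically compare P(X_3 = 1 | X_2 = 0) with
    P(X_3 = 1 | X_2 = 0, X_1 = -1); the Markov property says the
    extra conditioning on X_1 changes nothing (both are 1/2)."""
    rng = random.Random(seed)
    hit2 = tot2 = hit21 = tot21 = 0
    for _ in range(trials):
        x1 = rng.choice([-1, 1])            # X_1
        x2 = x1 + rng.choice([-1, 1])       # X_2
        x3 = x2 + rng.choice([-1, 1])       # X_3
        if x2 == 0:                         # condition on X_2 = 0 only
            tot2 += 1
            hit2 += (x3 == 1)
            if x1 == -1:                    # ...and additionally on X_1 = -1
                tot21 += 1
                hit21 += (x3 == 1)
    return hit2 / tot2, hit21 / tot21

print(check_markov())   # both numbers should be close to 0.5
```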

I am just wondering how the Markov property is being implemented in the above equation?

Thank you

Best Answer

Note that by the Law of Total Probability,

$f_r(n) = P(T_r = n | X_0 = 0) = \sum_{k=0}^{\infty} P(T_r = n | T_1 = k, X_0 = 0)\, P(T_1 = k | X_0 = 0)$

Now, since this is a Markov process, we can say that

$P(T_r=n|T_1=k, X_0=0) = P(T_r=n|T_1=k)$

because the conditional distribution of $T_r$ depends only on the most recent information available (as you quoted in your definition of the Markov property). And we know that $k \geq 0$, so conditioning on $T_1 = k$ will always "take precedence" over conditioning on $X_0$.

Plugging this in and substituting $f_1(k) = P(T_1 = k | X_0 = 0)$, you get the result.
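
To make this concrete: using the spatial homogeneity of the walk (a standard fact, which gives $P(T_r = n | T_1 = k) = f_{r-1}(n-k)$), the sum becomes a convolution that can be checked exactly. Here is a short sketch; the dynamic-programming scheme and names are my own:

```python
from fractions import Fraction

def first_passage(r, n_max):
    """Exact f_r(n) = P(T_r = n | X_0 = 0) for the simple symmetric
    walk, n = 0..n_max, via dynamic programming on paths that have
    not yet reached r (the walk is 'killed' on first arrival at r)."""
    half = Fraction(1, 2)
    dist = {0: Fraction(1)}     # dist[x] = P(X_t = x, T_r > t), x < r
    f = [Fraction(0)] * (n_max + 1)
    for t in range(1, n_max + 1):
        new = {}
        for x, p in dist.items():
            for y in (x - 1, x + 1):
                if y == r:
                    f[t] += p * half      # first arrival at r at time t
                else:
                    new[y] = new.get(y, Fraction(0)) + p * half
        dist = new
    return f

n_max = 12
f1 = first_passage(1, n_max)
f2 = first_passage(2, n_max)
f3 = first_passage(3, n_max)
# Check f_3(n) = sum_k P(T_3 = n | T_1 = k) f_1(k) = sum_k f_2(n-k) f_1(k).
for n in range(n_max + 1):
    assert sum(f2[n - k] * f1[k] for k in range(n + 1)) == f3[n]
print("decomposition verified exactly up to n =", n_max)
```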

EDIT: To show that the processes $T_r$ are Markov, I will make use of the concept of a filtration. The Markov property can alternatively be stated as:

Let $X_t$ be a stochastic process with the strong Markov property. Then for all $s \leq t$, $P(X_t | \mathcal{F}_s) = P(X_t | X_s)$, where $\mathcal{F}_s$ represents a filtration.

The formal definition of a filtration is an increasing family of sigma-algebras on the probability space in question. But what it really means is the set of all information known up to and including time $s$. This means that conditioning on a set of previous events is the same as conditioning on the most recent event in that set.

Now, why is it that $X_n$ being strong Markov implies that $T_r$ is also strong Markov? The intuitive explanation is that if you know $T_r = n$ for some state $r$ and time $n$, then you also know $X_n$ (namely $X_n = r$). So knowing $X_m$, or anything else about the process at some $m < n$, does not add any new information.

We can prove this formally using the reflection principle. For this argument, I will assume that we are considering first arrivals at points greater than the starting point. The reflection principle in this context states that

$P(T_r = n | \mathcal{F}_t) = P(\sup_{s \leq n-1} X_s < r \wedge X_n = r | \mathcal{F}_t) = P(X_n < r-2 | \mathcal{F}_t)$

Now, since the last expression here is simply a statement about $X_n$, and the process $X_n$ is Markov, we know that $P(X_n < r-2 | \mathcal{F}_t) = P(X_n < r-2 | X_t = r^*)$, and thus $P(T_r = n | \mathcal{F}_t) = P(T_r = n | X_t = r^*)$.

In other words, conditioning $T_r$ on a set of previous information is the same as conditioning on the most recent position of the process. Note that if $T_r$ is actually known (i.e. if $T_r \in \mathcal{F}_t$), this argument is trivially true. Finally, we can say that

$P(T_r = n | T_{r^*} = t) = P(T_r = n | X_t = r^* \wedge X_s < r^* \ \forall s < t) = P(T_r = n | X_t = r^*)$. This is just a special case of what I just proved, which in English says that the knowledge that $X_t$ was a first arrival is irrelevant.

Altogether, we have

$P(T_r = n | \mathcal{F}_t) = P(T_r = n | T_{r^*} = t)$ for all $r^*$, i.e. the process is Markov. Or, more precisely, the processes are jointly Markov.
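
As a final sanity check on the reflection step, here is an exact verification of the reflection principle in its textbook form for the simple symmetric walk, namely $P(\max_{s \leq n} X_s \geq r,\, X_n = j) = P(X_n = 2r - j)$ for $j < r$. The sketch and its names are my own:

```python
from fractions import Fraction

def joint_max_dist(n, r):
    """Exact joint law of (X_n, 1{max_{s<=n} X_s >= r}) for the
    simple symmetric walk started at 0, by dynamic programming."""
    half = Fraction(1, 2)
    dist = {(0, 0 >= r): Fraction(1)}
    for _ in range(n):
        new = {}
        for (x, hit), p in dist.items():
            for y in (x - 1, x + 1):
                key = (y, hit or y >= r)
                new[key] = new.get(key, Fraction(0)) + p * half
        dist = new
    return dist

n, r = 10, 3
dist = joint_max_dist(n, r)
for j in range(-n, r):  # all reachable endpoints with j < r
    lhs = dist.get((j, True), Fraction(0))              # P(max >= r, X_n = j)
    rhs = sum(p for (x, hit), p in dist.items() if x == 2 * r - j)
    assert lhs == rhs
print("reflection identity verified for n =", n, "and r =", r)
```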
