Infinitesimal definition of a continuous-time Markov chain

Tags: definition, infinitesimals, markov-chains

Quoting this Wikipedia article, a continuous-time Markov chain can be described in the following manner:

Let $X_t$ be the random variable describing the state of the process at time $t$, and assume the process is in a state $i$ at time $t$.
Then, knowing $X_t=i$, $X_{t+h}=j$ is independent of previous values $\left( X_s : s<t \right)$, and as $h \to 0$ for all $j$ and for all $t$,

$\Pr(X(t+h) = j \mid X(t) = i) = \delta_{ij} + q_{ij}h + o(h)$,

where $\delta_{ij}$ is the Kronecker delta, using the little-o notation.
The $q_{ij}$ can be seen as measuring how quickly the transition from $i$ to $j$ happens.
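To convince myself of this, here is a small Python sketch (my own toy example, not from the article): a two-state chain with hypothetical rates $a = q_{01}$ and $b = q_{10}$, whose transition probabilities have a known closed form, checked against the infinitesimal formula for small $h$.

```python
import math

# Hypothetical two-state chain: states 0 and 1,
# transition rates q01 = a (0 -> 1) and q10 = b (1 -> 0).
a, b = 2.0, 3.0

def p01(h):
    """Closed-form Pr(X(t+h)=1 | X(t)=0) for this two-state chain."""
    return a / (a + b) * (1.0 - math.exp(-(a + b) * h))

def p00(h):
    """Closed-form Pr(X(t+h)=0 | X(t)=0)."""
    return 1.0 - p01(h)

h = 1e-4
# Infinitesimal definition: p01(h) = delta_01 + q01*h + o(h) = 0 + a*h + o(h)
print(p01(h), a * h)           # both ~2e-4; the difference is o(h)
# p00(h) = delta_00 + q00*h + o(h) = 1 - a*h + o(h), with q00 = -a
print(p00(h), 1.0 - a * h)
```

For $h = 10^{-4}$ the two quantities agree to within about $5 \cdot 10^{-8}$, i.e. the error is indeed much smaller than $h$.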

Things I think I understood

  • $o(h)$ collects the probabilities of the events whose probability is of lower order than that of a single jump (e.g., in a birth-death process, two simultaneous births or deaths)
  • the Kronecker delta is one only when calculating the probability that the process stays in the same state
  • $q_{ij}$ is the transition rate of the process from $i$ to $j$.

What I don't understand is how this formula is derived. More specifically, what confuses me is the fact that (correct me if I'm wrong) $q_{ij}h$ represents the transition probability from state $i$ to $j$.

Furthermore, the page reads

The continuous time Markov chain is characterized by the transition rates, the derivatives with respect to time of the transition probabilities between states $i$ and $j$.

Does that imply $q_{ij}(t)=\frac{d}{dt} p_{ij}(t)$? If yes, how does it relate to the formula in the infinitesimal definition of the Markov chain?

Best Answer

I am assuming that the discussion is about time-homogeneous Markov chains: in any case, let me consider that case, since it is easier and the general case does not really add much to the discussion.

Let $$p_{i,j}(h) = \Pr(X(t+h)=j \mid X(t)=i).$$ Since we have assumed that the chain is time-homogeneous, this quantity does not depend on $t$. This is the transition probability of going from state $i$ to state $j$ after time $h$. Note that trivially $p_{i,j}(0) = \delta_{i,j}$.

The transition rate $q_{i,j}$ can be defined as the derivative at zero of this function: $q_{i,j} := \frac{d}{dh} p_{i,j}(h)|_{h=0}$. Note that one of the equivalent ways of defining the derivative at zero is using the $o(h)$ notation: $q_{i,j}$ is the only constant you can multiply $h$ by which makes the following true $$ p_{i,j}(h) = p_{i,j}(0) + q_{i,j}h + o(h).$$
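To see this derivative-at-zero definition numerically, here is a sketch on a hypothetical two-state chain (rates $a, b$ made up for illustration), where $p_{0,1}(h)$ has a known closed form: the difference quotients at $h = 0$ converge to $q_{0,1} = a$.

```python
import math

a, b = 2.0, 3.0  # hypothetical rates: q01 = a, q10 = b

def p01(h):
    """Closed-form Pr(X(t+h)=1 | X(t)=0) for the two-state chain."""
    return a / (a + b) * (1.0 - math.exp(-(a + b) * h))

# q01 is the derivative of p01 at h = 0; note p01(0) = delta_01 = 0.
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, (p01(h) - p01(0.0)) / h)  # difference quotients approach a = 2.0
```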

One has to be careful here: $q_{i,j}$ expresses the transition rate. In general the function $p_{i,j}(h)$ will not be linear, so it is not true that $q_{i,j} h$ is the transition probability after time $h$, which as we said is given by $p_{i,j}(h)$. The reason is that during that interval of time, after transitioning from $i$ to $j$ we could just as well have transitioned away to some other state, or we could have transitioned first to some third state and then to $j$, so $p_{i,j}(h)$ could be smaller or larger than $q_{i,j}h$.
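The same toy two-state chain (hypothetical rates $a, b$ of my choosing) makes this failure of linearity concrete: $p_{0,1}(h)$ saturates at $a/(a+b)$, while $q_{0,1}h$ grows without bound.

```python
import math

a, b = 2.0, 3.0  # hypothetical rates for a two-state chain (states 0 and 1)

def p01(h):
    """Exact Pr(X(t+h)=1 | X(t)=0) for this chain."""
    return a / (a + b) * (1.0 - math.exp(-(a + b) * h))

for h in (0.01, 0.1, 1.0, 10.0):
    print(f"h={h}: p01(h)={p01(h):.4f}  q01*h={a * h:.4f}")
# p01(h) saturates at a/(a+b) = 0.4, while q01*h = 2h grows without bound
# and stops being a probability altogether once 2h > 1.
```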

What is true is that, if we know all the transition rates $q_{i,j}$ for all $i$ and $j$, we can recover $p_{i,j}(h)$. If we now denote by $Q=(q_{i,j})_{i,j}$ the matrix of transition rates, and by $P(h) = (p_{i,j}(h))_{i,j}$ the time-dependent matrix of transition probabilities, then the latter is the solution of the first-order differential equation $$ P'(h) = QP(h) $$ with initial condition $P(0) = (\delta_{i,j})_{i,j}$, the identity matrix.
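The solution of this matrix ODE is the matrix exponential $P(h) = e^{hQ}$. As a quick numerical check (my own sketch, with made-up rates $a, b$ for a two-state chain and a naive truncated power series standing in for a proper matrix-exponential routine), the entry $P(h)_{0,1}$ of $e^{hQ}$ matches the closed-form transition probability:

```python
import math

a, b = 2.0, 3.0
Q = [[-a, a], [b, -b]]  # rate matrix of the hypothetical two-state chain

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(Q, h, terms=60):
    """Truncated power series for e^{hQ} (adequate for this small, tame Q)."""
    P = [[1.0, 0.0], [0.0, 1.0]]     # running sum, starts at the identity = P(0)
    term = [[1.0, 0.0], [0.0, 1.0]]  # current term (hQ)^n / n!
    Qh = [[q * h for q in row] for row in Q]
    for n in range(1, terms):
        term = [[v / n for v in row] for row in mat_mul(term, Qh)]
        P = [[P[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return P

h = 0.3
P = expm(Q, h)
# Compare P[0][1] with the closed form a/(a+b) * (1 - e^{-(a+b)h})
print(P[0][1], a / (a + b) * (1.0 - math.exp(-(a + b) * h)))
# Rows of P(h) sum to 1, as rows of transition probabilities must
print(P[0][0] + P[0][1], P[1][0] + P[1][1])
```

In production code one would use a library routine such as `scipy.linalg.expm` rather than a hand-rolled series.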
