Solved – Estimating Markov transition probabilities from sequence data

markov-process, matlab, r

I have a full set of sequences (432 observations, to be precise) over 4 states, $A$ to $D$, e.g.:

$$Y=\left(\begin{array}{c c c c c c c}
A& C& D&D & B & A &C\\
B& A& A&C & A&- &-\\
\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\
B& C& A&D & A & B & A\\
\end{array}\right)$$

EDIT: The observation sequences are of unequal lengths! Does this change anything?

Is there a way of calculating the transition matrix $$P_{ij} = P(Y_{t}=j \mid Y_{t-1}=i)$$ in MATLAB or R or similar? I think the HMM package might help. Any thoughts?

E.g.: Estimating Markov chain probabilities

Best Answer

Please check the comments above. Here is a quick implementation in R:

x <- c(1,2,1,1,3,4,4,1,2,4,1,4,3,4,4,4,3,1,3,2,3,3,3,4,2,2,3)
# count the transitions x[t] -> x[t + 1]
p <- matrix(0, nrow = 4, ncol = 4)
for (t in 1:(length(x) - 1)) p[x[t], x[t + 1]] <- p[x[t], x[t + 1]] + 1
# divide each row by its total so that it sums to 1
for (i in 1:4) p[i, ] <- p[i, ] / sum(p[i, ])

Results:

> p
          [,1]      [,2]      [,3]      [,4]
[1,] 0.1666667 0.3333333 0.3333333 0.1666667
[2,] 0.2000000 0.2000000 0.4000000 0.2000000
[3,] 0.1428571 0.1428571 0.2857143 0.4285714
[4,] 0.2500000 0.1250000 0.2500000 0.3750000
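
For reference, the two loops above compute the usual maximum-likelihood estimate for a first-order, time-homogeneous chain. Writing $n_{ij}$ for the number of observed transitions from state $i$ to state $j$ (this symbol is introduced here and is not part of the original answer):

$$\hat{P}_{ij} = \frac{n_{ij}}{\sum_{k} n_{ik}}$$

For example, state 1 is left 6 times in `x`, twice to state 2, so $\hat{P}_{12} = 2/6 \approx 0.333$, which matches the first row of the output.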

A (probably dumb) implementation in MATLAB (which I have never used, so I don't know if this is going to work; I just googled "declare vector matrix MATLAB" to get the syntax):

x = [ 1, 2, 1, 1, 3, 4, 4, 1, 2, 4, 1, 4, 3, 4, 4, 4, 3, 1, 3, 2, 3, 3, 3, 4, 2, 2, 3 ];
n = length(x) - 1;
p = zeros(4,4);
% count the transitions x(t) -> x(t + 1)
for t = 1:n
  p(x(t), x(t + 1)) = p(x(t), x(t + 1)) + 1;
end
% divide each row by its total so that it sums to 1
for i = 1:4
  p(i, :) = p(i, :) / sum(p(i, :));
end
disp(p)
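
Neither snippet above handles several sequences of unequal length, which is what the EDIT in the question asks about. A minimal sketch in R follows, under some assumptions that are not in the original: the sequences are stored as a list of character vectors, shorter sequences are padded with `NA` only at the end, and the names `seqs`, `states` and `estimate_transitions` are hypothetical. The idea is to count transitions within each sequence only, so that the last state of one sequence is never paired with the first state of the next, and then to normalise the pooled counts.

```r
# Hypothetical example data: one character vector per observed sequence.
seqs <- list(
  c("A", "C", "D", "D", "B", "A", "C"),
  c("B", "A", "A", "C", "A", NA, NA),   # shorter sequence, NA-padded at the end
  c("B", "C", "A", "D", "A", "B", "A")
)
states <- c("A", "B", "C", "D")

estimate_transitions <- function(seqs, states) {
  k <- length(states)
  counts <- matrix(0, nrow = k, ncol = k,
                   dimnames = list(from = states, to = states))
  for (s in seqs) {
    s <- s[!is.na(s)]          # drop trailing padding (assumed to be NA)
    if (length(s) < 2) next    # a sequence of length 0 or 1 has no transitions
    i <- match(s[-length(s)], states)  # state at time t - 1
    j <- match(s[-1], states)          # state at time t
    for (t in seq_along(i)) counts[i[t], j[t]] <- counts[i[t], j[t]] + 1
  }
  rs <- rowSums(counts)
  P <- counts / rs             # row i is divided by rs[i]
  P[rs == 0, ] <- NA           # states that are never left have no estimate
  list(counts = counts, P = P)
}

res <- estimate_transitions(seqs, states)
print(res$P)
```

Pooling the counts like this treats all sequences as independent realisations of the same chain. If the padding can also occur in the middle of a sequence, dropping it as done here would create spurious transitions, so that case would need separate handling.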