Solved – Markov chains that do not contain all the states in the model

markov chainmarkov-processmixture-distributionstochastic-processes

I am trying to understand Mixture Markov Models in order to cluster a set of sequences that do not necessarily all have the same states occurring in them.

If I have several sequences that I am trying to fit a Mixture Markov Model to, how do I go about the fact that some of the chains contain a part of all possible states while the others might contain other states? For example:

sequence 1: 1 2 4 2 4 1

sequence 2: 3 2 3 3 3 3

sequence 3: 1 2 3 1 1 2

How would I define a transition probability matrix for each of the sequences, given that the set of possible states is {1,2,3,4}? In the context of Mixture Markov Modeling, shouldn't a transition probability matrix be 4×4 and contain transfer probabilities for each state from the collection -> each state of the collection?

Take for instance sequence1:

My understanding is that the TPM of sequence1 should be:

   1   2   3   4

1  0.5 0.5 0   0

2  1   0   0   0

3  0   0   0   0

4  0.5 0.5 0   0

This seems to be incorrect, since TPM is a stochastic matrix whose rows sum up to 1, and the third row is just all 0s. Should I completely remove state 3 if I am building a transition probability matrix of a specific sequence?

The reason I need to have a Markov chain for each sequence is that I am trying to calculate log-likelihood distance between each two sequences in order to build a distance matrix and get medoids for initializing the Mixture Markov Model.

The formula for distance that I am trying to use is:

$D(seq_i,seq_j)=1/2* log(p(seq_i|v_j)+p(seq_j|v_i))$, where

$v_i,v_j$ are the Markov Chains representing the sequences $seq_i$ and $seq_j$ correspondingly.

How would I define sequences 1,2,3 in terms of their transition probability matrices and single ML-estimated Markov chains representing each sequence?

Best Answer

Just to summarize, if you substitute your 0 values with some very small value such as 1e-17, your impossible states will still be the furthers apart. As you are trying to derive some sort of ordering of your data, it doesn't matter whether your smallest value is -Inf or -39, as either number will be the smallest value in the dataset.