Solved – Calculate the probability for a sequence generated by a graph

markov-processprobability

I collected some sequences of events, e.g.

a,b,c
a,a,a,c,a,b,b
c,b,b,c,a
b,c,a
a,b,c,a

Each event has a certain probability to create the next event, but later events do not depend on other events than the one before, e.g. the graph that can be constructed from the data has the markov property.

From this data a transition matrix can be calculated:

$$
P = \left( \begin{array}{ccc}
0.33 & 0.5 & 0.16\\
0 & 0.33 & 0.66 \\
0.8 & 0.2 & 0 \end{array} \right)
$$

(if i did not miscalculated anything…)

When I now get a new sequence, is it possible to calculate the probability that this sequence is similar to previous sequences?

Can I just multiply the transition probabilities for a new sequence? For example a,b,b would give me $0.5 \cdot 0.33$?

Best Answer

It seems like you're asking two questions here --

Question 2 (The simpler one): Assuming a Markov chain, is the probability of a sequence x,y,z equal to the product of the transition probabilities i.e. p(y|x)*p(z|y)? The answer is yes.

Question 1: How can you tell if the sequences you generate are similar to the sequences you started with? Or in other words, is a Markov chain really a good model for the data you have? This is not quite as simple.

The way I would do it is to generate a large number of sequences from your transition matrix, find the probability of each one, and then plot a histogram of those probabilities. Then take the sequences from your original data, find the probabilities of THOSE (in your transition matrix) and see where they fall, relative to your histogram. (Make sure you're only comparing sequences of the same length, in a given test, because different-length sequences will very different probabilities.) Doing this, you can see if the simulated sequences look "similar" (in one way) to the original sequences, and actually set up the comparison as a test. If your data falls outside of 95% of the histogram, you can reject the hypothesis that a Markov model is a good model for the original sequences.

At least I think you can -- this seems intuitively right to me, although I also have the vague suspicion I'm abusing the concepts.

Related Question