Solved – Baum-Welch training of an HMM

baum-welch, hidden markov model, machine learning

I have 200k sequences, and each element of a sequence is a vector of length 200. I plan to learn an HMM from this data, using the Baum-Welch EM algorithm to infer the transition and emission probabilities. I wanted to know whether I can do the fitting in batches of sequences (i.e. learn an HMM from, say, the first 1000 sequences, then continue training that HMM on the next 1000 sequences, and so on). Is this right? Why should I/should I not do this? How does this compare with fitting an HMM to all 200k sequences at once?

Best Answer

I wouldn't recommend the batch method you suggested, since the final trained HMM will mainly reflect the last 1000 sequences. The influence of the remaining sequences will be limited: they contribute only to the starting parameters of the model prior to the final training step, and each successive batch can overwrite what the previous batches learned. That said, I realize Baum-Welch training on a sequence set this large can be rather slow. You may wish to consider learning an initial model with the faster Viterbi training method (a.k.a. the segmental K-means algorithm; see Juang & Rabiner (1990), IEEE Transactions on Acoustics, Speech, and Signal Processing 38, 1639-1641), and then refining the parameters further with Baum-Welch. Alternatively, you could use Viterbi training on its own; I have found this method often yields models of comparable quality to those trained with the Baum-Welch algorithm. The key difference is that Viterbi training replaces the expensive forward-backward expectation step with a single most-likely state path per sequence, re-estimating parameters from hard counts along that path.
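To make the Viterbi-training idea concrete, here is a minimal sketch in numpy. It uses discrete emission symbols for brevity; for the asker's 200-dimensional real-valued observation vectors you would swap the emission model for, e.g., Gaussians, but the decode-then-count loop is the same. All function names here are illustrative, not from any particular library.

```python
import numpy as np

def viterbi_path(obs, log_pi, log_A, log_B):
    """Most likely state path for one sequence (log domain)."""
    T, N = len(obs), len(log_pi)
    delta = np.zeros((T, N))            # best log-score ending in each state
    psi = np.zeros((T, N), dtype=int)   # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # (from_state, to_state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):      # backtrace
        path[t] = psi[t + 1, path[t + 1]]
    return path

def viterbi_train(seqs, n_states, n_symbols, n_iter=20, seed=0):
    """Segmental K-means: decode each sequence with Viterbi, then
    re-estimate parameters from hard state-assignment counts."""
    rng = np.random.default_rng(seed)
    A = rng.dirichlet(np.ones(n_states), size=n_states)   # transitions
    B = rng.dirichlet(np.ones(n_symbols), size=n_states)  # emissions
    pi = np.full(n_states, 1.0 / n_states)                # initial dist.
    for _ in range(n_iter):
        # Small pseudocounts keep unused rows from collapsing to zero.
        A_c = np.full((n_states, n_states), 1e-3)
        B_c = np.full((n_states, n_symbols), 1e-3)
        pi_c = np.full(n_states, 1e-3)
        for obs in seqs:
            path = viterbi_path(obs, np.log(pi), np.log(A), np.log(B))
            pi_c[path[0]] += 1
            for t in range(len(obs)):
                B_c[path[t], obs[t]] += 1
                if t > 0:
                    A_c[path[t - 1], path[t]] += 1
        A = A_c / A_c.sum(axis=1, keepdims=True)
        B = B_c / B_c.sum(axis=1, keepdims=True)
        pi = pi_c / pi_c.sum()
    return pi, A, B
```

Note the counting step sums over all sequences before re-normalizing, which is why full Baum-Welch (or Viterbi training) over the whole set behaves differently from the sequential 1000-at-a-time scheme: every sequence contributes to every parameter update.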
