Solved – Hidden Markov Models with multiple emissions per state

expectation-maximization, hidden-markov-model, unsupervised-learning

I want to use Hidden Markov Models for an unsupervised sequence tagging problem. Due to the peculiarities of my application domain (recognition of dialogue acts in conversations), I would like to use multiple emissions for each state (that is, multiple features). Graphically, the model would therefore look like this:
[Figure: HMM with multiple observations per hidden state]

Both the hidden states and the observation variables are discrete. The emission probabilities $P(O_{ij} \ | \ S_i)$ are assumed to be independent and modelled via standard categorical distributions.
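Concretely, writing $K$ for the number of features per time step ($K$ is just notation I am introducing here), the independence assumption means the joint emission probability factorizes as

$$P(O_{i1}, \dots, O_{iK} \mid S_i) \;=\; \prod_{k=1}^{K} P(O_{ik} \mid S_i),$$

where each factor is a separate categorical distribution over the values of feature $k$.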

My question is the following: are there any publicly available toolkits or algorithms that would allow me to learn the parameters of such a multiple-emission HMM through a variant of Baum-Welch? From what I could gather, the only type of multiple emissions supported by classical HMM toolkits seems to be multivariate Gaussians, and I could not find anything about independent categorical distributions of the type above.

Of course, I am aware I could "bypass" the problem by treating each observation as a single symbol in the Cartesian product of the feature values (with each dimension of the vector corresponding to a particular feature) and estimating emission probabilities over this product space through classical Baum-Welch, but that would introduce a lot of unnecessary data sparsity, since the number of distinct observation symbols grows exponentially with the number of features.

Does anybody have a suggestion for solving this issue? I'm surely not the first person who has tried to apply HMMs to unsupervised learning with multiple features! (Or maybe I should use another type of model? I considered CRFs as well, but they seem trickier to apply to unsupervised learning problems.)

Best Answer

One simple approach to dealing with the sparsity of the observation distribution is to model it as a naive Bayes model, i.e., as a product of per-feature categorical distributions conditioned on the hidden state (exactly the factorization you wrote down). You can still use Baum-Welch with a small modification: the per-timestep emission likelihood becomes a product over features, and the M-step re-estimates one categorical table per feature.
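For concreteness, here is a minimal NumPy sketch of that modification. The function names, array layouts, and the Rabiner-style scaling are my own choices for illustration, not the API of any particular toolkit; everything besides `emission_probs` and the per-feature re-estimation loop is standard forward-backward machinery.

```python
import numpy as np

def emission_probs(B, X):
    """b[t, j] = prod_k P(X[t, k] | state j), one categorical table per feature."""
    T, N = X.shape[0], B[0].shape[0]
    b = np.ones((T, N))
    for k, Bk in enumerate(B):          # Bk has shape (n_states, n_values[k])
        b *= Bk[:, X[:, k]].T           # (T, N)
    return b

def forward_backward(pi, A, b):
    """Scaled forward-backward; returns gamma, xi, and the log-likelihood."""
    T, N = b.shape
    alpha, beta, c = np.zeros((T, N)), np.zeros((T, N)), np.zeros(T)
    alpha[0] = pi * b[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * b[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (b[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta
    xi = (alpha[:-1, :, None] * A[None]
          * (b[1:] * beta[1:])[:, None, :] / c[1:, None, None])
    return gamma, xi, np.log(c).sum()

def baum_welch(X, n_states, n_values, n_iter=50, seed=0):
    """Baum-Welch for an HMM whose emission is a product of independent categoricals.

    X: (T, K) integer-coded observations; n_values[k] = vocabulary size of feature k.
    """
    rng = np.random.default_rng(seed)
    pi = np.full(n_states, 1.0 / n_states)
    A = rng.dirichlet(np.ones(n_states), size=n_states)
    B = [rng.dirichlet(np.ones(V), size=n_states) for V in n_values]
    for _ in range(n_iter):
        b = emission_probs(B, X)
        gamma, xi, ll = forward_backward(pi, A, b)   # E-step
        pi = gamma[0]                                # M-step
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k, V in enumerate(n_values):             # one categorical per feature
            counts = np.zeros((n_states, V))
            for v in range(V):
                counts[:, v] = gamma[X[:, k] == v].sum(axis=0)
            B[k] = counts / gamma.sum(axis=0)[:, None]
    return pi, A, B, ll   # ll is the log-likelihood from the final E-step
```

The only place the factored emission model enters is `emission_probs` (product over the $K$ features) and the per-feature M-step loop; the forward-backward recursions and the transition/initial-state updates are unchanged from the standard discrete Baum-Welch.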
