Solved – HMM with unknown number of hidden states

hidden markov modelhyperparameterrecurrent neural network

The Baum-Welch algorithm can estimate the parameters for a given structure of a HMM. However, how to determine the structure, specifically the number of hidden states?

Ideas so far:

  • Trial and error, examination of the quality of fit, then increasing or decreasing the number, adopting the CV.
  • Use RNN, export the values in the memory. Cluster them, e.g. via DBSCAN

Thank you for any help/reference.

Best Answer

For the number of hidden states, many estimation methods rely on trial/error, computing a score (like the BIC, AIC, or MML) and decide the final topology from these scores.

The drawback of these scores in my experience is that they all involve the computation of a likelihood with requires a trained model and thus which makes the topology design quite expensive.

In my experience, I have often used a simple k-means algorithm (or any other clustering algorithm) to cluster the training data between say k=2 and 20 clusters. If the training data is too large, a subset can be used. Then I computed the percentage of variance explained for each k and look at the curve.

The percentage of variance explained $p$ is computed as:

$p = 100 \times \cfrac{d_{total} - d_{within}}{d_{total}} \; ,$

where $d_{tota}$ is the sum of the Euclidean distances of all feature vectors (data) to all the clusters centroids and $d_{within}$ the distance of all feature vectors to their closest centroid.

The beginning of the curve usually has a steep slope before reaching a plateau. On the plateau, adding clusters is a computationally expensive operation for the gain in discriminability. You can set a threshold (95% of variance explained for instance) and select the number of states equal to the number of clusters you found.

This approach is used in this publication: Proportional data modeling with hidden Markov models based on generalized Dirichlet and Beta-Liouville mixtures applied to anomaly detection in public areas, by Epaillard and Bouguila.

Other than this, some models do not make any assumption on the number of states for instance: the infinite HMMs. Here is one reference: The infinite Markov model, by Beal et al. The authors of this paper have more publications on the topic.

Another approach (maybe more linked to a specific application though) is presented in this publication: Estimation of the number of states for gesture recognition with Hidden Markov Models based on the number of critical points in time sequence, by Cholewa and GÅ‚omb.