I have a time series made up of an unknown number of hidden states. Each state contains a set of values unique to that state. I am trying to use a GMM HMM (as implemented in Python's hmmlearn
package) to identify these hidden states (so I'm effectively clustering a time series). This seems to work reasonably well when I know the number of hidden states (K) to look for, but the whole point is that I don't know the number of states and need to find this out before applying the model.
I've been trying out a range of values for K and getting the log-likelihood for each. Then, I've been using np.polyfit
to fit a curve to these values and take its maximum as the best K.
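The sweep-and-curve-fit approach can be sketched like this (the log-likelihood values below are purely hypothetical placeholders, not output from a fitted model; in practice each one would come from fitting a model at that K and calling `model.score(X)`):

```python
import numpy as np

# Hypothetical placeholder log-likelihoods -- in practice each value would
# come from fitting an hmmlearn model at that K and calling model.score(X).
Ks = np.array([2, 3, 4, 5, 6, 7, 8])
logLs = np.array([-520.0, -480.0, -455.0, -450.0, -452.0, -460.0, -475.0])

# Fit a quadratic to the (K, log-likelihood) points and take its vertex
# as the estimated best K.
a, b, c = np.polyfit(Ks, logLs, deg=2)
best_k = round(-b / (2 * a))
```

Note that the raw log-likelihood typically keeps increasing with K (more states can only fit the training data better), which is part of why this curve-fitting heuristic feels unsatisfying.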
This all feels a bit clumsy, though, and I'm wondering if there is a better way to find the best value of K, especially as I have to guess a sensible range of K values to test, which could mean building a lot of models. I'm very new to HMMs in general, so if anyone can point me towards some good introductory guides that would help me understand this problem better, that would also be much appreciated.
Best Answer
Have you tried using AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion)?
It's possible to implement AIC or BIC to work with hmmlearn. Here is my implementation for GaussianHMM with covariance_type='diag'. If the covariance_type changes, then the number of parameters contributed by covars_ will have to be adjusted. You can extend it to GMMHMM if you know the number of components of the GMM.
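A minimal sketch of this idea, assuming the standard free-parameter count for a diagonal-covariance GaussianHMM with k states and d features (k−1 start probabilities, k(k−1) transition probabilities, k·d means, and k·d diagonal variances):

```python
import numpy as np

def n_params_gaussian_diag(n_components, n_features):
    """Free parameters of a GaussianHMM with covariance_type='diag'."""
    k, d = n_components, n_features
    startprob = k - 1       # a simplex of k entries has k-1 degrees of freedom
    transmat = k * (k - 1)  # each of the k rows is a simplex of k entries
    means = k * d           # one mean vector per state
    covars = k * d          # one variance per state per feature (diag)
    return startprob + transmat + means + covars

def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_samples):
    return n_params * np.log(n_samples) - 2 * log_likelihood
```

To use it, fit a model for each candidate K, get the log-likelihood with `model.score(X)` (hmmlearn returns the total log-likelihood of `X`), and pick the K that minimizes BIC (or AIC). For a GMMHMM, the count would also need to include the mixture weights and the per-mixture means and variances.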
Similar question: Number of parameters in Markov model