Solved – How to infer the number of states in a Hidden Markov Model with Gaussian mixture emissions

hidden-markov-model, likelihood, python, time-series

I have a time series made up of an unknown number of hidden states. Each state contains a set of values unique to that state. I am trying to use a GMM HMM (as implemented in Python's hmmlearn package) to identify these hidden states (so I'm effectively clustering a time series). This seems to work reasonably well when I know the number of hidden states (K) to look for, but the whole point is that I don't know the number of states and need to find this out before applying the model.

I've been trying out a range of values for K and getting the log-likelihood for each. Then, I've been using np.polyfit to fit a curve to this data and find a maximum.
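For context, a minimal sketch of that sweep-and-fit approach might look like the following (the GMMHMM settings and the data array X are placeholders for my actual setup):

from hmmlearn.hmm import GMMHMM
import numpy as np

# Assumption: X is an (n_samples, n_features) array of observations.
candidate_ks = range(2, 11)
log_likelihoods = []
for k in candidate_ks:
    model = GMMHMM(n_components=k, n_mix=2, covariance_type='diag')
    model.fit(X)
    log_likelihoods.append(model.score(X))  # log-likelihood under this K

# Fit a quadratic to the (K, log-likelihood) points and take its vertex.
a, b, c = np.polyfit(list(candidate_ks), log_likelihoods, deg=2)
best_k = -b / (2 * a)  # vertex of a*K^2 + b*K + c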

This all feels a bit clumsy, though, and I'm wondering if there is a better way to find the best value of K (especially since I have to guess a sensible range of K to test, which could mean building a lot of models). I'm very new to HMMs in general, so if anyone can point me in the direction of some good introductory guides that could help me understand this problem better, that would also be much appreciated.

Best Answer

Have you tried using AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion)?
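For reference: AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, where L is the maximized likelihood of the fitted model, k is its number of free parameters, and n is the number of observations. Lower scores are better, and BIC penalizes extra parameters more heavily than AIC once log(n) > 2 (i.e. n ≥ 8).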

It's possible to implement AIC or BIC to work with hmmlearn. Here is my implementation for GaussianHMM with covariance_type='diag'. If the covariance_type changes, the parameter count for covars_ will have to be adjusted. You can extend it to GMMHMM if you know the number of components of the GMM (see the sketch after the code).

from hmmlearn import hmm
import numpy as np

def bic_general(likelihood_fn, k, X):
    """Calculate the BIC score of a fitted model.

    likelihood_fn: function. Takes X and returns the log-likelihood
                   of the data under the fitted model (e.g. model.score).
    k: int. Number of free parameters in the model (not the number of
       states; see the count computed in bic_hmmlearn below).
    X: array. The data the model was fitted on.
    """
    bic = np.log(len(X))*k - 2*likelihood_fn(X)
    return bic
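
# An AIC counterpart, since the criterion above could equally be AIC:
# the penalty term 2*k replaces log(n)*k.
def aic_general(likelihood_fn, k, X):
    return 2*k - 2*likelihood_fn(X)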

def bic_hmmlearn(X):
    lowest_bic = np.inf
    bic = []
    best_hmm = None
    n_states_range = range(1, 7)
    for n_components in n_states_range:
        hmm_curr = hmm.GaussianHMM(n_components=n_components, covariance_type='diag')
        hmm_curr.fit(X)

        # Calculate number of free parameters
        # free_parameters = for_means + for_covars + for_transmat + for_startprob
        # for_means & for_covars = n_features*n_components
        n_features = hmm_curr.n_features
        free_parameters = 2*(n_components*n_features) + n_components*(n_components-1) + (n_components-1)

        bic_curr = bic_general(hmm_curr.score, free_parameters, X)
        bic.append(bic_curr)
        if bic_curr < lowest_bic:
            lowest_bic = bic_curr
            best_hmm = hmm_curr

    return (best_hmm, bic)

best_hmm, bic = bic_hmmlearn(X)
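To extend the parameter count to GMMHMM with diagonal covariances, each state carries n_mix means and n_mix diagonal covariance vectors (each of length n_features), plus n_mix - 1 free mixture weights. A rough sketch of the counting, assuming no parameter tying:

def gmmhmm_free_parameters(n_components, n_mix, n_features):
    # Per state: n_mix means + n_mix diag covariances, each of length
    # n_features, plus (n_mix - 1) free mixture weights.
    per_state = 2*n_mix*n_features + (n_mix - 1)
    # Each transition-matrix row and the start distribution sum to 1,
    # so they each lose one degree of freedom.
    return (n_components*per_state
            + n_components*(n_components - 1)
            + (n_components - 1))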

Similar question: Number of parameters in Markov model
