Criteria for selecting the “best” model in a Hidden Markov Model

aic bic hidden-markov-model r

I have a time series data set to which I am trying to fit a Hidden Markov Model (HMM) in order to estimate the number of latent states in the data. My pseudocode for doing this is the following:

for( i in 2 : max_number_of_states ){
    ...
    calculate HMM with i states
    store the BIC of the i-state model
    ...
}
optimal_number_of_states = "number of states of the model with the smallest BIC"
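The loop above can be sketched as runnable Python under a few assumptions. The fitting step is deliberately left abstract: `fit_hmm` is a hypothetical callable (not from any particular library) that fits an HMM with the given number of states and returns its maximised log-likelihood together with its number of free parameters. The sketch uses the standard definition BIC = k ln(n) - 2 ln(L), with n the number of observations, and picks the minimum only after every candidate has been scored.

```python
import math
from typing import Callable, Dict, Tuple


def select_num_states_by_bic(
    fit_hmm: Callable[[int], Tuple[float, int]],
    max_states: int,
    n_obs: int,
) -> Tuple[int, Dict[int, float]]:
    """Score HMMs with 2..max_states states by BIC and return the best state count.

    fit_hmm is a hypothetical callable: given a number of states, it fits the
    model and returns (maximised log-likelihood, number of free parameters).
    """
    bic: Dict[int, float] = {}
    for n_states in range(2, max_states + 1):
        log_lik, k = fit_hmm(n_states)
        # BIC = k * ln(n) - 2 * ln(L); smaller is better.
        bic[n_states] = k * math.log(n_obs) - 2.0 * log_lik
    # Choose the minimum only after all candidates have been scored.
    best = min(bic, key=bic.get)
    return best, bic
```

Returning the whole dictionary of scores, not just the winner, makes it easy to see how close the runner-up models are.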

Now, in the usual regression models the BIC tends to favor the most parsimonious models, but in the case of the HMM I am not sure that is what it is doing. Does anyone know what kind of HMMs the BIC criterion tends toward? I can also obtain the AIC and the likelihood value. Since I am trying to infer the true total number of states, is one of these criteria "better" than the others for this purpose?

Best Answer

I'm assuming here that your output variable is categorical, though that may not be the case. In most applications of HMMs I've seen, the number of states is known in advance rather than selected through tuning; the states usually correspond to some well-understood variable that happens not to be observed. But that doesn't mean you can't experiment with it.

The danger in using BIC (and AIC), though, is that k, the number of free parameters in the model, increases quadratically with the number of states: for P states, the transition probability matrix alone has P(P-1) free parameters, and on top of that come the output probabilities for each category of the output given each state. So if the AIC and BIC are being calculated properly, k (and with it the penalty term) should be going up fast.
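To see how quickly k grows, here is a minimal Python sketch that counts the free parameters of an HMM with categorical outputs. It assumes C output categories, so each state's output distribution has C - 1 free probabilities. It also optionally counts the P - 1 free initial-state probabilities; these are not mentioned above and are an assumption here, since some implementations estimate them and others fix them. `hmm_num_params` is a hypothetical helper name.

```python
def hmm_num_params(n_states: int, n_categories: int, estimate_initial: bool = True) -> int:
    """Count the free parameters of an HMM with a categorical output."""
    # Each row of the P x P transition matrix sums to 1: P * (P - 1) free values.
    transition = n_states * (n_states - 1)
    # Each state's output distribution over C categories sums to 1: P * (C - 1).
    emission = n_states * (n_categories - 1)
    # Assumption: the initial state distribution is also estimated (P - 1 values).
    initial = (n_states - 1) if estimate_initial else 0
    return transition + emission + initial


if __name__ == "__main__":
    # Illustrative: how k grows with the number of states for a 4-category output.
    for p in range(2, 7):
        print(p, hmm_num_params(p, 4))
```

For a 4-category output this gives k = 9, 17, 27, 39, 53 for P = 2 through 6; the quadratic transition term overtakes the linear emission term as P grows.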

If you have enough data, I would recommend a softer method of tuning the number of states, such as testing on a holdout sample. You might also just look at the likelihood and see visually at what point it plateaus. Also, if your data set is large, keep in mind that BIC's penalty of k ln(n), compared with AIC's 2k, will push the BIC toward a smaller model.
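For the holdout approach, one way to score a candidate model is the log-likelihood it assigns to held-out data. The following minimal Python sketch computes that with the scaled forward algorithm, assuming categorical observations coded 0..C-1 and parameters (initial probabilities, transition matrix, emission matrix) already fitted on the training portion by whatever HMM routine you use. The function names are hypothetical. No parameter-count penalty is applied, because extra states that only fit noise in the training data tend not to improve the fit on unseen data. For a time series, a contiguous block at the end of the series is a natural holdout.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (initial probabilities, transition matrix, emission matrix)
HMMParams = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[Sequence[float]]]


def forward_log_likelihood(obs: Sequence[int], start: Sequence[float],
                           trans: Sequence[Sequence[float]],
                           emit: Sequence[Sequence[float]]) -> float:
    """Log P(obs | model) via the scaled forward algorithm.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][c]  : probability of emitting category c while in state i
    """
    n_states = len(start)
    log_lik = 0.0
    alpha: List[float] = []
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [start[j] * emit[j][o] for j in range(n_states)]
        else:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][o]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model gives this sequence zero probability
        log_lik += math.log(scale)  # log P(obs) is the sum of the log scale factors
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_lik


def best_by_holdout(models: Dict[int, HMMParams], holdout: Sequence[int]) -> int:
    """Return the number of states whose fitted model scores highest on the holdout."""
    return max(models, key=lambda p: forward_log_likelihood(holdout, *models[p]))
```

Here `models` maps each candidate number of states to its fitted parameters; checking that the winner is stable across a couple of different splits guards against a lucky holdout.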