Solved – Optimal number of HMM states using AIC

aic, hidden markov model, likelihood

So, I have seen many questions here asking whether it is a good idea to use AIC/BIC for determining the optimal number of hidden states for an HMM. What about the number of observable symbols, though?

In my case, I am using a discrete HMM: I quantised a continuous time-series observation signal to obtain a sequence of discrete emissions. I train a number of HMMs and then use the AIC to find the "best" one. Each HMM has a different number of hidden states (3 to 9) and a different number of symbols that an observation can be assigned to after quantisation (4 to 128).
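For a fixed quantisation, this comparison is easy to script. Below is a minimal sketch using hmmlearn's `CategoricalHMM` (an assumption: hmmlearn >= 0.3 uses this name; older releases call the same model `MultinomialHMM`), with a synthetic stand-in for the quantised signal and the standard free-parameter count for a discrete HMM. It illustrates the procedure, not the actual pipeline.

```python
import numpy as np
from hmmlearn import hmm  # assumes hmmlearn >= 0.3 (provides CategoricalHMM)

rng = np.random.default_rng(0)

# Hypothetical stand-in for the quantised signal: 1000 observations
# drawn from a 16-symbol alphabet. Replace with your own emissions.
n_symbols = 16
X = rng.integers(n_symbols, size=(1000, 1))

def aic(model, X, n_symbols):
    # Free parameters of a discrete HMM with n hidden states:
    # transitions n*(n-1), emissions n*(n_symbols-1), initial probs n-1.
    n = model.n_components
    k = n * (n - 1) + n * (n_symbols - 1) + (n - 1)
    return 2 * k - 2 * model.score(X)  # AIC = 2k - 2 log L

for n_states in range(3, 10):
    m = hmm.CategoricalHMM(n_components=n_states, n_iter=100, random_state=0)
    m.fit(X)
    print(f"{n_states} states: AIC = {aic(m, X, n_symbols):.1f}")
```

Note that this loop only makes sense within one quantisation; AIC values computed on different alphabet sizes live on different likelihood scales, which is the crux of the problem below.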

When I use AIC, it barely makes a difference: for a large number of states, the log-likelihood is so low (around -2000) that the penalty AIC adds for the free parameters is negligible by comparison. Also, the log-likelihood is always better (around -300) for a low number of states.
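To put rough numbers on that imbalance (purely illustrative: this assumes a fully parameterised discrete HMM and, hypothetically, attaches the quoted log-likelihoods to the two extreme models), the free-parameter count is $k = n(n-1) + n(m-1) + (n-1)$ for $n$ hidden states and $m$ symbols, and $\mathrm{AIC} = 2k - 2\log L$:

$$k_{\text{large}} = 9\cdot 8 + 9\cdot 127 + 8 = 1223, \qquad \mathrm{AIC} = 2\cdot 1223 - 2\cdot(-2000) = 6446,$$

$$k_{\text{small}} = 3\cdot 2 + 3\cdot 3 + 2 = 17, \qquad \mathrm{AIC} = 2\cdot 17 - 2\cdot(-300) = 634.$$

The gap in the likelihood term ($2\cdot 1700 = 3400$) exceeds the entire gap in the penalty term ($2\cdot 1206 = 2412$), so the likelihood alone decides the ranking.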

Given this, does it make sense to use AIC at all, or should I just compare the different models (with different numbers of free parameters) using the log-likelihood alone?

Best Answer

AIC is based on the likelihood, and likelihoods can only be compared when they are computed for the same set of observations. If you take underlying data $x$ and categorize it in two different ways to form $y=f(x)$ and $z=g(x)$ (both derived from the same $x$, but measured on two different, relatively coarse scales because of the two categorizations), then you cannot directly compare the likelihoods, nor the AICs, of models fitted to $y$ and $z$.
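To see this concretely, here is a small sketch (a hypothetical Gaussian signal and equal-width binning, both my assumptions): it scores the same continuous series under its own empirical symbol frequencies after a 4-level and a 128-level quantisation. The finer grid loses roughly $\ln(128/4) \approx 3.5$ nats per observation before any HMM is even fitted, which is exactly the scale gap described in the question.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)  # hypothetical continuous signal (assumption)

for n_bins in (4, 128):
    # Equal-width quantisation of the same underlying series.
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    y = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    # Log-likelihood under the empirical i.i.d. symbol frequencies;
    # no HMM involved, just the scale of the discretised data.
    counts = np.bincount(y, minlength=n_bins)
    p = counts / counts.sum()
    ll = float(np.sum(counts[counts > 0] * np.log(p[counts > 0])))
    print(f"{n_bins:4d} bins: log-likelihood = {ll:.0f}")
```

AIC comparisons are therefore meaningful only among models that share a single quantisation; across quantisations, the difference mostly measures bin width, not model quality.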