Solved – What are the differences between hidden Markov models and neural networks

algorithmsdata miningmarkov-processneural networks

I'm just getting my feet wet in statistics so I'm sorry if this question does not make sense. I have used Markov models to predict hidden states (unfair casinos, dice rolls, etc.) and neural networks to study users clicks on a search engine. Both had hidden states that we were trying to figure out using observations.

To my understanding they both predict hidden states, so I'm wondering when would one use Markov models over neural networks? Are they just different approaches to similar problems?

(I'm interested in learning but I also have another motivation, I have a problem that I'm trying to solve using hidden Markov models but its driving me bonkers so I was interested in seeing if I can switch to using something else.)

Best Answer

What is hidden and what is observed

The thing that is hidden in a hidden Markov model is the same as the thing that is hidden in a discrete mixture model, so for clarity, forget about the hidden state's dynamics and stick with a finite mixture model as an example. The 'state' in this model is the identity of the component that caused each observation. In this class of model such causes are never observed, so 'hidden cause' is translated statistically into the claim that the observed data have marginal dependencies which are removed when the source component is known. And the source components are estimated to be whatever makes this statistical relationship true.

The thing that is hidden in a feedforward multilayer neural network with sigmoid middle units is the states of those units, not the outputs which are the target of inference. When the output of the network is a classification, i.e., a probability distribution over possible output categories, these hidden units values define a space within which categories are separable. The trick in learning such a model is to make a hidden space (by adjusting the mapping out of the input units) within which the problem is linear. Consequently, non-linear decision boundaries are possible from the system as a whole.

Generative versus discriminative

The mixture model (and HMM) is a model of the data generating process, sometimes called a likelihood or 'forward model'. When coupled with some assumptions about the prior probabilities of each state you can infer a distribution over possible values of the hidden state using Bayes theorem (a generative approach). Note that, while called a 'prior', both the prior and the parameters in the likelihood are usually learned from data.

In contrast to the mixture model (and HMM) the neural network learns a posterior distribution over the output categories directly (a discriminative approach). This is possible because the output values were observed during estimation. And since they were observed, it is not necessary to construct a posterior distribution from a prior and a specific model for the likelihood such as a mixture. The posterior is learnt directly from data, which is more efficient and less model dependent.

Mix and match

To make things more confusing, these approaches can be mixed together, e.g. when mixture model (or HMM) state is sometimes actually observed. When that is true, and in some other circumstances not relevant here, it is possible to train discriminatively in an otherwise generative model. Similarly it is possible to replace the mixture model mapping of an HMM with a more flexible forward model, e.g., a neural network.

The questions

So it's not quite true that both models predict hidden state. HMMs can be used to predict hidden state, albeit only of the kind that the forward model is expecting. Neural networks can be used to predict a not yet observed state, e.g. future states for which predictors are available. This sort of state is not hidden in principle, it just hasn't been observed yet.

When would you use one rather than the other? Well, neural networks make rather awkward time series models in my experience. They also assume you have observed output. HMMs don't but you don't really have any control of what the hidden state actually is. Nevertheless they are proper time series models.