Solved – Hidden Markov models and anomaly detection

hidden markov model, time series

In Shane's answer to this question he suggests that Hidden Markov Models can be used more successfully than wavelets for anomaly / change detection. (It was a bit unclear: the topic he was addressing is anomaly detection, although he uses the words "change detection".)

I am not very familiar with Hidden Markov Models, but as I understand it, they require a known Markov process (all states and transition probabilities known) and, for each state, a known set of emission probabilities. The really interesting thing one can do with them is, given a sequence of emissions, find the most likely sequence of hidden states that would have produced those emissions.
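To make that decoding step concrete, here is a minimal sketch (not from the original post) of the Viterbi algorithm for a toy two-state HMM with discrete emissions; the states, symbols, and probability values are made up purely for illustration.

```python
import numpy as np

# Toy two-state HMM with discrete emissions; all parameters are assumed known.
# States: 0 = "normal", 1 = "unusual regime"; emission symbols: 0, 1, 2.
start_prob = np.array([0.9, 0.1])                   # P(X_1)
trans_prob = np.array([[0.95, 0.05],                # P(X_t | X_{t-1})
                       [0.10, 0.90]])
emit_prob  = np.array([[0.70, 0.25, 0.05],          # P(Y_t | X_t)
                       [0.05, 0.25, 0.70]])

def viterbi(obs, start, trans, emit):
    """Return the most likely hidden-state sequence for the observed symbols."""
    T, n_states = len(obs), len(start)
    # Work in log space so long sequences do not underflow.
    log_delta = np.log(start) + np.log(emit[:, obs[0]])
    backptr = np.zeros((T, n_states), dtype=int)
    for t in range(1, T):
        scores = log_delta[:, None] + np.log(trans)  # scores[i, j]: best path ending i -> j
        backptr[t] = scores.argmax(axis=0)
        log_delta = scores.max(axis=0) + np.log(emit[:, obs[t]])
    # Backtrack from the best final state.
    path = [int(log_delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

print(viterbi([0, 0, 1, 2, 2, 2, 0], start_prob, trans_prob, emit_prob))
```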

Assuming that I am understanding HMMs correctly (please correct me if I am wrong), how are they used for anomaly detection? How would one determine the underlying Markov process and the emission probabilities needed to use an HMM for anomaly detection?

Best Answer

According to this Wikipedia article, there are many inference tasks an HMM supports. The first is the ability to assign a probability to any observation sequence $\mathbf{Y} = (Y_1,\ldots, Y_N)$ by marginalizing over the set of all possible hidden state sequences $\mathbf{X} = (X_1,\ldots, X_N)$:

$P(\mathbf{Y}) = \sum_{\mathbf{X}} P(\mathbf{X}) P( \mathbf{Y} \vert \mathbf{X} )$
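As a sketch of how this marginalization is carried out in practice without enumerating every hidden state sequence, the forward algorithm below returns $\log P(\mathbf{Y})$ for a discrete-emission HMM; the parameters reuse the hypothetical toy values from the sketch above and are illustrative only.

```python
import numpy as np

def log_likelihood(obs, start, trans, emit):
    """log P(Y_1..Y_N) via the forward algorithm (hidden states summed out)."""
    alpha = start * emit[:, obs[0]]        # alpha[i] = P(Y_1, X_1 = i)
    log_p = np.log(alpha.sum())
    alpha /= alpha.sum()                   # rescale to avoid numerical underflow
    for y in obs[1:]:
        alpha = (alpha @ trans) * emit[:, y]
        log_p += np.log(alpha.sum())       # each factor is P(Y_t | Y_1..Y_{t-1})
        alpha /= alpha.sum()
    return log_p

# Hypothetical parameters, as in the toy example above.
start_prob = np.array([0.9, 0.1])
trans_prob = np.array([[0.95, 0.05], [0.10, 0.90]])
emit_prob  = np.array([[0.70, 0.25, 0.05], [0.05, 0.25, 0.70]])
print(log_likelihood([0, 0, 1, 2, 2, 2, 0], start_prob, trans_prob, emit_prob))
```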

This way, you can assign probabilities to observation sequences even in an on-line manner as observations arrive (using the very efficient forward algorithm). An anomaly is an observation that is (relatively) highly unlikely according to $P(\mathbf{Y})$ (a threshold can be used to decide). Of course, the value of $P(\mathbf{Y})$ shrinks as $N$ increases. Many methods can be used to renormalize $P(\mathbf{Y})$ so that it stays within the representable range of floating-point data types and permits meaningful thresholding. For example, we might use the following as an anomaly measure:

$\mathbb{A}_{N} = \log P(Y_N \vert Y_1,\ldots, Y_{N-1}) = \log \frac{P(Y_1,\ldots, Y_N)}{P(Y_1,\ldots, Y_{N-1})} = \log P(Y_1,\ldots, Y_N) - \log P(Y_1,\ldots, Y_{N-1})$
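One convenient way to compute this: in the scaled forward recursion, the normalisation constant at step $N$ is exactly $P(Y_N \vert Y_1,\ldots, Y_{N-1})$, so $\mathbb{A}_N$ falls out of the on-line computation with no extra renormalisation. The sketch below assumes the same hypothetical toy parameters as above and a hand-picked threshold; both are illustrative, not part of the original answer.

```python
import numpy as np

def anomaly_scores(obs_stream, start, trans, emit):
    """Yield A_N = log P(Y_N | Y_1..Y_{N-1}) for each incoming observation."""
    alpha = None
    for y in obs_stream:
        if alpha is None:
            alpha = start * emit[:, y]      # first step: P(Y_1, X_1)
        else:
            alpha = (alpha @ trans) * emit[:, y]
        c = alpha.sum()                     # = P(Y_N | Y_1..Y_{N-1})
        yield float(np.log(c))
        alpha /= c                          # keep alpha a proper distribution

# Hypothetical parameters and threshold, for illustration only.
start_prob = np.array([0.9, 0.1])
trans_prob = np.array([[0.95, 0.05], [0.10, 0.90]])
emit_prob  = np.array([[0.70, 0.25, 0.05], [0.05, 0.25, 0.70]])
threshold = np.log(0.05)                    # calibrate on data known to be normal

for n, a in enumerate(anomaly_scores([0, 0, 1, 2, 2, 2, 0],
                                     start_prob, trans_prob, emit_prob), start=1):
    print(f"N={n}  A_N={a:.3f}" + ("  <- flagged" if a < threshold else ""))
```

In practice the threshold would be calibrated on sequences believed to be normal, and the HMM parameters themselves would typically be estimated from such data (e.g. with Baum-Welch) rather than specified by hand.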