Solved – Maximizing Log-Likelihood Estimation for Changepoint Detection

Tags: change point, likelihood, maximum likelihood

I'm trying to code the changepoint detection algorithm described here:

Link to original ppt: http://www.slideshare.net/kuma0177/velocity-ny-2014v5-39160794
Slides: 13-16

Slide 16: http://imgur.com/fF69Q3Q

It is my understanding that the slides use a rolling median as input to calculate lambda on slide 16. However, I'm not entirely sure how you plug the median into that lambda equation. My stats background is pretty weak.

My guess is that this algorithm is not an online one, but a batch one. I think what happens is that you have a time series of medians (x0,y0), (x1,y1), … (xn,yn) and then you split this into 2 time series in all possible ways:

change_point_Tau1@(x0,y0), (x1,y1), (x2,y2), ... (xn,yn)
(x0,y0), change_point_Tau1@(x1,y1), (x2,y2), ... (xn,yn)
(x0,y0), (x1,y1), change_point_Tau1@(x2,y2), ... (xn,yn)
...
...
(x0,y0), (x1,y1), (x2,y2), ... change_point_Tau1@(x(n-1),y(n-1)), (xn, yn)
(x0,y0), (x1,y1), (x2,y2), ... (x(n-1),y(n-1)), change_point_Tau1@(xn, yn)

Then for each of these 2 halves you want to calculate some Theta. Since we can't get the true Theta we use an approximation for it, shown as Theta hat. I believe Theta is something called a probability density function. I'd need this function to be simple enough that it can be approximated using a programming language like Scala or Java.
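To make my guess concrete, here is how I imagine computing Theta hat for one candidate split in Java. I'm assuming each half is fitted with a Normal distribution, which the slides never actually state, so this is only my guess (the class and method names are mine):

```java
// Just my guess: compute Theta hat for ONE candidate split, assuming each
// half is fitted with a Normal distribution (the slides don't say which
// distribution they actually use).
public final class ThetaHatGuess {

    // MLE of a Normal distribution on y[from, to): the sample mean and the
    // (biased, divide-by-n) sample variance.
    static double[] fitNormal(double[] y, int from, int to) {
        int n = to - from;
        double mean = 0.0;
        for (int i = from; i < to; i++) mean += y[i];
        mean /= n;
        double var = 0.0;
        for (int i = from; i < to; i++) var += (y[i] - mean) * (y[i] - mean);
        var /= n;
        return new double[] { mean, var };
    }

    public static void main(String[] args) {
        double[] medians = { 1.0, 1.2, 0.9, 5.1, 5.0, 4.8 }; // rolling medians
        int tau = 3; // one candidate changepoint location

        double[] thetaHat1 = fitNormal(medians, 0, tau);              // before
        double[] thetaHat2 = fitNormal(medians, tau, medians.length); // after

        System.out.printf("thetaHat1 = (mean %.3f, var %.3f)%n",
                thetaHat1[0], thetaHat1[1]);
        System.out.printf("thetaHat2 = (mean %.3f, var %.3f)%n",
                thetaHat2[0], thetaHat2[1]);
    }
}
```

If that is right, each Theta hat is just a pair of numbers per half, not a whole function I'd have to approximate.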

Q: How do you calculate this Theta hat? There are 2 of them in the ML equation.

Q: Is my understanding of this algorithm correct?  

Q: How do you calculate all those probabilities?

For example, I see P(y(1:Tau1) | Theta_hat) – the probability that no change point exists at Tau1, given Theta hat. Using Bayes' rule, this becomes:

P(y(1:Tau1) | Theta_hat) = P(Theta_hat | y(1:Tau1))*P(y(1:Tau1)) / P(Theta_hat)

Now where would I get those new probabilities, like P(Theta_hat | y(1:Tau1)), P(y(1:Tau1)) and P(Theta_hat)?

I know, lots of questions all crammed in here...

Best Answer

It is not clear from the presentation what distributional assumptions are being made in order to calculate the likelihoods.

It might be simpler for you to look at the recently released BreakoutDetection package published by the same authors: https://blog.twitter.com/2014/breakout-detection-in-the-wild.

But if you are more interested in learning about changepoint detection and how to use likelihoods, then read on.

Normally you have a time series y which we assume has n observations. In order to use likelihoods we need to make some assumptions about the distribution that y comes from. It is often assumed that the data come from a Normal distribution (although this isn't always appropriate). Having made a distributional assumption, you need to decide which parameters of the distribution are allowed to change, e.g. mean, variance, or both.

If all the parameters change then you proceed by splitting your data into 2 halves, before the change and after the change, and use maximum likelihood to fit the parameters to each half. In this way it is like a standard analysis where you just have data points and you fit a model to them. If not all parameters can change then you need to estimate those that don't change from the whole data set (not always an easy task, especially when parameters are linked).
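For example, under a Normal assumption where both mean and variance change, the maximized log-likelihood of each half has a closed form once the MLEs are plugged in. A minimal Java sketch (my own illustration, not the authors' code; the class and method names are invented):

```java
// A sketch (not the authors' code) of fitting each half by maximum
// likelihood, assuming Normal data where both mean and variance may change.
public final class SegmentFit {

    // Maximized Gaussian log-likelihood of y[from, to): plugging the MLEs
    // (sample mean, biased sample variance) back into the log-likelihood
    // gives the closed form -n/2 * (log(2*pi*varHat) + 1).
    static double maxLogLik(double[] y, int from, int to) {
        int n = to - from;
        double mean = 0.0;
        for (int i = from; i < to; i++) mean += y[i];
        mean /= n;
        double var = 0.0;
        for (int i = from; i < to; i++) var += (y[i] - mean) * (y[i] - mean);
        var /= n;
        // Note: a constant segment gives var = 0 and log-likelihood -infinity.
        return -0.5 * n * (Math.log(2.0 * Math.PI * var) + 1.0);
    }

    public static void main(String[] args) {
        double[] y = { 1.0, 1.1, 0.9, 5.2, 5.0, 4.9 };
        int tau = 3; // candidate changepoint location
        double llBefore = maxLogLik(y, 0, tau);
        double llAfter  = maxLogLik(y, tau, y.length);
        System.out.printf("log-lik before = %.4f, after = %.4f%n",
                llBefore, llAfter);
    }
}
```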

The trick with changepoint analysis is that you don't know where the change is, so you have to calculate the likelihood for each possible changepoint location and take the most likely as the hypothesized changepoint location. You then test whether the change at that location is significant by comparing the likelihood ratio to a threshold (the c in slide 16).
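Putting the two steps together, a Java sketch of the scan, again under the Normal assumption (the threshold c is left as a free parameter, and the names here are mine, not from the slides):

```java
// Sketch of the full scan (Normal assumption): try every split point, fit
// each half by maximum likelihood, keep the most likely split, and only
// declare a change if the likelihood-ratio statistic exceeds a threshold c.
public final class ChangepointScan {

    // Maximized Gaussian log-likelihood of y[from, to):
    // -n/2 * (log(2*pi*varHat) + 1), with the MLEs plugged in.
    static double maxLogLik(double[] y, int from, int to) {
        int n = to - from;
        double mean = 0.0;
        for (int i = from; i < to; i++) mean += y[i];
        mean /= n;
        double var = 0.0;
        for (int i = from; i < to; i++) var += (y[i] - mean) * (y[i] - mean);
        var /= n;
        return -0.5 * n * (Math.log(2.0 * Math.PI * var) + 1.0);
    }

    // Most likely changepoint: the tau maximizing the combined log-likelihood
    // of the two halves. Each half keeps at least 2 points so the variance
    // MLE is defined (a constant half still yields -infinity and is skipped
    // by the max).
    static int bestChangepoint(double[] y) {
        int best = -1;
        double bestLL = Double.NEGATIVE_INFINITY;
        for (int tau = 2; tau <= y.length - 2; tau++) {
            double ll = maxLogLik(y, 0, tau) + maxLogLik(y, tau, y.length);
            if (ll > bestLL) { bestLL = ll; best = tau; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] y = { 1.0, 1.1, 0.9, 1.0, 5.2, 5.0, 4.9, 5.1 };
        int tau = bestChangepoint(y);
        // Likelihood-ratio statistic: 2 * (two-segment fit - no-change fit).
        double lrt = 2.0 * (maxLogLik(y, 0, tau) + maxLogLik(y, tau, y.length)
                - maxLogLik(y, 0, y.length));
        System.out.println("tau = " + tau + ", LRT = " + lrt);
        // Report a change only if lrt > c, for some chosen threshold c.
    }
}
```

This brute-force scan is O(n^2); it is fine for illustrating the likelihood-ratio idea, but real implementations use smarter search.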
