Time Series Analysis – Choosing the Best Changepoint Detection Algorithm

change pointdynamic programmingmaximum likelihoodtime seriestimeseries-segmentation

I've been reading up on changepoint algorithms (dynamic programming, Bayesian Online Changepoint detection, Hidden Markov Models, etc.) and am looking to implement an algorithm that has a certain set of attributes, ideally in Python. I don't have a formal stats/math background, so it can be a bit tedious to read through the material and many of the algorithms have limitations, which are sometimes discovered halfway through a paper or blog post.

What I'm looking for:

  1. Fast/online. If an algorithm is sufficiently fast, it's probably less important that it's online. My understanding is that some offline algorithms can be adapted to be online by refining the last "known" changepoint until another is found.
  2. Allows for multiple changepoints, where the total number of changepoints is unknown before running the algorithm
  3. Does not require specifying an exact number of classes for the segments resulting from the changepoints
  4. Allows for multivariate data where the variables are evaluated jointly. Perhaps an alternative would be to perform multiple single variate analyses and attempt to merge changepoints that occur in different variables but near in time. Just a thought since I haven't specifically come across this, but a true multivariate solution would be ideal.
  5. Accounts for changes in standard deviation and not just mean. This could perhaps be replicated by calculating the absolute distance from the latest mean in real time and passing that in as an additional variable. In my specific case, I expect the individual time steps and segment means to be +/- from 0, so simply using absolute value might make it easier and avoid the need for additional recursion.

And obviously something that's as exact as possible would be ideal.

Any suggestions?

Best Answer

The R package bcp seem to fulfill all of these (associated paper here). It returns the probability of change point at each index in your data, so you have to set a threshold yourself. This is a nice feature compared to many other packages.

For multivariate change point detection, it requires that the data is in a matrix format, i.e., that all outcome variables are observed simultaneously (or close enough that it's OK to pretend).

It does not model autocorrelation which is often an important feature of time series.

I have tried to make an overview of change point packages in R. Few match your problem but take a look at the list of not-yet-reviewed packages at the top of the page.

Related Question