Solved – Bayesian change point detection

Tags: bayesian, change point

Really naive question. I have a time series, and I know how to perform segmentation (e.g., with the binary segmentation algorithm). The goal is to find intervals generated by different probabilistic models.

But I have full information about the possible models (distribution shape, variance, mean). So for each time point I have the likelihood of each model and its prior, which means I can calculate the posterior for any time point, any model, and any interval.

Problem: if I just segment the time series using the maximum posterior probability at each point, I get far too many change points. An HMM could be a solution, but it also takes only one point at a time into account and does not "look" at the whole interval. It is also difficult to apply an HMM to non-normal data.
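To make the over-segmentation problem concrete, here is a minimal sketch of the naive per-point approach, assuming two hypothetical Gaussian candidate models with known parameters and equal priors (all names and values below are illustrative, not from the real data):

# Two known candidate models (hypothetical parameters) and their priors
set.seed(1)
y <- c(rnorm(50, mean = 0, sd = 0.3), rnorm(50, mean = 0.5, sd = 0.3))
models <- list(m1 = list(mean = 0,   sd = 0.3),
               m2 = list(mean = 0.5, sd = 0.3))
prior  <- c(m1 = 0.5, m2 = 0.5)

# Point-wise posteriors: likelihood times prior, normalised per time point
lik  <- sapply(models, function(m) dnorm(y, m$mean, m$sd))
post <- sweep(lik, 2, prior, "*")
post <- post / rowSums(post)

# Point-wise MAP labels: only one true change point, but many label flips
map_label <- apply(post, 1, which.max)
sum(diff(map_label) != 0)  # usually much larger than 1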

This could be addressed with a sliding window, but it is not clear how to choose the window size.
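A hedged sketch of that sliding-window idea, reusing y and models from the snippet above: label each window by the model with the larger summed log-likelihood. The window size w is exactly the arbitrary choice in question.

# Score each window of size w by the summed log-likelihood of each model
w <- 20
window_label <- sapply(seq_len(length(y) - w + 1), function(t) {
  idx <- t:(t + w - 1)
  ll  <- sapply(models, function(m) sum(dnorm(y[idx], m$mean, m$sd, log = TRUE)))
  which.max(ll)
})
sum(diff(window_label) != 0)  # fewer flips than point-wise, but depends on w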

Is there an algorithm for this type of Bayesian change point detection (where the possible models are known)? Something like an HMM, but one that takes the whole interval into account and can work with any parametric distribution? A heuristic algorithm would be fine too.

How could I apply maximum likelihood clustering to this problem?

UPD:
Simulation of the problem:

# Per-point variances are known in advance
variances <- runif(1000, 0.01, 0.5)

coverages <- c()

# Segment 1: baseline model
for (i in 1:100) {
  coverages <- c(coverages, rnorm(1, mean = 0, sd = variances[i]))
}

# Segment 2: shifted mean, inflated variance
for (i in 101:200) {
  coverages <- c(coverages, rnorm(1, mean = -log(2), sd = variances[i] / 0.75))
}

# Segment 3: shifted mean, deflated variance
for (i in 201:300) {
  coverages <- c(coverages, rnorm(1, mean = log(3/2), sd = variances[i] * 0.75))
}

# Segment 4: back to the baseline model
for (i in 301:1000) {
  coverages <- c(coverages, rnorm(1, mean = 0, sd = variances[i]))
}

plot(coverages)

[Plot of the simulated coverages, showing the four segments.]

In real life, I know the possible variances and means for each time point, and I need to infer the prevalence of one of the models inside each segment.
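One hedged reading of "prevalence" is the posterior probability of each candidate model over a whole segment rather than per point: sum the per-point log-likelihoods (using the known per-point means and variances) and normalise. The segment boundaries, the candidate model names, and the uniform model prior below are assumptions for illustration.

# Segment-level posterior model probabilities for the simulated data above
segment <- 101:200  # hypothetical segment under consideration

candidates <- list(
  baseline = list(mean = rep(0, 1000),        sd = variances),
  shift_dn = list(mean = rep(-log(2), 1000),  sd = variances / 0.75),
  shift_up = list(mean = rep(log(3/2), 1000), sd = variances * 0.75)
)
model_prior <- c(baseline = 1/3, shift_dn = 1/3, shift_up = 1/3)

# Whole-interval log-likelihood of each model over the segment
loglik <- sapply(candidates, function(m)
  sum(dnorm(coverages[segment], m$mean[segment], m$sd[segment], log = TRUE)))

# Normalise on the log scale for numerical stability
logpost <- loglik + log(model_prior)
post <- exp(logpost - max(logpost))
post / sum(post)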

Best Answer

Briefly, the mcp package does Bayesian change point regression. As of v0.2, it supports the Gaussian, Binomial, Bernoulli, and Poisson families. Modeling your data as four intercept-only segments:

model = list(
  y ~ 1,  # Intercept of segment 1
  ~ 1,    # Intercept of segment 2 (after change point 1)
  ~ 1,    # Intercept of segment 3
  ~ 1     # Intercept of segment 4
)

library(mcp)
df = data.frame(x = seq_along(coverages), y = coverages)
fit = mcp(model, df, par_x = "x")

Let's plot it with a prediction interval, just for fun (green dashed lines). The blue curves are posterior densities for the change point locations. The gray lines are random draws from the posterior.

plot(fit, q_predict = TRUE)

[Fit plot: data with posterior draws (gray), change point posterior densities (blue), and prediction interval (green dashed).]

You can use plot_pars() to plot individual parameter estimates (a minimal example follows the summary output below). Here are the summaries, where cp_* are the change point estimates:

summary(fit)

Family: gaussian(link = 'identity')
Iterations: 9000 from 3 chains.
Segments:
  1: y ~ 1
  2: y ~ 1 ~ 1
  3: y ~ 1 ~ 1
  4: y ~ 1 ~ 1

Population-level parameters:
    name    mean  lower    upper Rhat n.eff
    cp_1 101.280  99.38 103.0000    1  5627
    cp_2 199.562 199.00 200.4314    1  5038
    cp_3 299.365 296.85 301.7760    1  2340
   int_1  -0.047  -0.11   0.0104    1  5614
   int_2  -0.620  -0.68  -0.5592    1  5792
   int_3   0.423   0.37   0.4838    1  6463
   int_4  -0.018  -0.04   0.0036    1  5382
 sigma_1   0.295   0.28   0.3082    1  5963
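As a minimal follow-up to the plot_pars() mention above (default arguments assumed):

# Posterior densities and trace plots for the individual parameters
plot_pars(fit)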

Read more on the mcp website. Disclaimer: I am the developer of mcp.
