Solved – Bayesian change point detection

Tags: bayesian, change point

Really naive question. I have a time series, and I know how to perform segmentation (e.g., with the binary segmentation algorithm). The goal is to find intervals generated by different probabilistic models.

But I have full information about the possible models (distribution shape, variance, mean). So for each time point I have the likelihood of each model and its prior, which means I can calculate the posterior for any time point, any model, and any interval.

Problem: if I just segment the time series using the maximum posterior probability at each point, I get far too many change points. An HMM could be a solution, but it also takes only one point at a time into account and does not "look" at the whole interval. It is also difficult to apply an HMM to non-normal data.
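To make the over-segmentation problem concrete, here is a minimal sketch of the naive per-point approach, assuming two hypothetical Gaussian candidate models with known parameters and equal priors (all names and values below are illustrative, not from the real data):

# Two known candidate models (hypothetical parameters) and their priors
set.seed(1)
y <- c(rnorm(50, mean = 0, sd = 0.3), rnorm(50, mean = 0.5, sd = 0.3))
models <- list(m1 = list(mean = 0,   sd = 0.3),
               m2 = list(mean = 0.5, sd = 0.3))
prior  <- c(m1 = 0.5, m2 = 0.5)

# Point-wise posteriors: likelihood times prior, normalised per time point
lik  <- sapply(models, function(m) dnorm(y, m$mean, m$sd))
post <- sweep(lik, 2, prior, "*")
post <- post / rowSums(post)

# Point-wise MAP labels: only one true change point, but many label flips
map_label <- apply(post, 1, which.max)
sum(diff(map_label) != 0)  # usually much larger than 1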

This could be addressed with a sliding window, but it is not clear how to choose the window size.
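A hedged sketch of that sliding-window idea, reusing y and models from the snippet above: label each window by the model with the larger summed log-likelihood. The window size w is exactly the arbitrary choice in question.

# Score each window of size w by the summed log-likelihood of each model
w <- 20
window_label <- sapply(seq_len(length(y) - w + 1), function(t) {
  idx <- t:(t + w - 1)
  ll  <- sapply(models, function(m) sum(dnorm(y[idx], m$mean, m$sd, log = TRUE)))
  which.max(ll)
})
sum(diff(window_label) != 0)  # fewer flips than point-wise, but depends on w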

Is there an algorithm for this type of Bayesian change point detection (where the possible models are known)? Something like an HMM, but one that takes the whole interval into account and can work with any parametric distribution? A heuristic algorithm would be fine too.

How could I apply maximum likelihood clustering to this problem?

UPD:
Simulation of the problem:

# Per-point variances are known in advance
variances <- runif(1000, 0.01, 0.5)

coverages <- c()

# Segment 1: baseline model
for (i in 1:100) {
  coverages <- c(coverages, rnorm(1, mean = 0, sd = variances[i]))
}

# Segment 2: shifted mean, inflated variance
for (i in 101:200) {
  coverages <- c(coverages, rnorm(1, mean = -log(2), sd = variances[i] / 0.75))
}

# Segment 3: shifted mean, deflated variance
for (i in 201:300) {
  coverages <- c(coverages, rnorm(1, mean = log(3/2), sd = variances[i] * 0.75))
}

# Segment 4: back to the baseline model
for (i in 301:1000) {
  coverages <- c(coverages, rnorm(1, mean = 0, sd = variances[i]))
}

plot(coverages)

[Plot of the simulated coverages, showing the four segments.]

In real life, I know the possible variances and means for each time point, and I need to infer the prevalence of one of the models inside each segment.
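One hedged reading of "prevalence" is the posterior probability of each candidate model over a whole segment rather than per point: sum the per-point log-likelihoods (using the known per-point means and variances) and normalise. The segment boundaries, the candidate model names, and the uniform model prior below are assumptions for illustration.

# Segment-level posterior model probabilities for the simulated data above
segment <- 101:200  # hypothetical segment under consideration

candidates <- list(
  baseline = list(mean = rep(0, 1000),        sd = variances),
  shift_dn = list(mean = rep(-log(2), 1000),  sd = variances / 0.75),
  shift_up = list(mean = rep(log(3/2), 1000), sd = variances * 0.75)
)
model_prior <- c(baseline = 1/3, shift_dn = 1/3, shift_up = 1/3)

# Whole-interval log-likelihood of each model over the segment
loglik <- sapply(candidates, function(m)
  sum(dnorm(coverages[segment], m$mean[segment], m$sd[segment], log = TRUE)))

# Normalise on the log scale for numerical stability
logpost <- loglik + log(model_prior)
post <- exp(logpost - max(logpost))
post / sum(post)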

Best Answer

Briefly, the mcp package does Bayesian change point regression. As of v0.2, it supports the Gaussian, Binomial, Bernoulli, and Poisson families. Modeling your data as four intercept-only segments:

model = list(
  y ~ 1,  # Intercept of segment 1
  ~ 1,    # Intercept of segment 2 (after change point 1)
  ~ 1,    # Intercept of segment 3
  ~ 1     # Intercept of segment 4
)

library(mcp)
df = data.frame(x = seq_along(coverages), y = coverages)
fit = mcp(model, df, par_x = "x")

Let's plot it with a prediction interval, just for fun (green dashed lines). The blue curves are posterior densities for the change point locations. The gray lines are random draws from the posterior.

plot(fit, q_predict = TRUE)

[Fit plot: data with posterior draws (gray), change point posterior densities (blue), and prediction interval (green dashed).]

You can use plot_pars() to plot individual parameter estimates (a minimal example follows the summary output below). Here are the summaries, where cp_* are the change point estimates:

summary(fit)

Family: gaussian(link = 'identity')
Iterations: 9000 from 3 chains.
Segments:
  1: y ~ 1
  2: y ~ 1 ~ 1
  3: y ~ 1 ~ 1
  4: y ~ 1 ~ 1

Population-level parameters:
    name    mean  lower    upper Rhat n.eff
    cp_1 101.280  99.38 103.0000    1  5627
    cp_2 199.562 199.00 200.4314    1  5038
    cp_3 299.365 296.85 301.7760    1  2340
   int_1  -0.047  -0.11   0.0104    1  5614
   int_2  -0.620  -0.68  -0.5592    1  5792
   int_3   0.423   0.37   0.4838    1  6463
   int_4  -0.018  -0.04   0.0036    1  5382
 sigma_1   0.295   0.28   0.3082    1  5963
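As a minimal follow-up to the plot_pars() mention above (default arguments assumed):

# Posterior densities and trace plots for the individual parameters
plot_pars(fit)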

Read more on the mcp website. Disclaimer: I am the developer of mcp.
