Change-Point – How to Detect Change Points in Data Analysis Using Advanced Techniques


I have a specific question about the formulation of offline multiple change point detection given in Burg and Williams.

Here the change points are denoted $\{\tau_i\}$, the slice of a time series from $a$ to $b$ is $\textbf{y}_{a:b}$, $\ell$ is a loss function, and $P$ is a penalty on the number of change points.
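For reference, a penalized objective consistent with these definitions can be written as below. This is a sketch, not a quotation from Burg and Williams: the number of change points $K$, the series length $n$, and the boundary conventions $\tau_0 = 1$ and $\tau_{K+1} = n + 1$ are assumptions made here so that the first segment starts at the first observation and the last one ends at $y_n$.

$$
\min_{K,\; \tau_1 < \dots < \tau_K} \; \sum_{i=1}^{K+1} \ell\left(\textbf{y}_{\tau_{i-1}:\tau_i - 1}\right) + P(K)
$$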

My question is: why is it written as $\ell(\textbf{y}_{\tau_{i-1}:\tau_{i}-1})$ and not $\ell(\textbf{y}_{\tau_{i-1}:\tau_{i}})$? In other words, why is there a second $-1$ term in the loss function?

Best Answer

So $\textbf{y}_{\tau_{i-1}:\tau_{i}}$ would be the elements of the time series from changepoint number $i-1$ up to and including changepoint number $i$.

By contrast, $\textbf{y}_{\tau_{i-1}:\tau_{i}-1}$ (notice that the additional $-1$ is not part of the subscript of $\tau$ but of the subscript of $\textbf{y}$) starts at changepoint number $i-1$ and goes up to one element before changepoint number $i$, i.e. to index $\tau_i-1$.

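This is because you want to calculate the loss function from one changepoint to the next with no overlap: the observation at $\tau_i$ is the first element of the next segment, $\textbf{y}_{\tau_{i}:\tau_{i+1}-1}$, so it should not also be counted in the segment that ends there.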
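To make the indexing concrete, here is a minimal R sketch of optimal partitioning, a dynamic-programming solver for this kind of penalized objective. It is not taken from Burg and Williams; the squared-error segment loss (a Gaussian mean-shift cost), the constant penalty `beta` per change point, and the function names `seg_cost` and `optimal_partitioning` are illustrative assumptions. Each candidate segment is `y[s:(t - 1)]`, mirroring $\textbf{y}_{\tau_{i-1}:\tau_i-1}$, so every observation is charged to exactly one segment.

```r
# Assumed segment loss: sum of squared deviations from the segment mean.
seg_cost <- function(y, a, b) {
  seg <- y[a:b]
  sum((seg - mean(seg))^2)
}

# Optimal partitioning with a constant penalty `beta` per change point.
# Candidate segments are y[s:(t - 1)], so consecutive segments share no
# observations and together cover y[1:n] exactly once.
optimal_partitioning <- function(y, beta) {
  n <- length(y)
  # F[t] is the best penalized cost of the prefix y[1:(t - 1)];
  # F[1] is the empty prefix. Starting at -beta means K change points
  # (K + 1 segments) are charged K * beta in total.
  F <- c(-beta, rep(Inf, n))
  last <- integer(n + 1)  # start index of the final segment of each prefix
  for (t in 2:(n + 1)) {
    for (s in 1:(t - 1)) {
      cand <- F[s] + seg_cost(y, s, t - 1) + beta
      if (cand < F[t]) {
        F[t] <- cand
        last[t] <- s
      }
    }
  }
  # Backtrack through segment start points; every start after index 1 is a tau_i.
  cps <- integer(0)
  t <- n + 1
  while (t > 1) {
    s <- last[t]
    if (s > 1) cps <- c(s, cps)
    t <- s
  }
  cps
}
```

For example, `optimal_partitioning(c(rep(0, 20), rep(5, 20)), beta = 1)` returns `21`: the second segment starts at index 21, and index 20 is the last element of the first segment. The double loop evaluates a quadratic number of segments, so this sketch is only meant for short series.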

Related Solutions

Solved – sequential change point detection

In a nutshell, change detection is the problem of determining changes in the distribution of a stochastic process when the decision is made as observations arrive.
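As an illustration of deciding as observations arrive, the following is a minimal R sketch of Page's CUSUM for an upward shift in the mean. CUSUM is one classic sequential method, not one named in the original answer; the known pre-change mean `mu0`, the slack value `k`, the threshold `h`, and the function name `cusum_detect` are assumptions.

```r
# One-sided CUSUM for an upward mean shift, processing one observation at a time.
# mu0: assumed known pre-change mean; k: slack; h: alarm threshold.
cusum_detect <- function(x, mu0, k, h) {
  s <- 0
  for (t in seq_along(x)) {
    # Accumulate evidence for a shift, never dropping below zero.
    s <- max(0, s + (x[t] - mu0 - k))
    if (s > h) return(t)  # index at which the alarm first fires
  }
  NA_integer_  # no alarm raised
}
```

With `x <- c(rep(0, 50), rep(2, 50))`, `cusum_detect(x, mu0 = 0, k = 0.5, h = 4)` returns 53: the statistic grows by 1.5 per post-change observation and first exceeds 4 on the third observation after the change at index 51. A larger `h` gives fewer false alarms at the cost of a longer detection delay.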

The relevant Wikipedia article is this.

Solved – Bayesian change point detection

Briefly, the package mcp does Bayesian change point regression. As of v0.2, it supports Gaussian, Binomial, Bernoulli, and Poisson response families. Modeling your data as four intercept-only segments:

model = list(
  y ~ 1,  # Segment 1: intercept only
  ~ 1,    # Segments 2-4: a change point, then a new intercept
  ~ 1,
  ~ 1
)

library(mcp)
df = data.frame(x = seq_along(coverages), y = coverages)  # coverages: the numeric series from the question
fit = mcp(model, df, par_x = "x")
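The `mcp()` call above relies on `coverages`, the numeric vector from the original question, which is not reproduced here. To experiment with the same four-segment model, one could first create a synthetic `coverages` in base R. The segment lengths, means, and noise level below are arbitrary, hypothetical values, not the asker's data.

```r
set.seed(1)  # make the synthetic series reproducible
# Hypothetical segment lengths and means (not the asker's data).
seg_lengths <- c(100, 100, 100, 100)
seg_means <- c(0, 1, -1, 0.5)
# rep() expands each mean over its segment, so rnorm() draws one value per time step.
coverages <- rnorm(sum(seg_lengths), mean = rep(seg_means, times = seg_lengths), sd = 0.3)
```

Defining this before `df` is built lets the remaining lines run unchanged, assuming `mcp` and its JAGS dependency are installed.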

Let's plot it with a prediction interval, just for fun (green dashed lines). The blue curves are posterior densities for the change point locations. The gray lines are random draws from the posterior.

plot(fit, q_predict = TRUE)

You can use plot_pars() to plot individual parameter estimates. Here is the summary, where cp_* are the change point estimates:

summary(fit)

Family: gaussian(link = 'identity')
Iterations: 9000 from 3 chains.
Segments:
  1: y ~ 1
  2: y ~ 1 ~ 1
  3: y ~ 1 ~ 1
  4: y ~ 1 ~ 1

Population-level parameters:
    name    mean  lower    upper Rhat n.eff
    cp_1 101.280  99.38 103.0000    1  5627
    cp_2 199.562 199.00 200.4314    1  5038
    cp_3 299.365 296.85 301.7760    1  2340
   int_1  -0.047  -0.11   0.0104    1  5614
   int_2  -0.620  -0.68  -0.5592    1  5792
   int_3   0.423   0.37   0.4838    1  6463
   int_4  -0.018  -0.04   0.0036    1  5382
 sigma_1   0.295   0.28   0.3082    1  5963

Read more on the mcp website. Disclaimer: I am the developer of mcp.
