Python – Using PELT Changepoint Detection for Observation Counts Data

change pointpythonstructural-change

I'm trying to detect changepoints in the number of observations (specifically the number of occurrences of x happening per day). Broadly speaking the events are independent and the time intervals between them are exponentially distributed. However at certain points, such as changes in policy or legislation, there may be a change in the number of occurrences per day.

Implementation will be via a Python application and off-line detection is preferred since analysis will be after the fact.

Practical aspects and review of available literature lead me to prefer to use PELT for this. However, I've not been able to find anything that confirms PELT is ok to use for this. This is for a critical public safety application so it needs to be valid and I'd really appreciate any advice or comment including any tips on setting up the problem.

Best Answer

The justification to use or not use PELT depends on how you will define the cost/loss function. I think that we first need to distinguish those terms.

Within change-point detection framework, a common approach is the cost based approach. The Statistical Part of this approach concerns in setting up a proper cost function and suitable constraints relevant to your problem. The cost is usually additive in the segmented blocks.

Being a bit more precise, if $(y_{i})_{i=1}^T$ is your data and $\tau = \{t_j\}_{j=0}^{k+1}$ is a segmentation of your data where $t_0 = 0$ and $t_{k+1} = T$. If your block cost function is $c$, then the segmentation cost is

$$c(\tau) = \sum_{j=0}^{k} c(y_{(t_j+1):t_{j+1}}) \quad,$$

where $y_{(t_i+1):t_{i+1}}$ is the data block between $t_i + 1$ and $t_{i+1}$. Therefore, the first object that you need to specify and justify properly is this cost function $c$.

After specifying the cost, we need to compute it. When the number of change-points is unknown, computing the solution is not a trivial task since there are $2^T$ possible blocks segmentation if no restriction is made. Efficiently computing the solution requires what we call search methods.

There are three common approaches to search methods: binary segmentation, dynamic programming, and PELT. The first is a greedy (approximated) solution to the problem, and usually has a computational complexity of $O(n)$ or $O(n\log(n))$ in time, hence it is fast for large datasets. The second is an application of the general dynamic programming paradigm, and provides an exact solution at the computational cost of $O(n^2)$ in time and memory, hence quite slow on large datasets.

PELT is an improvement of the dynamic programming approach. Its idea is that, if your cost function satisfies some properties, you can skip some iterations, and this makes the algorithm much faster. Indeed, under some conditions, the time complexity is $O(n)$.

Finally, let's address your question. In my opinion, the part that needs most justification is the choice of cost. PELT is an efficient algorithm to obtain your solution. But an efficient solution to the wrong approach is still useless.

From your description, a first suggestion is to define the cost of a block as the negative log likelihood of a Poisson distribution evaluated at the MLE for the parameter, plus a regularization. A usual regularization is the BIC, so that to each block we add $\beta \log(T)$, where $\beta$ is a hyperparameter that you need to tune. Of course, you need to check if this suggestion is appropriate for your problem. It might be too simple.

Next, you need to choose the search method. If you have a large dataset, you probably want to apply binary segmentation or PELT. With PELT, you need to check if the conditions for PELT apply. If you use negative log likelihoods + regularization as the cost function, the PELT conditions are satisfied, therefore you can apply it.

However, I would not dismiss the approximate solution provided by binary segmentation. A recent benchmark on change-point detection shows that it performs very well, if not equivalent, to the exact solution. See the reference below.

References

Here are arXiv links for the references.

(PELT) https://arxiv.org/abs/1101.1438

(Review on CPD) https://arxiv.org/abs/1801.00718

(Benchmark) https://arxiv.org/abs/2003.06222

Related Question