I have a specific question about the formulation of offline multiple change point detection given in Burg and Williams.
Here the change points are denoted $\{\tau_i\}$, $\textbf{y}_{a:b}$ is the slice of the time series from index $a$ to index $b$, $\ell$ is a loss function, and $P$ is a penalty on the number of change points.
My question is: why is it written as $\ell(\textbf{y}_{\tau_{i-1}:\tau_{i}-1})$ and not $\ell(\textbf{y}_{\tau_{i-1}:\tau_{i}})$? In other words, why is there a second $-1$ term in the loss function?
Best Answer
So $\textbf{y}_{\tau_{i-1}:\tau_{i}}$ consists of the elements of the time series from change point $i-1$ up to and including change point $i$.
The slice $\textbf{y}_{\tau_{i-1}:\tau_{i}-1}$, on the other hand, stops one element earlier. Notice that the additional $-1$ is not part of the subscript of $\tau$; it is subtracted from $\tau_i$, the upper index of the slice of $\textbf{y}$. So this segment starts at change point $i-1$ and goes up to the element just before change point $i$, at index $\tau_i-1$.
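This is because you want to compute the loss over each segment from one change point to the next with no overlap: the observation at $\tau_i$ is the first element of the next segment, $\textbf{y}_{\tau_i:\tau_{i+1}-1}$, so it must not also be counted in the current one. For example, if $\tau_{i-1} = 1$ and $\tau_i = 4$, the segment is $y_1, y_2, y_3$, and the next segment starts at $y_4$.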
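To make the indexing concrete, here is a minimal Python sketch of a penalized segmentation cost. It assumes several choices that the question does not specify: a squared-error loss for $\ell$, a linear penalty $P(K) = \beta K$, 1-based inclusive slices, and the boundary convention $\tau_0 = 1$, $\tau_{K+1} = T + 1$. The function names are hypothetical, and the exhaustive search is only practical for very short series.

```python
from itertools import combinations


def segment_loss(segment):
    """Squared error around the segment's own mean (an assumed choice of ell)."""
    mean = sum(segment) / len(segment)
    return sum((x - mean) ** 2 for x in segment)


def y_slice(y, a, b):
    """Return y_{a:b} with 1-based, inclusive indices, mapped to a Python slice."""
    return y[a - 1:b]


def penalized_cost(y, taus, beta):
    """Cost of sorted 1-based change points `taus` (each in 2..T), with P(K) = beta * K.

    Assumed boundaries tau_0 = 1 and tau_{K+1} = T + 1, so segment i is
    y_{tau_{i-1} : tau_i - 1} and the segments cover y exactly once.
    """
    T = len(y)
    bounds = [1] + list(taus) + [T + 1]
    segments = [y_slice(y, bounds[i - 1], bounds[i] - 1) for i in range(1, len(bounds))]
    total = sum(segment_loss(s) for s in segments)
    return total + beta * len(taus), segments


def best_changepoints(y, beta, max_k=3):
    """Brute-force search over every set of at most max_k change points."""
    T = len(y)
    best = (float("inf"), ())
    for k in range(max_k + 1):
        # Change points range over 2..T so that no segment is empty.
        for taus in combinations(range(2, T + 1), k):
            cost, _ = penalized_cost(y, taus, beta)
            if cost < best[0]:
                best = (cost, taus)
    return best


if __name__ == "__main__":
    y = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
    cost, segments = penalized_cost(y, [4], beta=1.0)
    print(segments, cost)
    # The segment lengths add up to T, so no observation is counted twice.
    print(sum(len(s) for s in segments) == len(y))
    # The inclusive slice y_{1:4} would also pull y_4 = 5.0 into the first segment.
    print(y_slice(y, 1, 4))
    print(best_changepoints(y, beta=1.0))
```

With a single change point at $\tau_1 = 4$, the two segments are `[0.0, 0.0, 0.0]` and `[5.0, 5.0, 5.0]`, and their lengths add up to $T = 6$. Using `y_slice(y, 1, 4)` for the first segment instead would include the first `5.0` as well, so that observation would be counted in two segments.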