Time-Series – Techniques for Detecting Steps in Time Series Data

change pointtime series

I've attached a picture of the time series I'm talking about. The top is the original series, the bottom is the differenced series.

Each data point is a 5 minute average reading from a strain gauge. This strain gauge is placed on a machine. The noisy areas correspond to areas where the machine is turned on, the clean areas are when the machine is turned off. If you look at the area circled in red, there are anomalous steps in the reading that I would like to be able to detect automatically.

I'm completely stumped on how I might be able to do this – any ideas?

enter image description here

Best Answer

It appears you are looking for spikes within intervals of relative quiet. "Relative" means compared to typical nearby values, which suggests smoothing the series. A robust smooth is desirable precisely because it should not be influenced by a few local spikes. "Quiet" means variation around that smooth is small. Again, a robust estimate of local variation is desirable. Finally, a "spike" would be a large residual as a multiple of the local variation.

To implement this recipe, we need to choose (a) how close "nearby" means, (b) a recipe for smoothing, and (c) a recipe for finding local variation. You may have to experiment with (a), so let's make it an easily controllable parameter. Good, readily available choices for (b) and (c) are Lowess and the IQR, respectively. Here is an R implementation:

library(zoo)                      # For the local (moving window) IQR
f <- function(x, width=7) {       # width = size of moving window in time steps
    w <- width / length(x)
    y <- lowess(x, f=w)           # The smooth
    r <- zoo(x - y$y)             # Its residuals, structured for the next step
    z <- rollapply(r, width, IQR) # The running estimate of variability
    r/z                           # The diagnostic series: residuals scaled by IQRs
}

As an example of its use, consider these simulated data where two successive spikes are added to a quiet period (two in a row should be harder to detect than one isolated spike):

> x <- c(rnorm(192, mean=0, sd=1), rnorm(96, mean=0, sd=0.1), rnorm(192, mean=0, sd=1))
> x[240:241] <- c(1,-1) # Add a local spike
> plot(x)

Simulated data

Here is the diagnostic plot:

> u <- f(x)
> plot(u)

Diagnostic plot

Despite all the noise in the original data, this plot beautifully detects the (relatively small) spikes in the center. Automate the detection by scanning f(x) for largish values (larger than about 5 in absolute value: experiment to see what works best with sample data).

> spikes <- u[abs(u) >= 5]
      240       241       273 
 9.274959 -9.586756  6.319956

The spurious detection at time 273 was a random local outlier. You can refine the test to exclude (most) such spurious values by modifying f to look for simultaneously high values of the diagnostic r/z and low values of the running IQR, z. However, although the diagnostic has a universal (unitless) scale and interpretation, the meaning of a "low" IQR depends on the units of the data and has to be determined from experience.