Time-Series – Techniques for Detecting Steps in Time Series Data

I've attached a picture of the time series I'm talking about. The top is the original series, the bottom is the differenced series.

Each data point is a 5-minute average reading from a strain gauge placed on a machine. The noisy areas correspond to periods when the machine is turned on; the clean areas are when it is turned off. In the area circled in red, there are anomalous steps in the reading that I would like to detect automatically.

I'm completely stumped on how I might be able to do this – any ideas?

[Figure: the original series (top) and the differenced series (bottom); the anomalous steps are circled in red]

Best Answer

It appears you are looking for spikes within intervals of relative quiet. "Relative" means compared to typical nearby values, which suggests smoothing the series. A robust smooth is desirable precisely because it should not be influenced by a few local spikes. "Quiet" means variation around that smooth is small. Again, a robust estimate of local variation is desirable. Finally, a "spike" would be a large residual as a multiple of the local variation.

To implement this recipe, we need to choose (a) how close "nearby" means, (b) a recipe for smoothing, and (c) a recipe for finding local variation. You may have to experiment with (a), so let's make it an easily controllable parameter. Good, readily available choices for (b) and (c) are Lowess and the IQR, respectively. Here is an R implementation:

library(zoo)                      # For the local (moving window) IQR
f <- function(x, width=7) {       # width = size of moving window in time steps
    w <- width / length(x)
    y <- lowess(x, f=w)           # The smooth
    r <- zoo(x - y$y)             # Its residuals, structured for the next step
    z <- rollapply(r, width, IQR) # The running estimate of variability
    r/z                           # The diagnostic series: residuals scaled by IQRs
}

As an example of its use, consider these simulated data where two successive spikes are added to a quiet period (two in a row should be harder to detect than one isolated spike):

> x <- c(rnorm(192, mean=0, sd=1), rnorm(96, mean=0, sd=0.1), rnorm(192, mean=0, sd=1))
> x[240:241] <- c(1,-1) # Add a local spike
> plot(x)

[Figure: simulated data]

Here is the diagnostic plot:

> u <- f(x)
> plot(u)

[Figure: diagnostic plot]

Despite all the noise in the original data, this plot beautifully detects the (relatively small) spikes in the center. Automate the detection by scanning f(x) for largish values (larger than about 5 in absolute value: experiment to see what works best with sample data).

> spikes <- u[abs(u) >= 5]
> spikes
      240       241       273 
 9.274959 -9.586756  6.319956

The spurious detection at time 273 was a random local outlier. You can refine the test to exclude (most) such spurious values by modifying f to look for simultaneously high values of the diagnostic r/z and low values of the running IQR, z. However, although the diagnostic has a universal (unitless) scale and interpretation, the meaning of a "low" IQR depends on the units of the data and has to be determined from experience.
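The refinement described above could look something like the following. This is only a sketch under assumptions: spike_table and find_spikes are hypothetical names, the cutoffs score_min and iqr_max are illustrative and must be tuned to the units of your data, and base R's embed() stands in for zoo's rollapply so the block is self-contained (which is why width has to be odd here).

```r
# Hypothetical helper: like f, but keeps the running IQR next to the score.
# Base R only; embed() replaces zoo::rollapply, so width must be odd.
spike_table <- function(x, width = 7) {
    stopifnot(width %% 2 == 1, length(x) > width)
    half <- (width - 1) / 2
    r <- x - lowess(x, f = width / length(x))$y   # residuals from the robust smooth
    z <- apply(embed(r, width), 1, IQR)           # IQR of each centered window
    t <- (half + 1):(length(x) - half)            # times at the window centers
    data.frame(time = t, score = r[t] / z, iqr = z)
}

# Flag times where the score is large AND the local IQR is small.
# score_min and iqr_max are illustrative values, not recommendations.
find_spikes <- function(x, width = 7, score_min = 5, iqr_max = 0.5) {
    d <- spike_table(x, width)
    d[which(abs(d$score) >= score_min & d$iqr <= iqr_max), ]
}

set.seed(17)                                      # arbitrary seed
x <- c(rnorm(192), rnorm(96, sd = 0.1), rnorm(192))
x[240:241] <- c(1, -1)                            # the same local spike as above
find_spikes(x)
```

In this simulation the running IQR is roughly 1.35 in the noisy stretches and roughly 0.135 in the quiet one, so any iqr_max between those values separates them. With real data the cutoff has to come from experience, as noted above.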

Related Solutions

Solved – Detecting time-shifted time series

You could start by looking at Cross-Correlation between the time-series.

Here is how to do it in Python: https://stackoverflow.com/questions/6991471/computing-cross-correlation-function
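The linked answer is in Python. For consistency with the R code elsewhere on this page, here is a rough equivalent using ccf from base R's stats package. The data are simulated purely for illustration (the names a, x, y and shift are made up): y is a copy of x delayed by 5 steps, and the lag with the largest absolute cross-correlation recovers that delay.

```r
set.seed(1)                        # arbitrary seed
n     <- 200
shift <- 5
a <- rnorm(n + shift)              # white noise to build both series from
x <- a[(shift + 1):(n + shift)]    # x[t] = a[t + shift]
y <- a[1:n]                        # y[t] = a[t] = x[t - shift], so y trails x

# ccf(x, y) at lag k estimates the correlation between x[t + k] and y[t]
cc <- ccf(x, y, lag.max = 20, plot = FALSE)
best_lag <- cc$lag[which.max(abs(cc$acf))]
best_lag                           # negative here because y trails x
```

Note the sign convention: because ccf(x, y) at lag k pairs x[t + k] with y[t], a series y that trails x shows its peak at a negative lag. Real series are usually autocorrelated, which broadens the peak, so it is worth looking at plot(cc) rather than relying on the maximum alone.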

Time Series Anomalies – Detecting Changes in R

You could use time series outlier detection to detect changes in time series. Tsay's and Chen and Liu's procedures are popular time series outlier detection methods. See my earlier question on this site.

R's tsoutliers package uses Chen and Liu's method for detecting outliers. SAS/SPSS/Autobox can also do this. See below for the R code to detect changes in time series.

library("tsoutliers")
dat.ts<- ts(dat.change,frequency=1)
data.ts.outliers <- tso(dat.ts)
data.ts.outliers
plot(data.ts.outliers)

The tso function in the tsoutliers package identifies the following outliers. The type column distinguishes additive outliers (AO), level shifts (LS) and temporary changes (TC); see the package documentation for details.

Outliers:
  type ind time coefhat   tstat
1   TC  42   42 -2.9462 -10.068
2   AO  43   43  1.0733   4.322
3   AO  45   45 -1.2113  -4.849
4   TC  47   47  1.0143   3.387
5   AO  51   51  0.9002   3.433
6   AO  52   52 -1.3455  -5.165
7   AO  56   56  0.9074   3.710
8   LS  62   62  1.1284   3.717
9   AO  67   67 -1.3503  -5.502

The package also provides nice plots (see below). The plot shows where the outliers are and what the series would have looked like without them.

[Figure: tsoutliers plot showing the detected outliers and the series adjusted for them]

I have also used the R package strucchange to detect level shifts. As an example, on your data:

library("strucchange")
breakpoints(dat.ts~1)

The program correctly identifies breakpoints or structural changes.

Optimal 4-segment partition: 

Call:
breakpoints.formula(formula = dat.ts ~ 1)

Breakpoints at observation number:
17 41 87 

Corresponding to breakdates:
17 41 87

Hope this helps
