Solved – Detecting structural breaks in a dataset.

rstructural-changetime series

I have some trouble detecting structural breaks in a dataset.

The dataset has multiplied time-series, and it follows a trend. I want to find a structural break when the time-series breaks the trend. The structural breaks occurs when the value randomly goes from i.e. 2000 and drops to zero, or close to zero. Others can drop down to 200, and then continue the trend. And in some cases the value can randomly go up and down, in this case the model should not interpret it as structural break.

I have tried to figure this out with studentized residual, the thought was that when someone broke the trend and churned it would give a value close to -2. But as you can see from my code, the time-series which I want to recognize as a structural break (y) only has a studentized residual value of -0.43 at the periode where it drops to zero.

I also tried with bfast without luck.

Here are some example data:

# Trending
x <- c(12:1)
#Structural break
y <- c(20,19,0,0,0,0,0,0,0,0,0,0) 
#Trending
z <- c(0,0,20,19,18,17,17,17,17,17,17,17)
#Random drop and trending trending
w <- c(20:16,10:4)

The output would be:

c(x=FALSE, y=TRUE, z=FALSE, w=FALSE)

This is the code I used to find structural break:

library(bfast)
#Time-series 
x.ts <- ts(x, frequency = 12)
m1 <- lm(x~1, data=x.ts)
#Studentized plot and values
plot(rstudent(m1))
rstudent(m1)
#bfast for detecting structural breaks
bfast01(x.ts, x~1, test=c("BIC", "OLS-MOSUM", "supLM"), aggregate=any, bandwidth = 0.15)

Best Answer

In principle, the methods in the strucchange package (upon which bfast builds) can be used to detect the patterns you have created here. However, the sample size with 12 observations is very challenging for asymptotic methods as those in strucchange. As long as you only want to detect shifts in a piecewise constant mean (patterns y and z), you are better off with using permutation tests as those implemented in maxstat_test() from the coin package. If you really want to start out from a linear regression model (patterns x and w) you probably need more observations (or very small errors) to have a fair chance of detecting anything.

For illustration let's set up a data.frame called d which contains a time variable (which is more convenient for the coin package) and a ts series s (which is more convenient for the strucchange package). The latter is also used for visualizing the patterns

d <- data.frame(y, x, z, w, time = 1:12)
s <- ts(cbind(y, x, z, w))
plot(s, type = "b")

For the two patterns that are essentially shifts in a piecewise (almost) constant mean, we can use maxstat_test() from coin relatively easily. The p-values are found by approximation (i.e., simulating a finite number of permutations):

library("coin")
set.seed(1)
maxstat_test(y ~ time, data = d, dist = "approx")
##  Approximative Generalized Maximally Selected Statistics
## 
## data:  y by time
## maxT = 3.3153, p-value = 0.034
## alternative hypothesis: two.sided
## sample estimates:
##   "best" cutpoint: <= 2
maxstat_test(z ~ time, data = d, dist = "approx")
##  Approximative Generalized Maximally Selected Statistics
## 
## data:  z by time
## maxT = 3.2837, p-value = 0.0299
## alternative hypothesis: two.sided
## sample estimates:
##   "best" cutpoint: <= 2

In both cases the "true" breakpoint is found with a significant p-value (at 5% level). One could also tweak the Fstats() function from strucchange to reveal the same breakpoint; but the asymptotic inference that Fstats() is based on is surely less appropriate than the permutation inference. For a more detailed discussion of the differences between these tests see: Achim Zeileis, Torsten Hothorn (2013). "A Toolbox of Permutation Tests for Structural Change." Statistical Papers, 54(4), 931–954. doi:10.1007/s00362-013-0503-4

For the pattern w where a piecewise linear regression is needed, one could use Fstats() and breakpoints() but estimating two intercepts and two slopes from just 12 observations is really pushing it, especially if the breakpoint between the two segments also should be determined. In this artificial setup without any noise, it can be done but I wouldn't recommend it in practical situations with more noise.

library("strucchange")
coef(breakpoints(w ~ time(s), data = s, h = 4))
##        (Intercept) time(s)
## 1 - 5           21      -1
## 6 - 12          16      -1

Thus, the true segments/parameters are recovered here. But admittedly with a minimal segment size of h = 4 and no noise there is not much to estimate.

Note also that the ~ time part in the formula means different things for maxstat_test() and breakpoints(). In the latter case, it means that a (piecewise) linear time trend is used. In the former case, it means that a split in a (piecewise) constant mean is searched in the ordering by the time variable. (In the latter case, the ordering is implicit through using a ts time series data.)

Related Solutions

Solved – Structural breaks, stationarity and time series modelling

You can estimate this using the strucchange R package with a simple linear regression of y given x. In your case the slope coefficient equals $b$ before the break, and $b + b_{break}$ after the break.

Using the breakpoints() fct in strucchange this will be something like

bp.mod <- breakpoints(y ~ x, breaks = 1)  # specify 1, also automated possible
summary(bp.mod)
plot(bp.mod)

See also the very useful strucchange vignette.

Solved – Detecting trend reversal in a time series

With realistically noisy signals, this is not a simple problem. For example, you write

a heuristic rule would say that a trend reversal has occurred if the value has been trending in a direction for $t1$ seconds, then in the other direction for $t2 = t1/k$ seconds.

What if the signal would rise for $t1$ seconds by a total of $5\%$, then fall for $t1 / k$ seconds by a total of $6\%$, except for some time point in the second part where it rises by a negligible $0.0000001\%$?

A refinement on your idea would be to spot some difference between some types of moving averages between two time intervals. The moving averages would (hopefully) smooth out these problems, and the difference would be more indicative of a trend change.

In econometric technical analysis, one implementation of this is through the MACD (moving average convergence divergence).

The idea is

take a "slow" exponential moving average (in the original context, 26 days)
take a "fast" exponential moving average (in the original context, 12 days)
subtract the former from the latter
take an exponential moving average of 3. (in the original context, 9 days)

When the difference between 3. and 4. crosses 0, it's an indication that the trend reverse.

You can see an example in the following diagram (taken from the Wikipedia entry).

In the top diagram, the red/green lines indicate the market price. The purple and black lines indicate the fast and slow moving averages.

In the bottom diagram, the blue line is 3., and the red line is 4. The difference between them is the grey bar chart.

You can see that a zero crossing of the grey bar chart, roughly corresponds to a trend change in the original green/red prices.

Best Answer

Related Solutions

Solved – Structural breaks, stationarity and time series modelling

Solved – Detecting trend reversal in a time series

Related Question