Trend Analysis in Time Series – How to Analyze Trends in Non-Periodic Time Series Using R


Suppose I have the following non-periodic time series. The trend is obviously decreasing, and I would like to demonstrate this with a formal test (with a p-value). I cannot use classic linear regression because of strong temporal (serial) autocorrelation among the values.

library(forecast)
my.ts <- ts(c(10,11,11.5,10,10.1,9,11,10,8,9,9,
               6,5,5,4,3,3,2,1,2,4,4,2,1,1,0.5,1),
            start = 1, end = 27,frequency = 1)
plot(my.ts, col = "black", type = "p",
     pch = 20, cex = 1.2, ylim = c(0,13))
# line of moving averages 
lines(ma(my.ts,3),col="red", lty = 2, lwd = 2)

What are my options?

Best Answer

As you said, the trend in your example data is obvious. If you just want to support this with a hypothesis test, then besides linear regression (the obvious parametric choice) you can use the non-parametric Mann-Kendall test for monotonic trend. The test is used to

assess if there is a monotonic upward or downward trend of the variable of interest over time. A monotonic upward (downward) trend means that the variable consistently increases (decreases) through time, but the trend may or may not be linear. (http://vsp.pnnl.gov/help/Vsample/Design_Trend_Mann_Kendall.htm)

Moreover, as noted by Gilbert (1987), the test

is particularly useful since missing values are allowed and the data need not conform to any particular distribution

The test statistic is the number of positive differences $x_j-x_i$ minus the number of negative ones, taken over all $n(n-1)/2$ possible pairs with $j > i$, i.e.

$$ S = \displaystyle\sum_{i=1}^{n-1}\displaystyle\sum_{j=i+1}^{n}\mathrm{sgn}(x_j-x_i) $$

where $\mathrm{sgn}(\cdot)$ is the sign function. $S$ can be used to calculate the $\tau$ statistic, which is similar to a correlation coefficient: it ranges from $-1$ to $+1$, its sign indicates a negative or positive trend, and its magnitude reflects how consistently the series moves in that direction.

$$ \tau = \frac{S}{n(n-1)/2} $$
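As a minimal sketch, $S$ and $\tau$ can be computed directly from the definitions above for the series in the question. The helper name `mk_s` is hypothetical (it is not from any package), and $\tau$ here is the plain ratio given above, with no adjustment for ties.

```r
# Data from the question
x <- c(10, 11, 11.5, 10, 10.1, 9, 11, 10, 8, 9, 9,
       6, 5, 5, 4, 3, 3, 2, 1, 2, 4, 4, 2, 1, 1, 0.5, 1)

# Hypothetical helper: Mann-Kendall S statistic
mk_s <- function(x) {
  # d[i, j] = x[j] - x[i]
  d <- outer(x, x, function(a, b) b - a)
  # upper triangle keeps only the pairs with i < j
  sum(sign(d[upper.tri(d)]))
}

n   <- length(x)
S   <- mk_s(x)
tau <- S / (n * (n - 1) / 2)

S    # negative for a decreasing series
tau  # between -1 and +1
```

A strictly increasing series of length 5 gives `mk_s(1:5) == 10`, i.e. all $5 \cdot 4 / 2$ pairs are positive.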

Finally, you can compute $p$-values. For samples of size $n \le 10$ you can use tables of precomputed $p$-values for different values of $S$ and different sample sizes (see Gilbert, 1987). With larger samples, first you need to compute variance of $S$

$$ \mathrm{var}(S) = \frac{1}{18}\Big[n(n-1)(2n+5) - \displaystyle\sum_{p=1}^{g}t_p(t_p-1)(2t_p+5)\Big] $$

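where $g$ is the number of groups of tied values and $t_p$ is the number of observations in the $p$-th tied group, and then compute the $Z_{MK}$ test statistic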

$$ Z_{MK} = \begin{cases} \frac{S-1}{\sqrt{\mathrm{var}(S)}} & \text{if} ~ S > 0 \\ 0 & \text{if} ~ S = 0 \\ \frac{S+1}{\sqrt{\mathrm{var}(S)}} & \text{if} ~ S < 0 \end{cases} $$

The value of $Z_{MK}$ is compared to standard normal quantiles at significance level $\alpha$:

$Z_{MK} \ge Z_{1-\alpha}$ for upward trend,
$Z_{MK} \le -Z_{1-\alpha}$ for downward trend,
$|Z_{MK}| \ge Z_{1-\alpha/2}$ for upward or downward trend.
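The normal approximation can be sketched in a few lines of base R. This is an illustration under the assumption that $Z_{MK}$ is standardized by the standard deviation $\sqrt{\mathrm{var}(S)}$, as in Gilbert (1987); the variable names are my own.

```r
# Data from the question
x <- c(10, 11, 11.5, 10, 10.1, 9, 11, 10, 8, 9, 9,
       6, 5, 5, 4, 3, 3, 2, 1, 2, 4, 4, 2, 1, 1, 0.5, 1)
n <- length(x)

# S statistic: sum of signs of x[j] - x[i] over all pairs i < j
d <- outer(x, x, function(a, b) b - a)
S <- sum(sign(d[upper.tri(d)]))

# Tie correction: t_p is the size of each group of equal values
# (groups of size 1 contribute zero to the sum)
t_p   <- as.vector(table(x))
var_S <- (n * (n - 1) * (2 * n + 5) -
          sum(t_p * (t_p - 1) * (2 * t_p + 5))) / 18

# Continuity-corrected standardized statistic
z_mk <- if (S > 0) {
  (S - 1) / sqrt(var_S)
} else if (S < 0) {
  (S + 1) / sqrt(var_S)
} else {
  0
}

p_down <- pnorm(z_mk)            # one-sided test for a downward trend
p_two  <- 2 * pnorm(-abs(z_mk))  # two-sided test
```

For a downward trend, `p_down < alpha` is equivalent to $Z_{MK} \le -Z_{1-\alpha}$.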

In this thread you can find R code implementing this test.

Because $S$ depends only on the ordering of the observations, you can replace the normal approximation with a permutation test, which is a natural fit here. First compute $S$ from your data; then randomly shuffle the data many times and recompute $S$ for each shuffled sample. The $p$-value is the proportion of permutations with $S_\text{permutation} \ge S_\text{data}$ for an upward trend, or $S_\text{permutation} \le S_\text{data}$ for a downward trend.
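A rough sketch of the permutation approach for the data in the question follows. The helper `mk_s`, the seed, and the number of permutations (`n_perm = 2000`) are arbitrary choices for illustration. For a downward trend, the $p$-value counts the shuffles whose $S$ is at least as negative as the observed one.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Data from the question
x <- c(10, 11, 11.5, 10, 10.1, 9, 11, 10, 8, 9, 9,
       6, 5, 5, 4, 3, 3, 2, 1, 2, 4, 4, 2, 1, 1, 0.5, 1)

# Hypothetical helper: Mann-Kendall S statistic
mk_s <- function(x) {
  d <- outer(x, x, function(a, b) b - a)  # d[i, j] = x[j] - x[i]
  sum(sign(d[upper.tri(d)]))
}

S_obs  <- mk_s(x)
n_perm <- 2000  # assumed number of random shuffles

# Shuffling treats the observations as exchangeable under the null
S_perm <- replicate(n_perm, mk_s(sample(x)))

# Proportion of shuffles with S at least as extreme as observed
p_down <- mean(S_perm <= S_obs)
p_down
```

With a finite number of shuffles this proportion can come out as exactly 0; a common convention is to report `(1 + sum(S_perm <= S_obs)) / (n_perm + 1)` instead.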

Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Wiley, NY.

Önöz, B., & Bayazit, M. (2003). The power of statistical tests for trend detection. Turkish Journal of Engineering and Environmental Sciences, 27(4), 247-251.

Related Solutions

Time Series Analysis – How to Detect Trends

If you use lm, you should check whether the residuals are autocorrelated. I suspect they are, in which case your t-tests are not valid (this also holds for summary(lm(y~t+I(t^2)))). This is basically because a time variable is involved in your lm, so neighbouring errors tend to be correlated.

I recommend using a generalized least squares approach to test the quadratic effect while accounting for the autocorrelation. For example, if you assume an autoregressive process of order two (see the note below) for the residuals of your lm (i.e. $e_t=\phi_1 e_{t-1}+\phi_2 e_{t-2}+\nu_t$, where $\nu_t$ is white noise), then the code would look like

library(nlme)
# quadratic trend with AR(2) errors
m1 <- gls(y ~ t + I(t^2), correlation = corARMA(p = 2))
summary(m1)

Note: You should model the error terms correctly first (i.e. find the orders $p$ and $q$), for example by checking the ACF and PACF of the residuals of your lm. Above, I assumed AR(2). More complicated ARMA models can be considered and tested.
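The residual check described in the note could be sketched as follows. The data are simulated purely for illustration (a quadratic trend plus AR(2) noise with made-up coefficients), so `y` and `t` here are stand-ins for your own variables.

```r
set.seed(42)  # arbitrary seed for the simulated example

# Simulated stand-in data: quadratic trend plus AR(2) errors
t <- 1:100
e <- as.numeric(arima.sim(model = list(ar = c(0.5, 0.3)), n = 100))
y <- 2 + 0.1 * t - 0.001 * t^2 + e

fit <- lm(y ~ t + I(t^2))
r   <- resid(fit)

# Sample ACF and PACF of the residuals; plot(a) and plot(p) draw them.
# A PACF that cuts off after lag p suggests an AR(p) error structure.
a <- acf(r, lag.max = 20, plot = FALSE)
p <- pacf(r, lag.max = 20, plot = FALSE)

# Ljung-Box test: a small p-value indicates autocorrelated residuals
lb <- Box.test(r, lag = 10, type = "Ljung-Box")
lb$p.value
```

If the residuals look autocorrelated, the orders read off the ACF and PACF can be passed to `corARMA(p = ..., q = ...)` in the `gls` call.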

Solved – How to estimate the delay between two non-periodic time series

The most straightforward approach is probably cross-correlation. You may be interested in the answers to this post and the posts linked in the right-hand-side of that post (linked also in my comment to one of the answers).
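A minimal illustration of the cross-correlation idea, using base R's `ccf` on simulated data in which `y` is a noisy copy of `x` delayed by three periods. The series, the delay, and the noise level are all assumptions made for the example.

```r
set.seed(123)  # arbitrary seed for the simulated example

n        <- 200
true_lag <- 3  # assumed delay used to build the example

z <- rnorm(n + true_lag)
x <- z[(true_lag + 1):(n + true_lag)]  # x[t] = z[t + 3]
y <- z[1:n] + rnorm(n, sd = 0.3)       # y[t] = x[t - 3] + noise

# ccf(x, y) at lag k estimates the correlation between x[t + k] and y[t],
# so a peak at a negative lag means that x leads y
cc   <- ccf(x, y, lag.max = 10, plot = FALSE)
best <- cc$lag[which.max(abs(cc$acf))]
best
```

Here the peak falls at lag -3: `y` follows `x` with a delay of three periods.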

Another approach is by means of a dynamic regression. You could fit an autoregressive distributed lag model. For a description of this model and some references see for example my answer to this post.

This will allow you to test for the significance of one variable to explain the other and also for the significance of lags of the explanatory variable. The order of the highest significant lag will be a measure of the delay between both variables.

For the kind of data that you mention (economic indicators), you will probably need to fit the model for the differences or growth rates of the variables (not for the original series) in order for them to be stationary and avoid possible spurious results due to stochastic trends in the variables.
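The dynamic-regression idea can be sketched with plain `lm` and lagged regressors built by `embed`. Everything below is simulated for illustration: `dx` and `dy` stand for already-differenced series, the true delay is set to two periods, and `max_lag = 4` is an arbitrary choice.

```r
set.seed(7)  # arbitrary seed for the simulated example

n  <- 300
dx <- rnorm(n)    # differenced explanatory series (simulated)
dy <- numeric(n)  # differenced response series (simulated)
for (i in 3:n) {
  dy[i] <- 0.3 * dy[i - 1] + 0.8 * dx[i - 2] + rnorm(1, sd = 0.5)
}

# embed() puts the current value in column 1 and lag k in column k + 1
max_lag <- 4
X <- embed(dx, max_lag + 1)
Y <- embed(dy, max_lag + 1)

dat <- data.frame(dy = Y[, 1], dy_l1 = Y[, 2], X)
names(dat)[-(1:2)] <- paste0("dx_l", 0:max_lag)

# Autoregressive distributed lag model: dy on its own lag and lags of dx
fit   <- lm(dy ~ ., data = dat)
coefs <- summary(fit)$coefficients
round(coefs, 3)
```

In this simulated example the coefficient on `dx_l2` is the one that stands out, matching the two-period delay used to generate the data.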

A more involved approach would be as follows:

Extract a cyclical signal from each series, for example by means of the Hodrick and Prescott filter or model based approaches (ARIMA, structural time series models).
Identify the peaks and troughs in the estimated cyclical signals. These points can generally be easily identified from the graph and can be detected manually, but if you have many time series it is relatively easy to devise an algorithm to do that.
Inspect the dates of the peaks and troughs in each series and compare them. Descriptive statistics of the differences between corresponding peaks or troughs, as well as a histogram of these differences, can help determine whether a given series systematically reaches a peak some periods ahead of or behind the other series.
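The second and third steps above could be automated roughly as follows. The helper `turning_points` is hypothetical, and the two "cyclical signals" are synthetic sine waves (one delayed by five periods) standing in for the output of a filter such as Hodrick-Prescott. Real, noisier cycles would likely need smoothing first.

```r
# Hypothetical helper: indices of local maxima and minima of a smooth series
turning_points <- function(x) {
  s <- diff(sign(diff(x)))
  list(peaks   = which(s < 0) + 1,  # slope changes from up to down
       troughs = which(s > 0) + 1)  # slope changes from down to up
}

# Synthetic cyclical signals: b is a copy of a delayed by 5 periods
t <- 1:120
a <- sin(2 * pi * t / 40)
b <- sin(2 * pi * (t - 5) / 40)

tp_a <- turning_points(a)
tp_b <- turning_points(b)

# Delay of b relative to a at each matched peak and trough
peak_delay   <- tp_b$peaks - tp_a$peaks
trough_delay <- tp_b$troughs - tp_a$troughs

summary(c(peak_delay, trough_delay))
```

Subtracting index vectors like this assumes both series have the same number of turning points in the same order; with real data the peaks would have to be matched up first.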

A nice way to look at the estimated cycles is the animated plot used by Statistics Netherlands to track the business cycle. This plot does not reveal exactly by how many periods an economic indicator is ahead or behind other indicators, but it's a nice way to summarize the results.
