Solved – Decompose a time series into a deterministic trend and a stochastic trend

arima | test-for-trend | time-series | trend

Given the comments below, I have revised my question with more background information and a simulation example. My question also concerns the validity of the method in theory. Please correct me if any statement is inaccurate.

Background: 1) For our clients (policy makers), it is often of interest to check whether there is a linear trend in time series data, especially after a new policy is applied. 2) One characteristic of such data is a very short time span (around 10 years). 3) Another characteristic is that the observations are not a census but estimates from samples (biology and field-ecology studies, where collecting census data is not possible). This introduces observation error into the data, which can sometimes be very large (and can look like outliers).

Motivation: I have been puzzled for some time about how to handle the three points above. I would like to do as much as I can within ARIMA, starting with 1): providing a statistical test for a linear trend.

Method: Theoretically, if the data are stationary after an ARIMA(p,d,q) process and contain a drift, the data can be written as $y_{t}=u\times t + y_{t}^{'}$, where $y_{t}^{'}$ follows an ARIMA process of order (p,d,q). For example, with order (1,1,0), $y_{t}^{'}-y_{t-1}^{'}=\phi(y_{t-1}^{'}-y_{t-2}^{'})+\epsilon_{t}$. I am thinking of decomposing the observed data into a deterministic trend $u\times t$, a stochastic trend $y_{t}-u\times t-\epsilon_{t}=y_{t}^{'}-\epsilon_{t}$, and an error $\epsilon_{t}$. Can these two components properly be called by these terms?
The parameters of the two trends can be estimated with the `Arima` function from the forecast package.
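Why the drift term captures the linear slope can be seen by differencing once:

```latex
y_t = u\,t + y'_t
\quad\Longrightarrow\quad
y_t - y_{t-1} = u + \bigl(y'_t - y'_{t-1}\bigr)
```

so after one difference the slope $u$ becomes the constant mean (the drift) of the differenced series, while $y'_t - y'_{t-1}$ is the stationary AR(1) part in the (1,1,0) example above.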

Here is code on simulated data to demonstrate how I work through it:

library(forecast)  # Arima, auto.arima, Acf, Pacf, tsdisplay
library(tseries)   # adf.test

set.seed(123)
n  <- 100
y1 <- 3.4    # starting value
AR <- -0.77  # AR(1) coefficient
u  <- 0.05   # drift (slope of the linear trend)
## 1. simulate ARIMA component
ts.sim1 <-arima.sim(n=n,model=list(ar=AR,order=c(1,1,0)),start.innov=y1/(AR),n.start=1,innov=c(0,rnorm(n-1,0,0.345)))
ts.sim1 <- ts(ts.sim1[2:(n+1)])  
ts.sim1
plot(ts.sim1)
## 2. add linear trend
ts.sim2 <- ts.sim1 + u*(1:(n))
plot(ts.sim2)


Here is an extra bit of code I used to check whether the parameters I chose give stationary data for ts.sim1 after (1,1,0) differencing.

dat <- replicate(1000, arima.sim(n=n, model=list(ar=AR, order=c(1,1,0)), start.innov=y1/(AR), n.start=1, innov=c(0, rnorm(n-1,0,0.345))))
res <- apply(dat, 2, function(x) {fitt <- Arima(x, order=c(1,1,0), include.drift=FALSE, method="ML"); residuals(fitt)})
p   <- apply(res, 2, function(x) adf.test(x)$p.value)
sum(p > .05)/1000*100  # % of replicates where the ADF test fails to reject non-stationarity

Next:

## 3. stationarity test and ACF/PACF plots
adf.test(ts.sim2, alternative = "stationary")
Acf(ts.sim2, main='')
Pacf(ts.sim2, main='')

## 4. auto-select best model in terms of AIC, and check residual pattern
fit<-auto.arima(ts.sim2, seasonal=FALSE, trace=TRUE, allowdrift=TRUE)
arima.string1(fit)
tsdisplay(residuals(fit), lag.max=15, main='Best Model Residuals')
AIC(fit)

## 5. Apply a drift version of the best (p,d,q) model, even if the best model does not contain drift, and check residuals
fit1 <- Arima(ts.sim2, order=c(1,1,0), include.drift=TRUE, method="ML")
summary(fit1)
AIC(fit); AIC(fit1)
tsdisplay(residuals(fit1), lag.max=15, main='Best Model Residuals')

Note that the auto.arima selection does not necessarily give the true model; in this case, it suggests that the best model is (2,1,0) with drift rather than (1,1,0).

The residual plots (below) for (1,1,0) with drift show independent random residuals with equal variance.

The ARIMA(1,1,0)-with-drift model produces the following output:

Series: ts.sim2 
ARIMA(1,1,0) with drift         

Coefficients:
          ar1   drift
      -0.9126  0.0298
s.e.   0.0533  0.0192

sigma^2 estimated as 0.1355:  log likelihood=-41.41
AIC=88.83   AICc=89.08   BIC=96.61

Training set error measures:
                       ME      RMSE       MAE       MPE     MAPE      MASE       ACF1
Training set -0.009091779 0.3625225 0.2671725 -3.364276 12.46496 0.5205074 -0.0499297 

I can then test the statistical significance of the linear slope:

## 6. test for linear slope
drift_index <- 2
n <- length(ts.sim2)  # the simulated series (not the real "discards" data)
se_drift <- sqrt(diag(fit1$var.coef))[drift_index]  # already the s.e. of the estimate; no extra /sqrt(n)
pvalue <- 2*pt(-abs(fit1$coef[drift_index]/se_drift), df=n-1)
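As a cross-check, here is a minimal sketch using only base R's `arima()` (assumed here in place of `forecast::Arima`): with d = 1 and the time index passed through `xreg`, the fitted model is exactly the decomposition $y_t = u\,t + y'_t$ with $y'_t$ an ARIMA(1,1,0). Note that `var.coef` already holds the sampling variance of each estimate, so the Wald statistic needs no additional `sqrt(n)` factor:

```r
## Sketch (base R only): fit drift via xreg; the series is re-simulated here
## with the same true values (ar = -0.77, u = 0.05, innovation sd = 0.345).
set.seed(123)
n      <- 100
yprime <- cumsum(arima.sim(n = n, model = list(ar = -0.77), sd = 0.345))  # ARIMA(1,1,0) part
y      <- yprime + 0.05 * seq_len(n)                                      # add drift u = 0.05
drift  <- seq_len(n)
fit    <- arima(y, order = c(1, 1, 0), xreg = drift)
u_hat  <- coef(fit)["drift"]                  # drift estimate
se_u   <- sqrt(diag(fit$var.coef))["drift"]   # its standard error
pvalue <- 2 * pnorm(-abs(u_hat / se_u))       # two-sided Wald test of u = 0
```

With a short series (n around 10), this normal approximation will be optimistic, so the resulting p-value should be read with caution.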

Finally, plot the fitted deterministic and stochastic trend:

drift_index <- 2
par(mfrow=c(3,1))
t_s <- 1:n
plot(t_s, ts.sim2, type="o", lwd=2, col="red", pch=15, xlab="Year", ylab="", cex.lab=1.5, cex.axis=1.2)

# 1. deterministic trend
ttime <- 1:length(ts.sim2)
y1    <- ttime*fit1$coef[drift_index]
se_re <- sqrt(fit1$sigma2)
m1    <- mean(y1)
offset <- (range(ts.sim2)[2]-range(ts.sim2)[1])/2 - m1  # centre the trend line on the plot
y1_low <- y1-1.96*se_re+offset
y1_high <- y1+1.96*se_re+offset
plot(t_s, y1+offset, type="n", ylim=range(y1_low, y1_high), xlab="Year", ylab="Drift", cex.lab=1.5, cex.axis=1.2, main="Deterministic trend", cex.main=1.2)
polygon(c(t_s,rev(t_s)), c(y1_high,rev(y1_low)), 
        col=rgb(0,0,0.6,0.2), border=FALSE)
lines(t_s, y1+offset)

# 2. stochastic trend: fitted values of the ARIMA part, with 95% prediction intervals
y2 <- ts.sim2 - y1
fitted1 <- y2 - residuals(fit1)  # fitted values of the ARIMA component
se_re <- sqrt(fit1$sigma2)
y2_low  <- y2 - 1.96*se_re
y2_high <- y2 + 1.96*se_re
plot(t_s, y2, type="n", ylim=range(y2_low, y2_high), xlab="Year",
     ylab="Fitted ARIMA", cex.lab=1.5, cex.axis=1.2, main="Stochastic trend",
     cex.main=1.2)
polygon(c(t_s,rev(t_s)), c(y2_high,rev(y2_low)), 
    col=rgb(0,0,0.6,0.2), border=FALSE)
lines(t_s, y2)


I tried to plot the prediction intervals (conditional on each component) as the gray bands in the figure, but I am not sure the calculation in the code is correct.
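One possibility for the deterministic component is to build its band from the drift's standard error, so that the uncertainty widens linearly in $t$, and reserve the constant $\pm 1.96\sqrt{\sigma^2}$ band for the innovations. This is a self-contained sketch (re-simulating a similar series with base R's `arima()`; the band definition is my assumption, not an established recipe):

```r
## Sketch: confidence band for the deterministic trend u*t alone.
set.seed(123)
n     <- 100
y     <- cumsum(arima.sim(n = n, model = list(ar = -0.77), sd = 0.345)) + 0.05 * seq_len(n)
drift <- seq_len(n)
fit   <- arima(y, order = c(1, 1, 0), xreg = drift)
u_hat <- coef(fit)["drift"]
se_u  <- sqrt(diag(fit$var.coef))["drift"]
t_s      <- seq_len(n)
trend    <- u_hat * t_s
trend_lo <- (u_hat - 1.96 * se_u) * t_s   # band widens linearly with t
trend_hi <- (u_hat + 1.96 * se_u) * t_s
plot(t_s, trend, type = "l", ylim = range(trend_lo, trend_hi),
     xlab = "Year", ylab = "Deterministic trend")
polygon(c(t_s, rev(t_s)), c(trend_hi, rev(trend_lo)),
        col = rgb(0, 0, 0.6, 0.2), border = FALSE)
```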

Additionally, I ran a simulation to see whether this decomposition process is unbiased (code below). It seems that I need a very long observed time series (n = 1000) to obtain unbiased parameter estimates.

##----simulate N times
N <- 2000
model_type <- rep(NA, N)
res <- data.frame(ar=rep(NA, N), drift=rep(NA,N))
for (k in 1:N) {
  n  <- 1000
  y1 <- 3.4
  AR <- -0.77 
  u  <- 0.05
  ts.sim1 <- arima.sim(n=n,model=list(ar=AR,order=c(1,1,0)),start.innov=y1/(AR),n.start=1,innov=c(0,rnorm(n-1,0,0.345)))
  ts.sim1 <- ts(ts.sim1[2:(n+1)])
  ts.sim2 <- ts.sim1 + u*(1:(n))
  fit<-auto.arima(ts.sim2, seasonal=FALSE, trace=F,allowdrift=TRUE)
  model_type[k] <- arima.string1(fit)
  fit1  <-  Arima(ts.sim2, order=c(1,1,0),  include.drift=T, method="ML")
  res$ar[k] <- coef(fit1)[1]
  res$drift[k] <- coef(fit1)[2]
}
mean(res$ar)
mean(res$drift)
table(model_type)
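To see the short-series problem directly, the same experiment can be run at the realistic sample size. This is a sketch with base R's `arima()` and the drift passed via `xreg`; the settings (200 replicates, n = 10) are illustrative:

```r
## Sketch: at n = 10 (as in the real data), the AR and drift estimates are
## noticeably noisier/more biased than at n = 1000. True values: -0.77, 0.05.
set.seed(42)
N <- 200
n <- 10
est <- t(replicate(N, {
  y <- cumsum(arima.sim(n = n, model = list(ar = -0.77), sd = 0.345)) +
       0.05 * seq_len(n)
  drift <- seq_len(n)
  f <- try(arima(y, order = c(1, 1, 0), xreg = drift), silent = TRUE)
  if (inherits(f, "try-error")) c(NA, NA) else coef(f)   # some short fits fail
}))
colMeans(est, na.rm = TRUE)   # compare with the true values (-0.77, 0.05)
```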

My questions are:

1) Are the terms deterministic vs. stochastic trend being used correctly here?

2) Theoretically, is this a valid process for detecting a linear trend while allowing auto-correlated observations/errors? Is there any other way to handle this within ARIMA?

3) In my last simulation, I noticed that an unbiased decomposition only works when the time series is very long, which is the opposite of my data (around 10 years). I guess this is a general limitation of the ARIMA method.

Best Answer

Before I receive your data I would like to take the "bully pulpit" and expound on the task at hand and how I would go about solving this riddle. Your suggested approach, I believe, is to form an ARIMA model using procedures that implicitly specify no time-trend variables, thus incorrectly concluding about required differencing, etc. You assume no outliers, no pulses/seasonal pulses, and no level shifts (intercept changes). After probable mis-specification of the ARIMA filter/structure, you then assume one trend and one intercept and piece it together. Although programmable, this approach is fraught with logical flaws, never mind non-constant error variance or non-constant parameters over time.

The first step in the analysis is to list the possible sample space to be investigated and, in the absence of a direct solution, conduct a computer-based (trial-and-error) search over a myriad of possible combinations, yielding a suggested optimal solution.

The sample space contains:

1. the number of distinct trends
2. the number of possible intercepts
3. the number and kind of differencing operators
4. the form of the ARMA model
5. the number of one-time pulses
6. the number of seasonal pulses (seasonal factors)
7. any required error-variance change points, suggesting the need for weighted least squares
8. any required power transformation reflecting a linkage between the error variance and the expected value

Simply evaluate all possible permutations of these 8 factors and select the unique combination that minimizes some error measurement, because ORDER IS IMPORTANT!
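A toy version of such a tournament, searching only a tiny corner of the sample space above (AR order, differencing, deterministic trend) with base R and AIC, might look like the sketch below; this is my own illustration, not AUTOBOX's actual search:

```r
## Minimal model "tournament" sketch: enumerate candidate specifications and
## rank them by AIC. A real search would also scan pulses, level shifts,
## seasonal pulses, and variance change points.
set.seed(1)
n     <- 100
y     <- cumsum(arima.sim(n = n, model = list(ar = -0.77), sd = 0.345)) + 0.05 * seq_len(n)
trend <- seq_len(n)
cand  <- expand.grid(p = 0:2, d = 0:1, use_trend = c(FALSE, TRUE))
cand$aic <- NA
for (i in seq_len(nrow(cand))) {
  xr <- if (cand$use_trend[i]) cbind(trend = trend) else NULL
  f  <- try(arima(y, order = c(cand$p[i], cand$d[i], 0), xreg = xr),
            silent = TRUE)                       # some candidates may fail
  if (!inherits(f, "try-error")) cand$aic[i] <- AIC(f)
}
best <- cand[which.min(cand$aic), ]   # winning specification by AIC
```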

If this is onerous, so be it; I look forward to receiving your ts.sim2 so I can (possibly) demonstrate an approach that speaks to this "thorny issue" using some of my favorite toys.

Note that if you simulated (tightly) then your approach might be the answer, but the question I have is: is your approach robust to data violations, or is it simply a cook-book approach that works on this data set and fails on others? Trust but verify!

EDITED AFTER RECEIPT OF DATA (100 VALUES)

I trust that this discussion will highlight the need for comprehensive/programmable approaches to forming useful models. As discussed above, an efficient computer-based tournament looking at possible combinations (a maximum of 256) yielded the following suggested initial model approach.

The concept here is to "duplicate/approximate the human eye" by examining competing alternatives, which is what (in my opinion) we do when performing visual identification of structure. Note that in this case most eyeballs will not see the level shift at period 65 and will simply focus on the major break in trend around period 51.

1 IDENTIFY DETERMINISTIC BREAK POINTS IN TREND
2 IDENTIFY INTERCEPT CHANGES
3 EVALUATE NEED FOR ARIMA AUGMENTATION
4 EVALUATE NEED FOR PULSES

SIMPLIFY VIA NECESSITY TESTS 
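The intercept-change step can be illustrated with a step dummy passed to base R's `arima()` via `xreg`; the break point used here (period 65) is illustrative, echoing the shift discussed in this answer:

```r
## Sketch: a level shift (intercept change) at a candidate period tau can be
## represented by a step dummy in xreg; its t-ratio screens for significance.
set.seed(2)
n    <- 100
tau  <- 65                                            # illustrative break point
y    <- arima.sim(n = n, model = list(ar = 0.5)) +
        2 * (seq_len(n) >= tau)                       # true shift of +2 at tau
step <- as.numeric(seq_len(n) >= tau)
fit  <- arima(y, order = c(1, 0, 0), xreg = cbind(step = step))
shift_hat <- coef(fit)["step"]                        # estimated intercept change
t_ratio   <- shift_hat / sqrt(diag(fit$var.coef))["step"]
```

In a full search, such dummies would be tried at every candidate period and retained only where the necessity test passes.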

The tournament identified a model detailing both a trend change (at period 51) and an intercept change (at period 65). Model diagnostic checking (always a good idea in iterative approaches to model form) yielded an ACF suggesting that improvement was necessary to render a set of residuals free of structure. An augmented model was then suggested, with an insignificant AR(1) coefficient.

The final model and its model statistics are shown in the attached output.

The residuals from this model and their ACF are shown in the attached plots.

The Actual/Fit and Forecast graph is attached. The cleansed-vs-actual plot is revealing, as it details the level-shift effect.

In summary: where the OP simulated a (1,1,0) for the first 50 observations, he then abridged the last 50 observations, effectively coloring/changing the composite ARMA process to a (1,0,0) while embodying the three empirically identified predictors.

Comprehensive data analysis incorporating advanced search procedures is the objective. This data set is "thorny" and I look forward to any suggested improvements that may arise from this discussion. I used a beta version of AUTOBOX (which I have helped to develop) as my tool of choice.

As to your "proposed method": it may work for this series, but there are far too many assumptions (one and only one stochastic trend, one and only one deterministic trend (1,2,3,...), no pulses, no level shifts (intercept changes), no seasonal pulses, constant error variance, constant parameters over time, et al.) to suggest generality of the approach. You are arguing from the specific to the general. There are tons of wrong ad hoc solutions waiting to be specified and just a handful of "correct solutions", of which my approach is just one.

A close-up of observations 51 to 100 suggests a significant deviation/change in pattern (i.e., an implied intercept) starting at period 65, which the analytics picked up as a level shift (change in intercept), suggesting a possible simulation flaw, as observations 51-64 have a different pattern than observations 65-100.
