Solved – Time series analysis via generalized additive models: model assumptions and stationarity

generalized-additive-model, mgcv, r, stationarity, time-series

I have settled on building a generalized additive mixed model using mgcv::gamm, on data and for purposes I have described in more detail here. In a nutshell, I want to explain variations in monthly tourist numbers at two historic sites, depending on predictors such as weather and economic factors (e.g., Consumer Confidence Index), etc. All this, while taking into account the seasonal pattern in visitor numbers, an increasing trend in visitors over the years, and any autoregressive process in the data. Hopefully the choice of gamm() for modelling this scenario is reasonable. (Also, I am not really concerned with forecasts, rather just a good explanatory model.)

After checking out various sources (e.g., Gavin Simpson's blog posts here and here), there seems to be no mention of assessing stationarity before fitting such a generalized additive mixed model, yet stationarity appears to be a major point of focus in time series analysis generally. I am not clear why this is, and whether it is fine for me to run gamm() directly on my data (with no differencing done beforehand, etc.). I am assuming yes, but would rather make sure. Thanks!

Best Answer

The idea here is that by estimating the trend as a smooth function, the residuals are then a stationary process, and it is to these residuals that the ARMA model is fitted. In other words, the estimated smoother detrends the data and the ARMA model applies to what remains. Of course the fitting doesn't actually happen in a two-step process like this (the smooths and the correlation structure are estimated jointly), but hopefully it is clearer why we don't require stationarity when modelling time series using GAM(M)s.
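As a minimal sketch of this kind of fit, assuming a hypothetical data frame `df` with columns `visits` (monthly visitor counts), `month` (1–12), and `time_index` (a running month counter), one might write something like:

```r
library(mgcv)  # gamm(); corAR1() comes from nlme, which mgcv loads

# Cyclic smooth of month captures the seasonal cycle, a smooth of the
# running time index captures the long-term trend, and corAR1() models
# any remaining autocorrelation in the residuals.
m <- gamm(visits ~ s(month, bs = "cc", k = 12) +  # seasonal smooth
                   s(time_index),                 # trend smooth
          correlation = corAR1(form = ~ time_index),
          data = df)

summary(m$gam)  # the smooth terms
summary(m$lme)  # includes the estimated AR(1) coefficient (Phi)
```

The variable names and `k` choice are illustrative only; with predictors such as weather or the Consumer Confidence Index, additional `s()` or linear terms would be added to the formula in the usual way.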

Another issue you'll potentially need to grapple with is identifiability of the trend and the correlation structure. If we have a high degree of autocorrelation in the observed series, then we could model this as:

  1. a wiggly trend (to capture the runs above and below the mean that high autocorrelation implies) and no autocorrelation in the residuals, or
  2. a simple, perhaps linear, trend, with strong autocorrelation in the residuals (say a large $\rho$ for an AR(1) process).

These two models can fit the observed series almost equally well. Unless you supply extra information to separate them (say, by fixing the degrees of freedom of the spline or by fixing the parameters of the residual correlation structure), the data may not contain enough information to uniquely identify the separate trend and autocorrelation processes.
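Both of those constraints can be imposed directly in a gamm() call. A hedged sketch, again using the hypothetical columns `visits`, `month`, and `time_index` in a data frame `df`:

```r
library(mgcv)

# (a) Constrain the trend: fx = TRUE makes s() an unpenalized regression
#     spline with fixed degrees of freedom (here k - 1 = 4), so the trend
#     cannot become arbitrarily wiggly.
m1 <- gamm(visits ~ s(month, bs = "cc", k = 12) +
                    s(time_index, k = 5, fx = TRUE),
           correlation = corAR1(form = ~ time_index),
           data = df)

# (b) Constrain the residuals instead: fixed = TRUE holds the AR(1)
#     parameter at the supplied value rather than estimating it.
m2 <- gamm(visits ~ s(month, bs = "cc", k = 12) + s(time_index),
           correlation = corAR1(value = 0.5, form = ~ time_index,
                                fixed = TRUE),
           data = df)
```

Comparing such fits (e.g., via the degree to which the trend smooth changes as the assumed autocorrelation varies) is one way to gauge how badly confounded the two components are in a given series.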

The whole reason behind my use of GAMs in my research is that I'm interested in estimating the trends in my time series, and those trends are in general non-linear. The GAM allows me to estimate the thing I want. In classical time series modelling, the interest is in modelling the data as stochastic trends using lagged versions of the response and/or current and lagged versions of a white noise process. This is of less interest in my work, but is clearly of broad interest to others.

I've written a paper on modelling time series using GAMs, which hopefully explains some of the approach. It's written for palaeoecologists but applies to any univariate time series, is Open Access (so free to read), and is supported by full R code in the Supplements.
