Time Series Analysis – Assessing Seasonal Covariates in a Seasonal ARIMA Model: A Comprehensive Guide

arima, r, time series

I am using R to develop an ARIMA model to evaluate the influence of several seasonal covariates (e.g., meteorological data) on the incidence of a seasonal disease. I have weekly data and have set the period to 52 weeks. Using the auto.arima function, the model selected for the disease series is ARIMA(0,1,2)(0,0,1) with period 52.

Using the TSA package, I can then evaluate the influence of each of the covariates:

library(TSA)
# One column per covariate, each the same length as the disease series
covariates <- data.frame(covariate1, covariate2, covariate3)
model <- arima(disease, order=c(0,1,2), seasonal=list(order=c(0,0,1), period=52), xreg=covariates)

However, I am worried that this approach is identifying spurious associations between the disease and the covariates, each of which has a seasonal component… Is it more appropriate to decompose the covariate data and subtract the seasonal component before fitting the ARIMA model?

covariate1_components <- decompose(covariate1)
covariate1_adjust <- covariate1 - covariate1_components$seasonal
[...]
covariates_adjust <- data.frame(covariate1_adjust, covariate2_adjust, covariate3_adjust)
model2 <- arima(disease, order=c(0,1,2), seasonal=list(order=c(0,0,1), period=52), xreg=covariates_adjust)

Any thoughts on which of the two approaches would be preferable for evaluating seasonal covariates?

Best Answer

When you have covariates that are stochastic, you use pre-whitening methods to identify the relationship between the original series, using the filtered series as proxies. Part of the analysis is to identify the transfer function between Y and each of the suggested X's, to identify the ARIMA structure of the noise (which in general is not the ARIMA structure of the observed Y), and to identify any omitted deterministic structure or causal variables that are creating pulses, level shifts, seasonal pulses and/or local time trends in the residuals. After forming this tentative model, estimation and diagnostic checking may suggest remodelling. Care is then taken to test whether the model's parameters are constant over time and whether the error variance is constant over time.
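As a rough illustration of the pre-whitening step, here is a minimal sketch in base R. It is not the answerer's exact procedure: the data are simulated, the names (x, y, fit_x, ar_filter) are hypothetical, and a plain AR filter chosen by AIC stands in for whatever (possibly seasonal) ARIMA model would actually be fitted to each covariate. Seasonality is left out of the simulation to keep the example short.

```r
# Illustrative pre-whitening with base R only; all data here are simulated.
set.seed(42)
n <- 520                                   # ten years of weekly observations

# Hypothetical autocorrelated covariate, and a response that follows it by 2 weeks
x_full <- as.numeric(arima.sim(list(ar = 0.6), n = n + 2))
noise  <- as.numeric(arima.sim(list(ar = 0.3), n = n))
y <- 0.5 * x_full[1:n] + noise
x <- x_full[3:(n + 2)]                     # aligned so that y[t] = 0.5 * x[t - 2] + noise[t]

# 1. Fit an AR model to the covariate (order selected by AIC)
fit_x <- ar(x, order.max = 10, aic = TRUE)
ar_filter <- c(1, -fit_x$ar)               # coefficients of 1 - phi_1 B - ... - phi_p B^p

# 2. Apply the SAME filter to the covariate and to the response
xw <- as.numeric(stats::filter(x - mean(x), ar_filter, sides = 1))
yw <- as.numeric(stats::filter(y - mean(y), ar_filter, sides = 1))
ok <- !is.na(xw) & !is.na(yw)              # the first p filtered values are NA

# 3. Cross-correlate the filtered series to look for the lag structure
cc <- ccf(xw[ok], yw[ok], lag.max = 10, plot = FALSE)
peak_lag <- cc$lag[which.max(abs(cc$acf))]
print(peak_lag)
```

In R, the lag k value of ccf(x, y) estimates the correlation between x[t+k] and y[t], so a response that follows the covariate shows up at negative lags; given the two-week delay built into the simulation, the largest cross-correlation should appear around lag -2. The lags that stand out would then suggest which lags of each covariate to carry into the transfer function model, whose noise structure is identified separately from the residuals.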