Solved – Comparing 2 time series in R

Tags: r, survey, time series

I was wondering what kind of tests one would use to compare these two time series.

The first data set (in percentages) contains results from a weekly survey that asks a YES/NO question about whether the respondent has a full-time job.

The second data set contains weekly sales totals.

I have the same number of data points (300).

Any suggestions on what types of analysis I could do with this data, or on how to analyze the trends?

So far I have computed a cross-correlation function in R and found a correlation of 0.39 when the survey data leads by 3 weeks.

I also compared the HoltWinters exponentially smoothed daily values with the daily sales values and found a maximum correlation of 0.45 at lead = 12.
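For reference, a cross-correlation check of this kind can be sketched roughly as follows. The data here are simulated, and the names `job`, `sales` and the built-in 3-week lead are assumptions for illustration only, not the actual survey or sales series.

```r
# Minimal sketch with simulated data (not the real series).
set.seed(42)
n <- 300

# Hypothetical survey series with 3 extra points so a 3-week lead can be built in
job_full <- as.numeric(arima.sim(model = list(ar = 0.5), n = n + 3))

# Sales at week t depend on the survey value from 3 weeks earlier
sales <- 10 + 2 * job_full[1:n] + rnorm(n)
job   <- job_full[4:(n + 3)]   # so that job[t - 3] == job_full[t]

# ccf(x, y) at lag k estimates cor(x[t + k], y[t]),
# so a peak at a negative k means x (job) leads y (sales).
cc <- ccf(job, sales, lag.max = 20, plot = FALSE)
best_lag <- cc$lag[which.max(abs(cc$acf))]
best_cor <- cc$acf[which.max(abs(cc$acf))]
```

In this simulated setup the largest cross-correlation should sit at a negative lag, which is how `ccf(job, sales)` reports the survey series leading sales.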

Any suggestions would be appreciated.

Thanks

Best Answer

There are a number of possible models at a variety of levels of complexity. These include (some are very closely related):

Time series regression with lagged variables

Lagged regression models. See also distributed lag models

Regression with autocorrelated errors

Transfer function modelling /lagged regression with autocorrelated errors

ARMAX models

Vector autoregressive models

State-space/dynamic linear models can incorporate both autocorrelated and regression components

Because your input series is 0/1 you may want to look at lagged regression with autocorrelated errors, but watch for seasonal and calendar effects (like holidays).

So simple-ish models might perhaps look something like

$\qquad\text{ Sales}_t = \phi_0+\phi_1\,\text{Sales}_{t-1} +\beta_3\,\text{job}_{t-3}+\beta_4\,\text{job}_{t-4}+\epsilon_t$

or perhaps something like

$\qquad\text{ Sales}_t = \alpha +\beta_3\,\text{job}_{t-3}+\beta_{12}\,\text{job}_{t-12}+\text{seasonal}_{t}+\eta_t$

where $\eta_t$ in turn follows some ARMA model for the noise term (though you may well want more lags in there than just one) -- or a variety of other possibilities. [The seasonal term above doesn't have a parameter because it's likely to have several components, and so several parameters; consider it a placeholder for a model for that component of the data. Neither of those models is likely to be sufficient; they're just to give a general sense of what a simple model might look like.]
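As a rough sketch of how the two model forms above could be fitted in base R, here is an example on simulated data. Everything in it is an assumption made for illustration: the simulated `job` and `sales` series, the helper `lagk`, the coefficient values, and the AR(1) choice for $\eta_t$. The seasonal placeholder is left out entirely.

```r
# Minimal sketch with simulated data; all names and values are hypothetical.
set.seed(1)
n <- 300

# Helper (hypothetical): shift a vector back by k steps, padding with NA
lagk <- function(x, k) c(rep(NA, k), head(x, -k))

# Simulated survey percentages and sales with AR(1) noise
job   <- 60 + 2 * as.numeric(arima.sim(model = list(ar = 0.7), n = n))
eta   <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
sales <- 100 + 1.5 * lagk(job, 3) + 0.8 * lagk(job, 12) + eta

# Form 1: lagged regression including lagged sales (lm drops NA rows by default)
d <- data.frame(sales,
                sales_l1 = lagk(sales, 1),
                job_l3   = lagk(job, 3),
                job_l4   = lagk(job, 4))
fit_lm <- lm(sales ~ sales_l1 + job_l3 + job_l4, data = d)

# Form 2: regression on lagged job with AR(1) errors (no seasonal term here)
X  <- cbind(job_l3 = lagk(job, 3), job_l12 = lagk(job, 12))
ok <- complete.cases(X, sales)   # drops only the leading rows lost to lagging
fit_arx <- arima(sales[ok], order = c(1, 0, 0), xreg = X[ok, ])

coef(fit_lm)
coef(fit_arx)
```

In `arima`, the `xreg` coefficients play the role of the $\beta$ terms and the `ar1` coefficient describes the noise term; with real data the lag choices and ARMA order would need to be checked rather than assumed.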

You may also want to consider whether the binary job-status variable needs a model itself (if you want to forecast further than the smallest lag involving it, it may well be essential to at least consider whether there are any such effects there -- see transfer function models, but you have to consider the special nature of the binary variable).

Once you have an appropriate model for sales that captures the main features well, you can look at testing. You should have enough data (looks like several years) to hold some data out for out-of-sample model testing and validation. I'd start by considering the features of sales alone - is it stationary? Autocorrelated? Does it experience any seasonal/cyclical or calendar components? Are there other major drivers to consider?
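A hold-out check along those lines might be sketched as below, again on simulated data. The series, the helper `lagk`, the 52-week hold-out length and the AR(1) error structure are all assumptions for illustration. Note also that this sketch feeds the observed lagged survey values into the forecast, which would only be available in practice up to the smallest lag.

```r
# Minimal sketch with simulated data; all names and settings are hypothetical.
set.seed(2)
n <- 300
lagk <- function(x, k) c(rep(NA, k), head(x, -k))

job   <- 60 + 2 * as.numeric(arima.sim(model = list(ar = 0.7), n = n))
sales <- 100 + 1.5 * lagk(job, 3) +
  as.numeric(arima.sim(model = list(ar = 0.5), n = n))

d <- na.omit(data.frame(sales, job_l3 = lagk(job, 3)))

# Hold out the last year of weekly data
h     <- 52
train <- head(d, -h)
test  <- tail(d, h)

# Look at sales alone first: sample autocorrelations of the training series
acf_train <- acf(train$sales, lag.max = 20, plot = FALSE)

# Fit on the training period, then forecast the hold-out period
fit <- arima(train$sales, order = c(1, 0, 0),
             xreg = cbind(job_l3 = train$job_l3))
fc  <- predict(fit, n.ahead = h, newxreg = cbind(job_l3 = test$job_l3))

# Compare against a naive benchmark (training mean)
rmse_model <- sqrt(mean((test$sales - as.numeric(fc$pred))^2))
rmse_naive <- sqrt(mean((test$sales - mean(train$sales))^2))
c(model = rmse_model, naive = rmse_naive)
```

Comparing the model against a simple benchmark on held-out weeks gives a more honest picture than in-sample fit, since the lag choices were themselves picked by looking at the data.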

Since you mention R, note that the function tslm in the package forecast can be handy for including seasonal or trend components in regression models.

A book that discusses nearly all of those topics is Shumway and Stoffer, Time Series Analysis and Its Applications (the 3rd edition is available from Stoffer's page). Another highly recommended text is Forecasting: Principles and Practice, by Hyndman and Athanasopoulos, which covers some of the things I mentioned (but not as many).