I was wondering what kind of tests one would use to compare these two time series.
The first data set (in percentages) consists of results from a weekly survey that asks a YES/NO question about whether the respondent has a full-time job.
The second data set consists of weekly sales totals.
I have the same number of data points (300).
Any suggestions on what types of analysis I could do with this data, or on how to analyze the trends?
So far I have computed the cross-correlation function in R and found a correlation of 0.39 when the survey data leads by 3 weeks.
I also compared the HoltWinters exponentially smoothed daily values with the daily sales values and found a maximum correlation of 0.45 at lead = 12.
Any suggestions would be appreciated.
Thanks
Best Answer
There are a number of possible models at a variety of levels of complexity. These include (some are very closely related):
Time series regression with lagged variables
Lagged regression models. See also distributed lag models
Regression with autocorrelated errors
Transfer function modelling / lagged regression with autocorrelated errors
ARMAX models
Vector autoregressive models
State-space/dynamic linear models can incorporate both autocorrelated and regression components
Because your input series is 0/1 you may want to look at lagged regression with autocorrelated errors, but watch for seasonal and calendar effects (like holidays).
So simple-ish models might perhaps look something like
$\qquad\text{ Sales}_t = \phi_0+\phi_1\,\text{Sales}_{t-1} +\beta_3\,\text{job}_{t-3}+\beta_4\,\text{job}_{t-4}+\epsilon_t$
or perhaps something like
$\qquad\text{ Sales}_t = \alpha +\beta_3\,\text{job}_{t-3}+\beta_{12}\,\text{job}_{t-12}+\text{seasonal}_{t}+\eta_t$
where $\eta_t$ is in turn some ARMA model for the noise term (though you may well want more lags in there than just one), or a variety of other possibilities. [The seasonal term above doesn't have a parameter because it's likely to have several components, and so several parameters; consider it a placeholder for a model for that component of the data. Neither of those models is likely to be sufficient; they're just meant to give a general sense of what a simple model might look like.]
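As a rough illustration of the first model above, here is a minimal sketch in Python with NumPy (rather than R) that builds the lagged design matrix and fits it by ordinary least squares. The data are synthetic stand-ins, the coefficient values are invented, and the helper name `lagged_design` is hypothetical; none of it comes from the question. Plain least squares ignores any autocorrelation left in the errors, so treat this as a starting point only.

```python
import numpy as np

def lagged_design(sales, job, sales_lags=(1,), job_lags=(3, 4)):
    """Return (y, X) for regressing Sales_t on an intercept,
    lagged sales and lagged job values."""
    sales = np.asarray(sales, dtype=float)
    job = np.asarray(job, dtype=float)
    max_lag = max(list(sales_lags) + list(job_lags))
    t = np.arange(max_lag, len(sales))  # time points where every lag exists
    columns = [np.ones(len(t))]
    columns += [sales[t - k] for k in sales_lags]
    columns += [job[t - k] for k in job_lags]
    return sales[t], np.column_stack(columns)

# Synthetic stand-in data (NOT the poster's data): 300 weeks, made-up coefficients.
rng = np.random.default_rng(0)
n = 300
job = 60 + rng.normal(0, 2, n)          # pretend weekly % answering YES
sales = np.full(n, 380.0)               # start near the long-run mean
for t in range(4, n):
    sales[t] = (10 + 0.5 * sales[t - 1] + 2.0 * job[t - 3]
                + 1.0 * job[t - 4] + rng.normal(0, 1))

y, X = lagged_design(sales, job)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
resid = y - X @ coef                          # inspect these for autocorrelation

print(dict(zip(["phi0", "phi1", "beta3", "beta4"], np.round(coef, 2))))
```

On real data the residuals would need checking for remaining autocorrelation and seasonality before trusting the coefficients or their standard errors.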
You may also want to consider whether the binary job-status variable needs a model itself. If you want to forecast further ahead than the smallest lag involving it, it may well be essential to at least consider whether there are any such effects there (see transfer function models, but keep in mind the special nature of the binary variable).
Once you have an appropriate model for sales that captures the main features well, you can look at testing. You should have enough data (looks like several years) to hold some data out for out-of-sample model testing and validation. I'd start by considering the features of sales alone - is it stationary? Autocorrelated? Does it experience any seasonal/cyclical or calendar components? Are there other major drivers to consider?
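To make the hold-out idea concrete, here is a small Python/NumPy sketch (again not R, and again on synthetic stand-in data with invented coefficients). It computes the sample autocorrelation of sales as a first look at the series, then fits a lagged regression on the first 80% of the weeks and compares one-step-ahead errors on the remaining weeks against a naive "same as last week" forecast. The 80/20 split, the lag choices, and the helper names `acf` and `rmse` are assumptions made for illustration.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def rmse(errors):
    """Root mean squared error."""
    return float(np.sqrt(np.mean(np.square(errors))))

# Synthetic stand-in data (NOT the poster's data).
rng = np.random.default_rng(1)
n = 300
job = 60 + rng.normal(0, 2, n)
sales = np.full(n, 380.0)
for t in range(4, n):
    sales[t] = (10 + 0.5 * sales[t - 1] + 2.0 * job[t - 3]
                + 1.0 * job[t - 4] + rng.normal(0, 1))

rho = acf(sales, 12)  # slowly decaying values can hint at non-stationarity

# Design matrix for Sales_t ~ 1 + Sales_{t-1} + job_{t-3} + job_{t-4}.
t = np.arange(4, n)
X = np.column_stack([np.ones(len(t)), sales[t - 1], job[t - 3], job[t - 4]])
y = sales[t]

# Chronological split: never shuffle time series before holding data out.
n_train = int(0.8 * len(y))
coef, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)

# One-step-ahead errors on the held-out weeks, using observed lagged values.
model_rmse = rmse(y[n_train:] - X[n_train:] @ coef)
naive_rmse = rmse(y[n_train:] - sales[t[n_train:] - 1])  # forecast = last week

print(f"lag-1 autocorrelation: {rho[1]:.2f}")
print(f"hold-out RMSE, model: {model_rmse:.2f}, naive: {naive_rmse:.2f}")
```

If a candidate model cannot beat the naive forecast on held-out weeks, that is a sign it is not capturing much beyond simple persistence.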
Since you mention R, note that the function tslm in the package forecast can be handy for including seasonal or trend components in regression models.
A book that discusses nearly all of those topics is Shumway and Stoffer, Time Series Analysis and Its Applications (the 3rd edition is available from Stoffer's page). Another highly recommended text is Forecasting: Principles and Practice by Hyndman and Athanasopoulos, which covers some of the things I mentioned (but not as many).