Solved – Comparing 2 time series in R

Tags: r, survey, time series

I was wondering what kind of tests one would use to compare these two time series.

The first data set (in percentages) contains results from a weekly survey that asks a YES/NO question about whether the respondent has a full-time job.

The second data set contains weekly sales totals.

Both series have the same number of data points (300).

Any suggestions on what types of analysis I could do with this set, or how to analyze the trends?

So far I have tried a cross-correlation function in R and found a correlation of 0.39 when the survey data leads by 3 weeks.
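For reference, a lead/lag check of this kind can be sketched with `ccf` from base R. The data below are simulated purely for illustration (the names `job` and `sales` and the built-in 3-step lead are assumptions, not the poster's data). Note the sign convention: the lag `k` value of `ccf(x, y)` estimates the correlation between `x[t+k]` and `y[t]`, so a series `x` that leads `y` shows its peak at a negative lag.

```r
# Simulated illustration only: sales depends on job three steps earlier.
set.seed(1)
n <- 300
job_full <- as.numeric(arima.sim(list(ar = 0.5), n = n + 3))
job   <- job_full[4:(n + 3)]              # job[t]
sales <- 2 * job_full[1:n] + rnorm(n)     # sales[t] = 2 * job[t - 3] + noise

# Cross-correlation at lags -20..20 (no plot, just the numbers)
cc <- ccf(job, sales, lag.max = 20, plot = FALSE)

# Lag with the largest absolute cross-correlation
best_lag <- cc$lag[which.max(abs(cc$acf))]
best_cor <- max(abs(cc$acf))
best_lag   # negative value: job leads sales
```

With real data the peak is rarely this clean, so the lag found this way is best treated as a candidate to carry into a model rather than as a result in itself.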

I also compared the HoltWinters exponentially smoothed daily values with the daily sales values and found a maximum correlation of 0.45 at lead = 12.

Any suggestions would be appreciated.

Thanks

Best Answer

There are a number of possible models at a variety of levels of complexity. These include (some are very closely related):

Time series regression with lagged variables

Lagged regression models. See also distributed lag models

Regression with autocorrelated errors

Transfer function modelling / lagged regression with autocorrelated errors

ARMAX models

Vector autoregressive models

State-space/dynamic linear models can incorporate both autocorrelated and regression components

Because your input series is 0/1 you may want to look at lagged regression with autocorrelated errors, but watch for seasonal and calendar effects (like holidays).

So simple-ish models might perhaps look something like

$\qquad\text{ Sales}_t = \phi_0+\phi_1\,\text{Sales}_{t-1} +\beta_3\,\text{job}_{t-3}+\beta_4\,\text{job}_{t-4}+\epsilon_t$

or perhaps something like

$\qquad\text{ Sales}_t = \alpha +\beta_3\,\text{job}_{t-3}+\beta_{12}\,\text{job}_{t-12}+\text{seasonal}_{t}+\eta_t$

where $\eta_t$ in turn follows some ARMA model for the noise term (though you may well want more lags in there than just one), or a variety of other possibilities. [The seasonal term above doesn't have a parameter because it's likely to have several components, and so several parameters; consider it a placeholder for a model for that component of the data. Neither of those models is likely to be sufficient; they're just meant to give a general sense of what a simple model might look like.]
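As a rough sketch of how the two forms above could be fitted in base R: the first with `lm` on manually lagged columns, the second with `arima` and an `xreg` matrix (regression with AR(1) errors). Everything here is an assumption made for illustration: the data are simulated, the seasonal placeholder is stood in for by a single annual sine/cosine pair with period 52, and the AR(1) noise order is arbitrary.

```r
# Simulated weekly data (hypothetical values, not the poster's series)
set.seed(7)
n   <- 300
tt  <- seq_len(n)
idx <- 13:(n + 12)                       # leave room for a 12-week lag
job_full <- runif(n + 12, 40, 60)        # survey percentages
job_l3   <- job_full[idx - 3]            # job_{t-3}
job_l4   <- job_full[idx - 4]            # job_{t-4}
job_l12  <- job_full[idx - 12]           # job_{t-12}
eta   <- as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 2))  # AR(1) noise
sales <- 50 + 1.5 * job_l3 + 0.8 * job_l12 + 10 * sin(2 * pi * tt / 52) + eta

# Model 1: lagged sales plus lagged job terms, fitted by ordinary least squares.
# The first row is dropped automatically because sales_l1 is NA there.
sales_l1 <- c(NA, head(sales, -1))
fit1 <- lm(sales ~ sales_l1 + job_l3 + job_l4)

# Model 2: regression on lagged job terms and a seasonal stand-in,
# with AR(1) errors, fitted by arima() with external regressors.
X <- cbind(job_l3  = job_l3,
           job_l12 = job_l12,
           sin52   = sin(2 * pi * tt / 52),
           cos52   = cos(2 * pi * tt / 52))
fit2 <- arima(sales, order = c(1, 0, 0), xreg = X)

coef(fit1)
coef(fit2)
```

Since the simulated data follow the second form, `fit2` should recover the job coefficients fairly closely, while `fit1` is shown only to illustrate the mechanics of a lagged-dependent-variable regression.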

You may also want to consider whether the binary job-status variable needs a model itself (if you want to forecast further than the smallest lag involving it, it may well be essential to at least consider whether there are any such effects there; see transfer function models, but bear in mind the special nature of the binary variable).

Once you have an appropriate model for sales that captures the main features well, you can look at testing. You should have enough data (looks like several years) to hold some data out for out-of-sample model testing and validation. I'd start by considering the features of sales alone: is it stationary? Autocorrelated? Does it have any seasonal/cyclical or calendar components? Are there other major drivers to consider?
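A minimal sketch of the hold-out idea, using base R only. The series is simulated, and the choices of a 52-week hold-out, an AR(1) model and a last-value benchmark are assumptions made for illustration rather than recommendations.

```r
# Simulated weekly sales (hypothetical)
set.seed(3)
n <- 300
sales <- 200 + as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 5))

# Hold out the last 52 weeks for out-of-sample checking
h     <- 52
train <- sales[1:(n - h)]
test  <- sales[(n - h + 1):n]

# Look at the autocorrelation of the training data first
train_acf <- acf(train, lag.max = 60, plot = FALSE)

# Fit on the training data only, then forecast the held-out period
fit <- arima(train, order = c(1, 0, 0))
fc  <- as.numeric(predict(fit, n.ahead = h)$pred)

# Compare against a naive benchmark (last training value carried forward)
rmse_model <- sqrt(mean((test - fc)^2))
rmse_naive <- sqrt(mean((test - tail(train, 1))^2))
c(model = rmse_model, naive = rmse_naive)
```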

Since you mention R, note that the function tslm in the package forecast can be handy for including seasonal or trend components in regression models.
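To give an idea of what that looks like, here is a sketch on simulated weekly data (all values hypothetical). The base-R `lm` fit builds the trend and seasonal dummy columns by hand; the `tslm` call, which takes the special terms `trend` and `season` in its formula, is guarded so that the example still runs when the forecast package is not installed.

```r
# Simulated weekly series with a linear trend and an annual cycle
set.seed(9)
n  <- 300
tt <- seq_len(n)
sales <- ts(100 + 0.1 * tt + 10 * sin(2 * pi * tt / 52) + rnorm(n, sd = 2),
            frequency = 52)

# Base R: explicit trend column and one dummy per week of the year
y        <- as.numeric(sales)
trend_t  <- tt
season_f <- factor(cycle(sales))
fit_lm   <- lm(y ~ trend_t + season_f)

# forecast::tslm creates 'trend' and 'season' from the ts object itself
if (requireNamespace("forecast", quietly = TRUE)) {
  fit_tslm <- forecast::tslm(sales ~ trend + season)
}

coef(fit_lm)[c("(Intercept)", "trend_t")]
```

With weekly data a full set of seasonal dummies uses 51 parameters, so a smaller set of harmonic terms may be worth considering instead.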

A book that discusses nearly all of those topics is Shumway and Stoffer, Time Series Analysis and Its Applications (the 3rd edition is available from Stoffer's page). Another highly recommended text is Forecasting: Principles and Practice by Hyndman and Athanasopoulos, which is available online and covers some of the things I mentioned (but not as many).

Related Solutions

Solved – What to do about Seasonality Patterns in ACF, Time Series Data

You can see in the arima code below that there is a seasonal component whose order is set to c(0, 0, 0); these are the seasonal AR, I, and MA orders, and you can change their values. So if you're using an AR(1) that also has a seasonal AR(12) term (an annual seasonal AR, e.g. for monthly data), then the seasonal order should be c(1, 0, 0) instead of c(0, 0, 0), and the period should be changed from NA to the length of the season (12 in this case).

arima(x, order = c(0, 0, 0),
      seasonal = list(order = c(0, 0, 0), period = NA),      
      xreg = NULL, include.mean = TRUE,
      transform.pars = TRUE,
      fixed = NULL, init = NULL,
      method = c("CSS-ML", "ML", "CSS"),
      n.cond, optim.method = "BFGS",
      optim.control = list(), kappa = 1e6)
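For instance, the AR(1) plus seasonal AR(1) at period 12 described above could be fitted as in the sketch below. The series is simulated for illustration; the coefficients 0.5 and 0.6 are arbitrary assumptions.

```r
# Simulate from (1 - 0.5 B)(1 - 0.6 B^12) x_t = e_t by expanding the product
# into an ordinary AR(13): x_t = 0.5 x_{t-1} + 0.6 x_{t-12} - 0.3 x_{t-13} + e_t
set.seed(123)
ar_coefs <- c(0.5, rep(0, 10), 0.6, -0.3)
x <- ts(as.numeric(arima.sim(list(ar = ar_coefs), n = 360)), frequency = 12)

# Non-seasonal AR(1) with a seasonal AR(1) at period 12
fit <- arima(x, order = c(1, 0, 0),
             seasonal = list(order = c(1, 0, 0), period = 12))

coef(fit)   # ar1, sar1 and intercept
```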

Solved – Time Series Forecasting with Daily Data: ARIMA with regressor

To gauge an approach, you should evaluate models and forecasts from different origins across different horizons, not on one number.
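One way to do that is a rolling-origin evaluation, sketched below in base R. The data are simulated, and the AR(1) model, the five origins and the 7-step horizon are all assumptions chosen only to keep the example short.

```r
# Simulated series (hypothetical)
set.seed(11)
y <- 100 + as.numeric(arima.sim(list(ar = 0.6), n = 200, sd = 3))

origins <- seq(150, 190, by = 10)   # last observation used for each fit
h       <- 7                        # forecast horizons 1..7

# One row of forecast errors per origin, one column per horizon
err <- matrix(NA_real_, nrow = length(origins), ncol = h,
              dimnames = list(paste0("origin_", origins), paste0("h", 1:h)))

for (i in seq_along(origins)) {
  o   <- origins[i]
  fit <- arima(y[1:o], order = c(1, 0, 0))           # refit at each origin
  fc  <- as.numeric(predict(fit, n.ahead = h)$pred)  # forecasts for o+1..o+h
  err[i, ] <- y[(o + 1):(o + h)] - fc
}

# Accuracy summarised per horizon rather than as a single number
rmse_by_h <- sqrt(colMeans(err^2))
rmse_by_h
```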

I assume that your data is from the US. I prefer 3+ years of daily data, as you can have two holidays landing on a weekend and get no weekday read. It looks like your Thanksgiving impact is a day off in 2012, or there was a recording error of some kind, which caused the model to miss the Thanksgiving Day effect.

Januarys are typically low in the dataset if you look at them as a % of the year. Weekends are high. The dummies reflect this behavior: MONTH_EFF01, FIXED_EFF_N10507, FIXED_EFF_N10607.

I have found that using an AR component with daily data assumes that the day-of-week pattern of the last two weeks is the pattern in general, which is a big assumption. We started with 11 monthly dummies and 6 daily dummies; some dropped out of the model. B**1 means that there is a lagged impact the day after a holiday. There were 6 special days of the month (days 2, 3, 5, 21, 29 and 30; day 21 might be spurious), 3 time trends, 2 seasonal pulses (where a day of the week started deviating from the typical pattern: 0 before that date and 1 every 7th day after) and 2 outliers (note the Thanksgiving one!). This took just under 7 minutes to run. All results can be downloaded from www.autobox.com/se/dd/daily.zip
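A much simpler version of this kind of dummy-variable regression can be sketched in base R. This is not the Autobox procedure: it is an ordinary `lm` fit on simulated daily data, with day-of-week and month dummies plus a Thanksgiving dummy and its one-day lag (the B**1 idea). The date range and every effect size are hypothetical.

```r
# Daily calendar (hypothetical range) and calendar features
dates <- seq(as.Date("2011-01-03"), as.Date("2013-01-31"), by = "day")
dow <- factor(format(dates, "%u"))        # "1" = Monday ... "7" = Sunday
mon <- factor(format(dates, "%m"))        # "01" ... "12"
dom <- as.integer(format(dates, "%d"))    # day of month

# US Thanksgiving: fourth Thursday of November (falls on day 22..28)
thanks      <- as.integer(mon == "11" & dow == "4" & dom >= 22 & dom <= 28)
thanks_lag1 <- c(0L, head(thanks, -1))    # the day after the holiday

# Simulated sales with weekend, January and holiday effects (made-up sizes)
set.seed(5)
y <- 500000 +
  300000 * (dow %in% c("6", "7")) -
  70000  * (mon == "01") -
  250000 * thanks -
  140000 * thanks_lag1 +
  rnorm(length(dates), sd = 20000)

# 6 day-of-week dummies, 11 month dummies, holiday and day-after-holiday terms
fit <- lm(y ~ dow + mon + thanks + thanks_lag1)
round(coef(fit)[c("thanks", "thanks_lag1")])
```

With only two Thanksgivings in the sample, the holiday coefficients rest on two observations each, which is one reason to prefer three or more years of daily data.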

It includes a quick and dirty XLS sheet to check whether the model makes sense. Of course, the XLS percentages are in fact poor, as they are only crude benchmarks.

Try estimating this model:

Y(T) =  .53169E+06                                                                                        
       +[X1(T)][(+  .13482E+06B** 1)]                                       M_HALLOWEEN
       +[X2(T)][(+  .17378E+06B**-3)]                                       M_JULY4TH
       +[X3(T)][(-  .11556E+06)]                                            M_MEMORIALDAY
       +[X4(T)][(-  .16706E+06B**-4+  .13960E+06B**-3-  .15636E+06B**-2                                                 
       -  .19886E+06B**-1)]                                                 M_NEWYEARS
       +[X5(T)][(+  .17023E+06B**-2-  .26854E+06B**-1-  .14257E+06B** 1)]   M_THANKSGIVI
       +[X6(T)][(-  71726.    )]                                            MONTH_EFF01
       +[X7(T)][(+  55617.    )]                                            MONTH_EFF02
       +[X8(T)][(+  27827.    )]                                            MONTH_EFF03
       +[X9(T)][(-  37945.    )]                                            MONTH_EFF09
       +[X10(T)[(-  23652.    )]                                            MONTH_EFF10
       +[X11(T)[(-  33488.    )]                                            MONTH_EFF11
       +[X12(T)[(+  39389.    )]                                            FIXED_EFF_N10107
       +[X13(T)[(+  63399.    )]                                            FIXED_EFF_N10207
       +[X14(T)[(+  .13727E+06)]                                            FIXED_EFF_N10307
       +[X15(T)[(+  .25144E+06)]                                            FIXED_EFF_N10407
       +[X16(T)[(+  .32004E+06)]                                            FIXED_EFF_N10507
       +[X17(T)[(+  .29156E+06)]                                            FIXED_EFF_N10607
       +[X18(T)[(+  74960.    )]                                            FIXED_DAY02
       +[X19(T)[(+  39299.    )]                                            FIXED_DAY03
       +[X20(T)[(+  27660.    )]                                            FIXED_DAY05
       +[X21(T)[(-  33451.    )]                                            FIXED_DAY21
       +[X22(T)[(+  43602.    )]                                            FIXED_DAY29
       +[X23(T)[(+  68016.    )]                                            FIXED_DAY30
       +[X24(T)[(+  226.98    )]                                            :TIME TREND        1                   1/  1   1/ 3/2011   I~T00001__010311stack
       +[X25(T)[(-  133.25    )]                                            :TIME TREND      423                  61/  3   2/29/2012   I~T00423__010311stack
       +[X26(T)[(+  164.56    )]                                            :TIME TREND      631                  91/  1   9/24/2012   I~T00631__010311stack
       +[X27(T)[(-  .42528E+06)]                                            :SEASONAL PULSE  733                 105/  5   1/ 4/2013   I~S00733__010311stack
       +[X28(T)[(-  .33108E+06)]                                            :SEASONAL PULSE  370                  53/  6   1/ 7/2012   I~S00370__010311stack
       +[X29(T)[(-  .82083E+06)]                                            :PULSE           326                  47/  4  11/24/2011   I~P00326__010311stack
       +[X30(T)[(+  .17502E+06)]                                            :PULSE           394                  57/  2   1/31/2012   I~P00394__010311stack
      +                    +   [A(T)]
