Solved – Variations in time series

time series

I'm trying to do some really simple Time Series work. I started off looking at monthly counts but the more I thought about it (and the more I read posts by IrishStat) the more I realised my view of this was fundamentally flawed. My main reason for thinking this is the plethora of timing issues there are with using monthly aggregate figures:

Varying month length
Varying numbers of weekdays
Weekends
Holidays

But what gets me is that this must be a really common issue, I mean practically every big company out there must have an executive report that includes monthly figures. How does one balance for the discrepancies between the months you are comparing?

I'm using R to slowly crawl through this stuff and currently I'm just looking at cross correlations with lag. I've looked at decomposition and how to adjust for seasonality but how does one adjust for dates that you know will have an effect on business activity? Is there a way to integrate a list of dates that should be treated differently or somehow compensate for the number of weekend/holiday days in a month (or in a week, or in a year)?

A general explanation of the concept would be much appreciated.

Edited to make the question more succinct:

I am looking at a time series of monthly counts. 2 examples from this time series would be:

February 2012 is 29 days long, has 8 weekend days and zero public holidays
May 2013 is 31 days long, also has 8 weekend days but also has 2 public holidays

How would I compare them without having my conclusions adversely affected by their differences?

I could divide by days in month but this doesn't take into account holidays
I could divide by working days in month, but this doesn't differentiate between a holiday and a weekend.
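
The calendar facts in the two examples, and the two normalizations just mentioned, can be checked with a small sketch. It uses only the Python standard library. The function name `month_profile`, the two May 2013 holiday dates, and the monthly counts are assumptions made for illustration; the question does not give them.

```python
import calendar
from datetime import date

def month_profile(year, month, holidays=()):
    """Count calendar days, weekend days, weekday holidays and working days."""
    holiday_set = set(holidays)
    n_days = calendar.monthrange(year, month)[1]
    days = [date(year, month, d) for d in range(1, n_days + 1)]
    weekend_days = sum(1 for d in days if d.weekday() >= 5)  # 5 = Sat, 6 = Sun
    # Only a holiday that falls on a weekday removes a working day.
    weekday_holidays = sum(
        1 for d in days if d.weekday() < 5 and d in holiday_set
    )
    return {
        "days": n_days,
        "weekend_days": weekend_days,
        "holidays": weekday_holidays,
        "working_days": n_days - weekend_days - weekday_holidays,
    }

# ASSUMPTION: illustrative holiday dates, not taken from the question.
may_2013_holidays = [date(2013, 5, 6), date(2013, 5, 27)]

feb = month_profile(2012, 2)
may = month_profile(2013, 5, may_2013_holidays)

# ASSUMPTION: hypothetical monthly counts, used only to show the arithmetic.
feb_count, may_count = 2100, 2100
per_calendar_day = (feb_count / feb["days"], may_count / may["days"])
per_working_day = (
    feb_count / feb["working_days"],
    may_count / may["working_days"],
)
```

Under these assumptions both months have 21 working days, so equal counts give equal per-working-day rates but different per-calendar-day rates. Neither ratio separates a holiday from a weekend day; that requires estimating separate effects.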

Best Answer

You asked for a general explanation of the concept. Your comments about the current status of forecasting at three levels of aggregation are dead on! My answer may not deal precisely with some of your specific interests, as you have focused on some distractions, but I thought I would share the following with you. I was asked to discuss how software I had helped write could deal with and accommodate monthly vs. weekly vs. daily forecasts.

My response was in three parts:

A. Overall comments on weekly versus monthly
B. The argument for parsing the monthly forecast to daily using simple ratios
C. The argument against (B) and FOR daily forecasts to be developed DIRECTLY and then used to make weekly and/or monthly forecasts

Response A)

Monthly:

Advantages – Fast to compute, easier to model, easier to identify changes in trends, better for strategic long term forecasting

Disadvantages – If you need to plan at the daily level for capacity, people and spoilage of product, then higher levels of forecasting won’t help you understand demand on a daily basis, as a 1/30th ratio estimate is clearly insufficient.

Causal variables that change on a frequent basis (ie daily/weekly – price, promotion) are not easily integrated into monthly analysis

Integrating Macroeconomic variables like Quarterly Unemployment requires an additional step of creating splines.

Weekly:

Advantages – When you can’t handle the modeling process at a daily level you “settle” for this. It also suits very systematic cycles like “Arctic ice extents” that follow a rigid curve and have no need for day-of-the-week variations.

Disadvantages – Floating holidays like Thanksgiving, Easter, Ramadan and Chinese New Year move every year and disrupt the estimates of the week-of-the-year coefficients, which CAN be handled by creating a variable for each.

The number of weeks in a year is subject to change, which creates a statistical issue because not every year has 52 weeks. We have seen the need to allocate the 53rd week to a “non-player” week to make the data a standard 52 week period, which is workable but disruptive compared to daily data.

Causal variables that change on a frequent basis (ie daily/weekly – price, promotion) are not easily integrated into monthly analysis

Integrating Macroeconomic variables like Quarterly Unemployment requires an additional step of creating splines.
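
The 53rd-week issue is easy to see by counting weeks per year. The sketch below assumes ISO-8601 week numbering, which the answer does not specify; other week conventions place the extra week in different years. The helper name `iso_weeks_in_year` is made up for this example.

```python
from datetime import date

def iso_weeks_in_year(year):
    """Number of ISO-8601 weeks in a year (52 or 53).

    December 28 always falls in the last ISO week of its own year,
    so its week number equals the number of weeks in that year.
    """
    return date(year, 12, 28).isocalendar()[1]

# Years in an example range that carry a 53rd ISO week.
long_years = [y for y in range(2010, 2025) if iso_weeks_in_year(y) == 53]
```

Any week-of-the-year dummy scheme has to decide what to do with week 53 in those years, which is the allocation problem described above.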

Response B) ( tongue-in-cheek answer )

Assume you have the daily data in a data warehouse and want to develop daily forecasts from the monthly forecasts.

I would take the monthly forecast and partition it to daily in the following manner.

Compute day-of-the-week averages from the history database, so that the averages D1, D2, …, D7 are known. I would then compute the overall average (XBAR) and 7 indices I1 = D1/XBAR; I2 = D2/XBAR; …; I7 = D7/XBAR. The 7 I’s thus represent percentages, e.g. 0.9, 1.2, …, 0.8.

I would then compute a forecast for DAY1 in the month by using the appropriate I value: [1/30] * monthly forecast * I, essentially adjusting the baseline daily forecast of 1/30th of the monthly expectation.
Finally, I would normalize these DAILY forecasts so that they add up to the monthly forecast.
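
A minimal sketch of the ratio procedure above, using only the Python standard library. The function names and the synthetic history are assumptions made for illustration. The sketch also uses 1/(days in month) rather than a fixed 1/30 as the baseline; because of the final normalization step the choice of baseline does not change the result.

```python
import calendar
from datetime import date, timedelta

def day_of_week_indices(history):
    """history: list of (date, value) pairs covering every weekday.

    Returns 7 indices I_k = D_k / XBAR, indexed Monday = 0 .. Sunday = 6.
    """
    sums, counts = [0.0] * 7, [0] * 7
    for d, v in history:
        sums[d.weekday()] += v
        counts[d.weekday()] += 1
    day_avgs = [s / c for s, c in zip(sums, counts)]   # D1 .. D7
    xbar = sum(v for _, v in history) / len(history)   # overall average
    return [a / xbar for a in day_avgs]

def partition_month(monthly_forecast, year, month, indices):
    """Split a monthly forecast into daily forecasts that sum to it."""
    n_days = calendar.monthrange(year, month)[1]
    days = [date(year, month, d) for d in range(1, n_days + 1)]
    raw = [monthly_forecast / n_days * indices[d.weekday()] for d in days]
    scale = monthly_forecast / sum(raw)                # normalization step
    return {d: r * scale for d, r in zip(days, raw)}

# ASSUMPTION: synthetic history, four full weeks starting Monday 2013-04-01,
# with 100 on weekdays and 50 on weekends.
start = date(2013, 4, 1)
history = []
for i in range(28):
    d = start + timedelta(days=i)
    history.append((d, 50.0 if d.weekday() >= 5 else 100.0))

indices = day_of_week_indices(history)
daily = partition_month(3100.0, 2013, 5, indices)
```

With this history a weekend day receives half the forecast of a weekday, and the daily values add back to the monthly figure.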

Response C)

I should also add that the procedure I laid out in (B) is subject to a number of assumptions regarding the historical data , most of which are unrealistic in my opinion:

1) That there are no trends and no level shifts.
2) That there are no PULSES (one-time unusual values).
3) That there are no holiday effects OR special days-in-the-month effects OR special weeks-in-the-month effects OR beginning/end-of-the-month effects.
4) That there are no seasonality effects (monthly or weekly).
5) That there have been no changes in the day-of-the-week averages over time.
6) That there is no autoregressive structure.
7) That there have been no changes in model parameters or the error variance over time.

All of these considerations suggest that models should be developed at the daily level in order to provide information as quickly as possible.

Hope this helps !

Related Solutions

Solved – Seasonally adjusted month-to-month growth with underlying weekly seasonality

I model this kind of data all the time. You need to incorporate

day-of-the-week
holiday effects ( lead , contemporaneous and lag effects )
special days-of-the-month
perhaps Friday before a holiday or a Monday after a holiday
weekly effects
monthly effects
ARIMA structure to render the errors white noise
etc.

The statistical approach is called Transfer Function Modelling with Intervention Detection. If you want to share your data, either privately via dave@autobox.com or preferably via SE, I would be more than glad to show you the specifics of a final model and further your ability to do it yourself, or at least help you and others understand what needs to be done and what can be done. In either case you come out smarter without spending any treasure, be it coin or time. You might read some of my other responses to time series questions to learn more.
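
As a rough illustration of the deterministic effects listed above, the sketch below builds 0/1 regressors for a single calendar date. It is not the Autobox procedure: the function names, the choice of baselines, the one-day lead and lag, and the `special_days` default are all assumptions. The ARIMA error structure and the intervention detection are not covered here.

```python
from datetime import date, timedelta

def thanksgiving(year):
    """US Thanksgiving: the fourth Thursday of November."""
    first = date(year, 11, 1)
    offset = (3 - first.weekday()) % 7      # days to the first Thursday
    return first + timedelta(days=offset + 21)

def regressors(d, holidays, special_days=(1, 15)):
    """Return a dict of 0/1 regressors for one calendar date."""
    row = {}
    # Day of the week: six dummies, Sunday is the omitted baseline.
    for k, name in enumerate(["mon", "tue", "wed", "thu", "fri", "sat"]):
        row["dow_" + name] = int(d.weekday() == k)
    # Month: eleven dummies, December is the omitted baseline.
    for m in range(1, 12):
        row["month_%02d" % m] = int(d.month == m)
    # Holiday effects: lead (day before), contemporaneous, lag (day after).
    row["hol_lead1"] = int(d + timedelta(days=1) in holidays)
    row["hol"] = int(d in holidays)
    row["hol_lag1"] = int(d - timedelta(days=1) in holidays)
    # Special days of the month (hypothetical choice of days).
    for dom in special_days:
        row["dom_%02d" % dom] = int(d.day == dom)
    return row

holidays = {thanksgiving(y) for y in (2011, 2012, 2013)}
row = regressors(date(2012, 11, 23), holidays)  # the Friday after Thanksgiving
```

Each row can then be stacked into a design matrix for a regression with ARIMA errors in whatever package is available.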

Solved – Time Series Forecasting with Daily Data: ARIMA with regressor

To gauge an approach, you should evaluate models and forecasts from different origins across different horizons, not rely on one number.
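
One way to do this is a rolling-origin evaluation: refit or re-forecast from many origins and average the error separately for each horizon. The sketch below is a minimal version with a seasonal-naive forecaster standing in for a real model; the function names and the synthetic series are assumptions for illustration.

```python
def seasonal_naive(history, horizon, period=7):
    """Forecast by repeating the last full period of the history."""
    last = history[-period:]
    return [last[h % period] for h in range(horizon)]

def rolling_origin_mae(series, forecaster, first_origin, max_horizon, step=1):
    """Mean absolute error per horizon, averaged over many forecast origins."""
    abs_err = [[] for _ in range(max_horizon)]
    for origin in range(first_origin, len(series) - max_horizon + 1, step):
        forecast = forecaster(series[:origin], max_horizon)
        for h in range(max_horizon):
            abs_err[h].append(abs(series[origin + h] - forecast[h]))
    return [sum(e) / len(e) for e in abs_err]

# ASSUMPTION: synthetic daily series, a weekly pattern plus a trend of 1/day.
pattern = [100, 100, 100, 100, 100, 50, 50]
series = [pattern[t % 7] + t for t in range(70)]

mae = rolling_origin_mae(series, seasonal_naive, first_origin=28, max_horizon=14)
```

On this series the error is larger in the second week of the horizon than in the first, because the forecaster ignores the trend. A single accuracy number from one origin would hide that.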

I assume that your data is from the US. I prefer 3+ years of daily data, as you can have two holidays landing on a weekend and get no weekday read. It looks like your Thanksgiving impact is a day off in 2012, or there was a recording error of some kind that caused the model to miss the Thanksgiving day effect.

Januarys are typically low in the dataset if you look at them as a % of the year. Weekends are high. The dummies reflect this behavior: MONTH_EFF01, FIXED_EFF_N10507, FIXED_EFF_N10607

I have found that using an AR component with daily data assumes that the day-of-the-week pattern of the last two weeks is the pattern in general, which is a big assumption. We started with 11 monthly dummies and 6 daily dummies. Some dropped out of the model. B**1 means that there is a lag impact the day after a holiday. There were 6 special days of the month (days 2, 3, 5, 21, 29, 30; 21 might be spurious?), 3 time trends, 2 seasonal pulses (where a day of the week started deviating from the typical: a 0 before this date and a 1 every 7th day after) and 2 outliers (note the Thanksgiving!). This took just under 7 minutes to run. Download all results here www.autobox.com/se/dd/daily.zip

It includes a quick and dirty XLS sheet to check to see if the model makes sense. Of course, the XLS % are in fact bad as they are crude benchmarks.
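
The intervention variables described above can be sketched as plain 0/1 series. This is an illustration only, not Autobox code. The helper names are made up, and reading a negative power such as B**-3 as a lead (an effect before the holiday) is an assumption; the answer itself only states that B**1 is a lag effect on the day after a holiday.

```python
from datetime import date, timedelta

def pulse(dates, when):
    """One-time pulse: 1 on a single date, 0 elsewhere."""
    return [int(d == when) for d in dates]

def seasonal_pulse(dates, start, period=7):
    """0 before `start`; from `start` on, 1 on every `period`-th day."""
    return [int(d >= start and (d - start).days % period == 0) for d in dates]

def shift(x, k):
    """Apply B**k to an indicator series.

    k > 0 delays the indicator by k days (a lag effect).
    k < 0 moves it |k| days earlier (ASSUMED to be a lead effect).
    Values shifted in from outside the sample are set to 0.
    """
    n = len(x)
    return [x[t - k] if 0 <= t - k < n else 0 for t in range(n)]

# Example calendar: January 2013.
dates = [date(2013, 1, 1) + timedelta(days=i) for i in range(31)]

new_year = pulse(dates, date(2013, 1, 1))
day_after = shift(new_year, 1)                    # B**1: the day after
fridays = seasonal_pulse(dates, date(2013, 1, 4))  # every 7th day from Jan 4
```

Multiplying each series by its estimated coefficient and summing gives that variable's contribution to the fitted value on each day.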

Try estimating this model:

Y(T) =  .53169E+06                                                                                        
       +[X1(T)][(+  .13482E+06B** 1)]                                       M_HALLOWEEN
       +[X2(T)][(+  .17378E+06B**-3)]                                       M_JULY4TH
       +[X3(T)][(-  .11556E+06)]                                            M_MEMORIALDAY
       +[X4(T)][(-  .16706E+06B**-4+  .13960E+06B**-3-  .15636E+06B**-2                                                 
       -  .19886E+06B**-1)]                                                 M_NEWYEARS
       +[X5(T)][(+  .17023E+06B**-2-  .26854E+06B**-1-  .14257E+06B** 1)]   M_THANKSGIVI
       +[X6(T)][(-  71726.    )]                                            MONTH_EFF01
       +[X7(T)][(+  55617.    )]                                            MONTH_EFF02
       +[X8(T)][(+  27827.    )]                                            MONTH_EFF03
       +[X9(T)][(-  37945.    )]                                            MONTH_EFF09
       +[X10(T)[(-  23652.    )]                                            MONTH_EFF10
       +[X11(T)[(-  33488.    )]                                            MONTH_EFF11
       +[X12(T)[(+  39389.    )]                                            FIXED_EFF_N10107
       +[X13(T)[(+  63399.    )]                                            FIXED_EFF_N10207
       +[X14(T)[(+  .13727E+06)]                                            FIXED_EFF_N10307
       +[X15(T)[(+  .25144E+06)]                                            FIXED_EFF_N10407
       +[X16(T)[(+  .32004E+06)]                                            FIXED_EFF_N10507
       +[X17(T)[(+  .29156E+06)]                                            FIXED_EFF_N10607
       +[X18(T)[(+  74960.    )]                                            FIXED_DAY02
       +[X19(T)[(+  39299.    )]                                            FIXED_DAY03
       +[X20(T)[(+  27660.    )]                                            FIXED_DAY05
       +[X21(T)[(-  33451.    )]                                            FIXED_DAY21
       +[X22(T)[(+  43602.    )]                                            FIXED_DAY29
       +[X23(T)[(+  68016.    )]                                            FIXED_DAY30
       +[X24(T)[(+  226.98    )]                                            :TIME TREND        1                   1/  1   1/ 3/2011   I~T00001__010311stack
       +[X25(T)[(-  133.25    )]                                            :TIME TREND      423                  61/  3   2/29/2012   I~T00423__010311stack
       +[X26(T)[(+  164.56    )]                                            :TIME TREND      631                  91/  1   9/24/2012   I~T00631__010311stack
       +[X27(T)[(-  .42528E+06)]                                            :SEASONAL PULSE  733                 105/  5   1/ 4/2013   I~S00733__010311stack
       +[X28(T)[(-  .33108E+06)]                                            :SEASONAL PULSE  370                  53/  6   1/ 7/2012   I~S00370__010311stack
       +[X29(T)[(-  .82083E+06)]                                            :PULSE           326                  47/  4  11/24/2011   I~P00326__010311stack
       +[X30(T)[(+  .17502E+06)]                                            :PULSE           394                  57/  2   1/31/2012   I~P00394__010311stack
      +                    +   [A(T)]
