Unfortunately, your mission is doomed to failure since you are restricted to R and SPSS. You need to identify the lead and lag relationship structure for each of the events/holidays/exogenous variables that may come into play. You need to detect possible time trends, which SPSS can't do. You need to incorporate daily trends/predictions into each of the hourly forecasts in order to provide a consolidated, reconciled forecast. You need to be concerned with changing parameters and changing variance. Hope this helps. We have been modelling this kind of data for years in an automatic manner, subject of course to optional user-specified controls.
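As a rough illustration of what "lead and lag structure" means for a holiday regressor, here is a minimal base-R sketch. The helper shift_dummy, the single made-up holiday on day 15, and the window of two days either side are all assumptions for illustration; which leads and lags actually matter has to be identified from the data.

```r
# Hypothetical helper: shift a 0/1 event indicator by k days.
# k > 0 is a lag (effect after the event), k < 0 a lead (effect before it).
shift_dummy <- function(x, k) {
  n <- length(x)
  if (k == 0) return(x)
  if (k > 0) c(rep(0, k), x[seq_len(n - k)]) else c(x[(1 - k):n], rep(0, -k))
}

n <- 30
holiday <- rep(0, n)
holiday[15] <- 1                 # a single holiday on day 15 (made up)

shifts <- -2:2                   # two lead days, the day itself, two lag days
X <- sapply(shifts, function(k) shift_dummy(holiday, k))
colnames(X) <- c("lead2", "lead1", "day0", "lag1", "lag2")

which(X[, "lead2"] == 1)   # 13
which(X[, "lag2"] == 1)    # 17
```

Each column of X can then be offered to the model as a candidate regressor, and the ones that turn out not to be significant dropped.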
EDIT: As the OP requested, I present here a typical analysis. I took one of the busier hours and developed a daily model. In a complete analysis, all 24 hours would be modelled, along with a daily model, in order to reconcile the forecasts. Following is a partial list of the model [image]. In addition to the significant regressors (note that the actual lead and lag structure has been omitted), there were indicators reflecting the seasonality, level shifts, daily effects, changes in daily effects, and unusual values not consistent with history. The model statistics are [image]. A plot of the forecasts for the next 360 days is shown here [image]. The Actual/Fit/Forecast graph neatly summarizes the results [image].
When faced with a tremendously complex problem (like this one!), one needs to show up with a lot of courage, experience, and computer productivity aids. Just advise your management that the problem is solvable, but not necessarily by using primitive tools. I hope this encourages you to continue your efforts, as your previous comments have been very professional and geared towards personal enrichment and learning. I would add that one needs to know the expected value of this analysis and use that as a guideline when considering additional software. Perhaps you need a louder voice to help direct your "directors" towards a feasible solution to this challenging task.
After reviewing the daily totals and each of the 24 hourly models, I would definitely conclude that the number of visits is in serious decline! This kind of analysis by a prospective buyer would suggest a non-purchase, while a seller would be wise to redouble their efforts to sell the business based upon this very negative projection.
You're hitting the wall because you're exhausting the limits of the first set of Fourier terms, fourier(1:n,i,m1). As RandomDude correctly pointed out above, the number of terms i should be less than half the period (m1).
However, if, with your code, you run two loops, one over i and another over j, where j is the number of terms for the second seasonal cycle, fourier(1:n,j,m3), you would still have a lot of room for model improvement.
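A base-R sketch of the two-loop idea follows. The helper fourier_terms is a hypothetical stand-in for the fourier(1:n,i,m1) call from the question, whose definition is not shown here. The periods 7 and 365, the candidate grids for i and j, and the simulated series are assumptions for illustration, and a plain lm fit compared by AIC stands in for the ARIMA fit to keep the example short.

```r
# Hypothetical helper: K sine/cosine pairs for a seasonal period m.
fourier_terms <- function(t, K, m) {
  stopifnot(K >= 1, K < m / 2)   # number of harmonics must stay below half the period
  X <- do.call(cbind, lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / m), cos(2 * pi * k * t / m))
  }))
  colnames(X) <- paste0(rep(c("S", "C"), times = K),
                        rep(seq_len(K), each = 2), "_", m)
  X
}

n <- 730                         # two years of daily data (assumed length)
t <- seq_len(n)
xreg <- cbind(fourier_terms(t, 3, 7),      # weekly cycle: i = 3
              fourier_terms(t, 30, 365))   # yearly cycle: j = 30

# Simulated series, for illustration only
set.seed(1)
y <- 100 + 10 * sin(2 * pi * t / 7) + 20 * cos(2 * pi * t / 365) + rnorm(n)

best <- NULL
for (i in 1:3) {                 # harmonics for the weekly cycle
  for (j in c(1, 5, 10)) {       # harmonics for the yearly cycle
    X <- cbind(fourier_terms(t, i, 7), fourier_terms(t, j, 365))
    a <- AIC(lm(y ~ X))
    if (is.null(best) || a < best$aic) best <- list(i = i, j = j, aic = a)
  }
}
best
```

In practice the inner fit would be the ARIMA model with these columns as xreg, compared on AICc or on a hold-out sample rather than on lm's AIC.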
This is what I've got from your data, even without dummies, only based on AR, MA, and data seasonality:
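library(forecast)
y <- msts(ts, seasonal.periods = c(7, 365))  # multi-seasonal series: weekly and yearly cycles
xreg <- fourier(y, K = c(3, 30))             # 3 weekly and 30 yearly harmonic pairs
fit <- auto.arima(y, seasonal = FALSE, xreg = xreg)
# fourierf() is deprecated; fourier(..., h = 180) builds the future regressors
fit_f <- forecast(fit, xreg = fourier(y, K = c(3, 30), h = 180))
plot(fit_f)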
I suspect the performance will even improve when holidays are added.
Best Answer
My suggestion is simply to exclude holidays or to use a dummy variable, as suggested in the comments. In finance, for example, weekends are in most cases excluded from the time series. Why should we model sales on days when we know there can be none because of holidays and store closures? The coefficients are estimated as constants across all observations t, so they absorb part of the calendar effect; if you do not adjust for it, that effect spills into the coefficient estimates (for example, the true autocorrelation will tend to be underestimated when the zeros are numerous relative to the total number of observations). So why include dates on which there cannot be sales for external, calendar reasons that have nothing to do with the true autocorrelation in the series? Model this as a calendar effect, because it is one. That way you isolate the calendar effect from the conditional-mean effect due to autocorrelation, and the estimate of the latter is cleaner.
If instead the zero sales are not due to holidays, then my best advice is to leave those zeros in, because they are true information, or at most to treat them as outliers.
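One way to treat closures as a calendar effect is a 0/1 regressor alongside the ARMA terms. Here is a minimal sketch on simulated data, using arima from base R's stats package; the series, the closure days, and the AR(1) order are all assumptions chosen for illustration.

```r
set.seed(42)
n <- 200
closed <- rep(0, n)
closed[c(25, 60, 95, 120, 180)] <- 1     # hypothetical closure days

# Simulated autocorrelated demand around a level of 50 (illustration only)
sales <- 50 + as.numeric(arima.sim(list(ar = 0.6), n = n))
sales[closed == 1] <- 0                  # no sales when the store is closed

# Zeros left unmodelled
fit_raw <- arima(sales, order = c(1, 0, 0))
# Closures modelled as a calendar effect
fit_cal <- arima(sales, order = c(1, 0, 0), xreg = cbind(closed = closed))

coef(fit_raw)["ar1"]   # AR estimate with the closure zeros left in
coef(fit_cal)          # ar1, intercept, and the closure effect
```

Comparing the two ar1 estimates shows how much of the apparent autocorrelation structure is driven by the closure days rather than by demand itself.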