I want to solve the first exercise of the Multiple Regression chapter of R. Hyndman's online book on time series forecasting (see https://www.otexts.org/fpp/5/8). I use R with the fpp package, as the exercise requires.
I am stuck on the following question:
c. Use R to fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.
Specifically, I don't know how to make the tslm function work with my dummy vector for the surfing festival. Here is my code:
library(fpp)
log_fancy = log(fancy)
dummy_fest_mat = matrix(0, nrow=84, ncol=1)
for(h in 1:84)
  if(h%%12 == 3)              #this loop builds a vector of length 84 with
    dummy_fest_mat[h,1] = 1   #1 corresponding to each month March
dummy_fest_mat[3,1] = 0 #festival started one year later
dummy_fest = ts(dummy_fest_mat, freq = 12, start=c(1987,1))
fit = tslm(log_fancy ~ trend + season + dummy_fest)
When I run summary(fit), I see that the regression coefficients have been estimated correctly, but when I continue with forecast(fit) I get the following error:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
variables have not equal length (found for 'factor(dummy_fest)')
In addition: Warning message:
'newdata' had 50 rows but variables found have 84 rows
What is even stranger is that when I run forecast(fit, h=84), it works!
I don't understand what is happening here. Can someone explain?
Best Answer
First of all, you should get in the habit of keeping your datasets in data.frames.
This is cleaner than leaving all your data lying around in the global environment, and will help prevent bugs in your analysis.
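A minimal sketch of what this could look like for the data in the question. The name my_data is hypothetical, and cycle() is used here in place of the loop from the question to mark the March observations; the ts response is kept as the first column so that tslm can still pick up its time series attributes.

```r
library(fpp)  # loads the forecast package and the `fancy` sales series

log_fancy <- log(fancy)

# 1 for every March except March 1987 (the festival started a year later)
dummy_fest <- as.numeric(cycle(fancy) == 3)
dummy_fest[3] <- 0

# `my_data` is a hypothetical name; the ts response is the first column
my_data <- data.frame(log_fancy = log_fancy, dummy_fest = dummy_fest)

# Fit the model against the data frame instead of global variables
fit <- tslm(log_fancy ~ trend + season + dummy_fest, data = my_data)
summary(fit)
```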
However, when we go to forecast our dataset, we get an error.
Interestingly, if we omit the dummy_fest variable, the forecast works fine.
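As a rough illustration of the case without the dummy, the sketch below fits only trend and season and asks for a 12-month forecast. The name fit_no_fest and the horizon of 12 are assumptions chosen for the example.

```r
library(fpp)  # loads the forecast package and the `fancy` sales series

log_fancy <- log(fancy)

# Only trend and season: both can be extended into the future automatically
fit_no_fest <- tslm(log_fancy ~ trend + season)

# No newdata is needed because the model has no external regressors
fc_no_fest <- forecast(fit_no_fest, h = 12)
print(fc_no_fest)
```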
What's going on here?
The answer, of course, is that while the forecast function is very smart and knows how to extrapolate your trend and season variables, it unfortunately knows nothing about surfing festivals in eastern Australia.
You need to tell the forecast function when the surfing festival will occur in the future!
For example, you could forecast under the assumption that the surfing festival is cancelled and never happens again.
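One way to sketch the "festival cancelled" scenario is to pass a data frame of zeros through the newdata argument of forecast. The model is refit here so the snippet runs on its own; my_data and the 12-month horizon are assumptions, and future_data must contain a column named exactly like the regressor in the model.

```r
library(fpp)  # loads the forecast package and the `fancy` sales series

# Rebuild the training data: festival dummy is 1 in every March after 1987
dummy_fest <- as.numeric(cycle(fancy) == 3)
dummy_fest[3] <- 0
my_data <- data.frame(log_fancy = log(fancy), dummy_fest = dummy_fest)

fit <- tslm(log_fancy ~ trend + season + dummy_fest, data = my_data)

# Assumed scenario: no festival in any of the next 12 months
future_data <- data.frame(dummy_fest = rep(0, 12))

# The horizon is taken from the number of rows in newdata
fc <- forecast(fit, newdata = future_data)
print(fc)
```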
You'll probably want to edit future_data (the data frame of future dummy_fest values supplied to forecast) to include 1's when you think the surfing festival will occur in the future.
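If the festival is assumed to keep taking place every March, the future dummy can be built the same way as the historical one. This is only a sketch: the 12-month horizon and the helper name future_index are assumptions, and the sample is taken to end in December 1993 (84 monthly observations starting January 1987).

```r
# Placeholder monthly series covering the 12 months after the sample ends
future_index <- ts(rep(0, 12), frequency = 12, start = c(1994, 1))

# 1 in March, 0 in every other month
future_data <- data.frame(dummy_fest = as.numeric(cycle(future_index) == 3))
print(future_data$dummy_fest)

# With a tslm model `fit` that includes dummy_fest, this would then be used as:
# forecast(fit, newdata = future_data)
```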