Time Series – Building a Model with Multiple Independent Variables

r, regression, spss, time series

I am working on a project and am totally new to statistics. I have weekly sales data for the last two years, along with other variables such as temperature and holiday (TRUE/FALSE), where holiday is a nominal variable. I need to forecast the next 52 weeks. I have the following questions:

  1. Can I use a time series regression model where sales is the dependent variable, and temperature and
    holiday are the independent variables?
  2. How do I decide which independent variable has more impact on sales?
  3. Can we do forecasting using nominal variables? Will dummy coding work?
  4. Can we do it in R/SPSS?

I would appreciate any kind of help. Thanks in advance.

Best Answer

Regarding questions (1), (3), and (4): yes, there are a lot of options for modelling multivariate time series, and this is absolutely something you can accomplish with R. You said that you don't have much experience with statistics, so I'm not sure how familiar you are with R (if at all), but one possible approach is to use the R package "dynlm":

## You'll need these packages (installed only if they are missing)
for (pkg in c("dynlm", "zoo")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
}
library(dynlm)
library(zoo)
## Generating some nonsense data for demonstration
set.seed(123)  # fix the random seed so the results are reproducible
## 104 dates, 1 week apart
d1 <- as.Date("01/01/2012",format='%m/%d/%Y')
dSeq <- seq.Date(from=d1,
                 by='week',
                 length.out=104)
## Dependent variable
Y <- rnorm(104,50,10) + rnorm(104,10,1)*cos((1:104)/6)
## Independent variable for temperature
Temp <- rnorm(104,10,1) + cos((1:104)/12)
## Dummy variable for holidays (just picked a few off the calendar)
Holiday <- rep(0,104)
Holiday[c(3,3+52, 8,8+52, 22,22+52, 47,47+52, 52,52+52)] <- 1
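## Store as a factor explicitly: since R 4.0.0, data.frame() no longer
## converts character columns to factors by default
Holiday <- factor(ifelse(Holiday == 0, "N", "Y"), levels = c("N", "Y"))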
## Make a data.frame to hold variables
aDF <- data.frame(
  Date=dSeq,
  Y=Y,
  Temp=Temp,
  Holiday=Holiday)
## Convert each variable to a "zoo" time series indexed by Date,
## so that dynlm() can apply the lag operator L() to it.
zDF <- aDF
zDF[,2] <- zoo(aDF[,2],aDF[,1])
zDF[,3] <- zoo(aDF[,3],aDF[,1])
zDF[,4] <- zoo(aDF[,4],aDF[,1])
## A possible model: Y regressed on its own lags at 1 and 13 weeks,
## plus Temp and the Holiday dummy. Type ?dynlm for details of the function
dlm1 <- dynlm(Y ~ L(Y,1) + L(Y,13) + Temp + Holiday, data=zDF)
## Model summary
summary(dlm1)
## Estimated coefficients:
coefficients(dlm1)
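
On question (3) specifically, dummy coding is what R does automatically when a factor appears in a model formula. The following is a minimal base-R sketch, not part of the model above: the `toy` data frame and its values are made up purely for illustration. It shows the 0/1 column that R creates for a two-level `Holiday` factor.

```r
## Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  Sales   = c(52, 48, 75, 50, 47, 80),
  Temp    = c(10.2, 9.8, 11.1, 10.5, 9.9, 10.7),
  Holiday = factor(c("N", "N", "Y", "N", "N", "Y"), levels = c("N", "Y"))
)

## model.matrix() returns the design matrix that lm() would use:
## the factor Holiday becomes a single 0/1 column named "HolidayY",
## with "N" as the reference level.
X <- model.matrix(Sales ~ Temp + Holiday, data = toy)
print(X)

## The HolidayY coefficient is the estimated shift in Sales for holiday
## weeks relative to non-holiday weeks, holding Temp fixed.
fit <- lm(Sales ~ Temp + Holiday, data = toy)
print(coef(fit))
```

A nominal variable with more than two levels would be expanded in the same way, into one 0/1 column per level other than the reference level.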

Like I said, this is just one of many possibilities for analyzing multivariate time series in R. To be honest, though, if you are "totally new to statistics" and not working on this particular project with someone who has experience with DLMs or similar models, I highly suggest reading through Forecasting: principles and practice by Rob Hyndman and George Athanasopoulos. It's a free online book written by two very knowledgeable econometricians, and a significant amount of the content is geared towards people with little or no formal background in statistics or forecasting methods. Here's a link: https://www.otexts.org/fpp. On a related note, if you are going to be working with time series data in R regularly, I would suggest installing Hyndman's R package forecast, which is phenomenally useful.

Additionally, your second question, about deciding which independent variables have a greater impact on sales, is not something that can be answered succinctly. A typical modelling process involves a lot of steps related to diagnostic checking and evaluation of goodness-of-fit, and the tools for accomplishing such tasks can vary greatly depending on which type of statistical model you are using.

Unfortunately, if you are brand new to statistics you will almost certainly have to invest a decent amount of time in understanding some important technical aspects of modelling, because there is much more to consider than, for example, the correlation between two variables. This is another reason I recommend reading through Hyndman and Athanasopoulos's online book, as it addresses a wide variety of fundamental aspects of the forecasting process.
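
To make the 52-week forecasting step concrete, here is a rough sketch using only base R's `arima()` and `predict()`, which is a different tool from the dynlm model above. Everything in it is an assumption for illustration: the data are simulated, the AR(1) error structure is chosen arbitrarily, and future temperatures are simply assumed to repeat the most recent 52 weeks. Any forecast with external regressors needs future values of those regressors from somewhere. The forecast package's `auto.arima()` also accepts an `xreg` argument and selects the ARIMA order for you, so it may be the more convenient choice in practice.

```r
set.seed(1)                      # reproducible made-up data
n <- 104                         # weeks of history
h <- 52                          # weeks to forecast
week <- 1:n

## Simulated regressors and sales (nonsense data for demonstration)
Temp    <- 10 + cos(2 * pi * week / 52) + rnorm(n, 0, 0.5)
Holiday <- as.numeric(week %% 52 %in% c(3, 8, 22, 47, 0))
Sales   <- 50 + 2 * Temp + 15 * Holiday + rnorm(n, 0, 3)

## Regression with AR(1) errors; xreg must be a numeric matrix,
## so the holiday indicator is coded 0/1 rather than as a factor
X   <- cbind(Temp = Temp, Holiday = Holiday)
fit <- arima(Sales, order = c(1, 0, 0), xreg = X)

## Future regressor values: holidays are known from the calendar;
## future temperature is ASSUMED equal to the most recent 52 weeks
newX <- cbind(Temp    = Temp[(n - h + 1):n],
              Holiday = Holiday[(n - h + 1):n])

fc <- predict(fit, n.ahead = h, newxreg = newX)
head(fc$pred)   # point forecasts for the first few future weeks
head(fc$se)     # their standard errors

## One basic diagnostic: Ljung-Box test for leftover autocorrelation
## in the residuals (fitdf = 1 accounts for the single AR coefficient)
Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
```

A small p-value from the Ljung-Box test would suggest the residuals are still autocorrelated and that the error structure needs rethinking. This is only one of the many diagnostic checks mentioned above.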
