I am working on a project, and I am totally new to statistics. I have weekly sales data for the last two years, along with other variables such as temperature and holiday (TRUE/FALSE), where holiday is a nominal variable. I have to forecast sales for the next 52 weeks. I have the following questions:
- Can I use a time series regression model where sales would be the dependent variable, and temperature and holiday would be the independent variables?
- How do I decide which independent variable has more impact on sales?
- Can we do forecasting using nominal variables? Will dummy coding work?
- Can we do it in R/SPSS?
I would appreciate any kind of help. Thanks in advance.
Best Answer
Regarding questions (1), (3), and (4): yes, there are a lot of options for modelling multivariate time series, and this is absolutely something you can accomplish with R. Dummy coding a TRUE/FALSE variable such as holiday as 1/0 works fine in a regression model. You said that you don't have much experience with statistics, so I'm not sure how familiar you are with R (if at all), but one possible approach would be to use the R package "dynlm".
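As a rough illustration of what a dynlm fit could look like, here is a minimal sketch. Your data isn't available to me, so the sales, temperature and holiday series below are simulated, and the column names, the holiday weeks, and the choice of a single lag of sales are all assumptions made purely for the example.

```r
# Minimal sketch on simulated data; every name and number here is made up.
library(dynlm)

set.seed(1)
n <- 104  # two years of weekly observations
temperature <- 15 + 10 * sin(2 * pi * (1:n) / 52) + rnorm(n)
holiday <- rep(FALSE, n)
holiday[c(1, 26, 51, 52, 53, 78, 103, 104)] <- TRUE  # hypothetical holiday weeks
sales <- 200 + 3 * temperature + 40 * holiday + rnorm(n, sd = 10)

# Bundle everything into one multivariate weekly ts object.
# as.numeric() turns TRUE/FALSE into a 1/0 dummy variable.
dat <- ts(cbind(sales = sales,
                temperature = temperature,
                holiday = as.numeric(holiday)),
          frequency = 52)

# Regress sales on temperature, the holiday dummy, and last week's sales.
fit <- dynlm(sales ~ temperature + holiday + L(sales, 1), data = dat)
summary(fit)
```

The summary() output lists an estimated coefficient for each regressor, which is a starting point for model checking rather than a final answer about which variable matters most.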
Like I said, this is just one of many possibilities for analyzing multivariate time series in R. To be honest, though, if you are "totally new to statistics" and not working on this particular project with someone who has experience with DLMs or similar models, I highly suggest reading through Forecasting: Principles and Practice by Rob Hyndman and George Athanasopoulos. It's a free online book written by two very knowledgeable econometricians, and a significant amount of the content is geared towards people with little or no formal background in statistics or forecasting methods. Here's a link: https://www.otexts.org/fpp. On a related note, if you are going to be regularly working with time series data in R, I would suggest installing Hyndman's R package forecast, which is phenomenally useful.
Additionally, your second question, about deciding which independent variables have a more significant impact on sales, is not something that can be answered succinctly. A typical modelling process involves a lot of steps related to diagnostic checking and evaluation of goodness-of-fit, and the tools for accomplishing such tasks can vary greatly depending on which type of statistical model you are using.
Unfortunately, if you are brand new to statistics you will almost certainly have to invest a decent amount of time into understanding some important technical aspects of modelling, because there is much more to consider than, for example, the correlation between two variables. This is another reason that I recommend reading through Hyndman and Athanasopoulos' online book, as it addresses a wide variety of fundamental aspects of the forecasting process.
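To make the forecast package suggestion a bit more concrete, here is a hedged sketch of a regression with ARIMA errors that produces a 52-week forecast. Again the data is simulated and the holiday weeks are hypothetical. Note one assumption in particular: forecasting with regressors requires future values of those regressors, so the sketch simply uses the average temperature for each week of the year over the two observed years. That is a placeholder choice, not a recommendation.

```r
# Minimal sketch on simulated data; every name and number here is made up.
library(forecast)

set.seed(1)
n <- 104  # two years of weekly observations
week <- 1:n
temperature <- 15 + 10 * sin(2 * pi * week / 52) + rnorm(n)
holiday <- as.numeric(week %% 52 %in% c(0, 1, 26, 51))  # hypothetical holiday weeks
sales <- ts(200 + 3 * temperature + 40 * holiday + rnorm(n, sd = 10),
            frequency = 52)

# Regression on temperature and the holiday dummy, with ARIMA errors.
# seasonal = FALSE because two years is very little data for a seasonal ARIMA.
xreg <- cbind(temperature = temperature, holiday = holiday)
fit <- auto.arima(sales, xreg = xreg, seasonal = FALSE)

# Future regressor values for the next 52 weeks (ASSUMPTION: temperature is
# approximated by the average of the same week in the two observed years;
# holidays are known in advance from the calendar).
future_week <- (n + 1):(n + 52)
future_xreg <- cbind(
  temperature = (temperature[1:52] + temperature[53:104]) / 2,
  holiday = as.numeric(future_week %% 52 %in% c(0, 1, 26, 51))
)

# The forecast horizon is taken from the number of rows in future_xreg.
fc <- forecast(fit, xreg = future_xreg)
plot(fc)
```

The quality of such a forecast depends heavily on how good the future temperature values are, which is one of the modelling issues the book discusses.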