I came across the vars package in R, and it seems to do everything I need for a VAR model. The only exception is that I need to define dummy variables. For example, suppose my vector of dependent variables has n elements and I need to estimate (and remove) the impact of the Christmas holiday on the first element. I define a dummy variable that is 1 on Christmas Day and 0 on all other days, but if I add it to the vector of dependent variables, the model parameters become extremely large. Is there another way to define a dummy for just one variable in this package?
Solved – Building VAR (Vector Autoregression) model with dummy variables in R
econometrics, r, vector-autoregression
Related Solutions
Regarding questions (1), (3), and (4): yes, there are a lot of options for modelling multivariate time series, and this is absolutely something you can accomplish with R. You said that you don't have much experience with statistics, so I'm not sure how familiar you are with R (if at all), but one possible approach is to use the R package "dynlm":
## You'll need these packages; install each one only if it is missing
for (pkg in c("dynlm", "zoo")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
}
library(dynlm)
library(zoo)
## Generating some nonsense data for demonstration
## 104 dates, 1 week apart
d1 <- as.Date("01/01/2012",format='%m/%d/%Y')
dSeq <- seq.Date(from=d1,
                 by='week',
                 length.out=104)
## Dependent variable
Y <- rnorm(104,50,10) + rnorm(104,10,1)*cos((1:104)/6)
## Independent variable for temperature
Temp <- rnorm(104,10,1) + cos((1:104)/12)
## Dummy variable for holidays (just picked a few off the calendar)
Holiday <- rep(0,104)
Holiday[c(3,3+52, 8,8+52, 22,22+52, 47,47+52, 52,52+52)] <- 1
Holiday <- ifelse(Holiday==0,"N","Y")
## Make a data.frame to hold variables
aDF <- data.frame(
  Date=dSeq,
  Y=Y,
  Temp=Temp,
  Holiday=Holiday)
## Make a time series version of this with the "zoo" function
## for using dynamic linear model.
zDF <- aDF
zDF[,2] <- zoo(aDF[,2],aDF[,1])
zDF[,3] <- zoo(aDF[,3],aDF[,1])
zDF[,4] <- zoo(aDF[,4],aDF[,1])
## A possible DLM... type ?dynlm for details of the function
dlm1 <- dynlm(Y ~ L(Y,1) + L(Y,13) + Temp + Holiday, data=zDF)
## Model summary
summary(dlm1)
## Estimated coefficients:
coefficients(dlm1)
Like I said, this is just one of many possibilities for analyzing multivariate time series in R. To be honest, though, if you are "totally new to statistics" and not working on this particular project with someone who has experience with DLMs or similar models, I highly suggest reading through Forecasting: Principles and Practice by Rob Hyndman and George Athanasopoulos. It's a free online book written by two very knowledgeable econometricians, and a significant amount of the content is geared towards people with little or no formal background in statistics or forecasting methods. Here's a link: https://www.otexts.org/fpp. On a related note, if you are going to be regularly working with time series data in R, I would suggest installing Hyndman's R package forecast, which is phenomenally useful.
Additionally, your second question, about deciding which independent variables have a more significant impact on sales, is not something that can be succinctly answered. A typical modelling process involves a lot of steps related to diagnostic checking and evaluation of goodness-of-fit, and the tools for accomplishing such tasks can vary greatly depending on which type of statistical model you are using. Unfortunately, if you are brand new to statistics, you will almost certainly have to invest a decent amount of time in understanding some important technical aspects of modelling, because there is much more to consider than, for example, the correlation of two variables. This is another reason that I recommend reading through Hyndman and Athanasopoulos' online book, as it addresses a wide variety of fundamental aspects of the forecasting process.
A multiple-equation VAR model in which contemporaneous dependent variables enter as regressors in other equations is a structural VAR (SVAR) model. Estimating such a model runs into simultaneity bias, so an SVAR cannot be estimated as it stands with standard techniques such as equation-by-equation OLS. What is normally done is to derive the reduced-form VAR counterpart of the original SVAR, estimate that, and then back out the structural parameters from the reduced-form estimates, which requires identifying restrictions. This is pretty standard, and there is quite some literature that should be available online. Perhaps Pfaff, "Analysis of Integrated and Cointegrated Time Series with R" (starting at p. 43), could be useful.
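To make the step from structural to reduced form concrete, here is a sketch in common textbook notation. The symbols (A, A_i, c, the structural shocks ε_t and the reduced-form errors u_t) are my own labels for illustration and do not come from the answer above.

```latex
% Structural VAR(p): contemporaneous relations among the variables sit in A
A y_t = c + A_1 y_{t-1} + \dots + A_p y_{t-p} + \varepsilon_t,
\qquad \operatorname{Var}(\varepsilon_t) = \Sigma_\varepsilon \ \text{(diagonal)}

% Reduced form: premultiply by A^{-1}; only lagged values remain on the right
y_t = A^{-1} c + A^{-1} A_1 y_{t-1} + \dots + A^{-1} A_p y_{t-p} + u_t,
\qquad u_t = A^{-1} \varepsilon_t,
\qquad \Sigma_u = A^{-1} \Sigma_\varepsilon \left(A^{-1}\right)^{\top}
```

The reduced form can be estimated equation by equation because every regressor is predetermined. Recovering A from the estimated \Sigma_u is the identification step, and it is where restrictions on A are needed.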
Best Answer
You could add them as exogenous variables (in vars, through the exogen argument of VAR()), or you could decompose the time series and analyze the seasonal component around Christmas. You might even consider adding a dummy fixed effect for the period in which you see the seasonality.
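As a minimal sketch of the exogenous-variable route, assuming the vars package is installed: the data below are simulated, and the names y1, y2 and xmas are hypothetical. The dummy is passed through the exogen argument of VAR() instead of being appended to the endogenous matrix. Because an exogenous regressor enters every equation by default, the last step uses restrict() with a manual restriction matrix to keep the dummy in the first equation only.

```r
## Sketch only: simulated data; y1, y2 and xmas are hypothetical names.
library(vars)

set.seed(42)
n     <- 730                                   # two years of daily data
dates <- seq(as.Date("2012-01-01"), by = "day", length.out = n)

## Dummy: 1 on 25 December, 0 on every other day
xmas <- as.numeric(format(dates, "%m-%d") == "12-25")

## Two endogenous series; only y1 receives a Christmas shock
y1 <- as.numeric(arima.sim(list(ar = 0.5), n = n)) + 5 * xmas
y2 <- as.numeric(arima.sim(list(ar = 0.3), n = n))
Y  <- cbind(y1 = y1, y2 = y2)

## The dummy goes in 'exogen', not in the endogenous matrix
X   <- cbind(xmas = xmas)
fit <- VAR(Y, p = 1, type = "const", exogen = X)
Bcoef(fit)[, "xmas"]          # dummy coefficient in each equation

## Optional: keep the dummy in the y1 equation only.
## resmat is K x (number of regressors); 1 = keep, 0 = drop.
resmat <- matrix(1, nrow = fit$K, ncol = ncol(Bcoef(fit)),
                 dimnames = dimnames(Bcoef(fit)))
resmat["y2", "xmas"] <- 0
fit_r <- restrict(fit, method = "manual", resmat = resmat)
Bcoef(fit_r)                  # the xmas entry for y2 is now fixed at zero
```

Keeping the dummy out of the endogenous vector avoids estimating an equation for the dummy itself, which is the likely source of the very large parameters described in the question.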