Solved – Multivariate time series model vs. Univariate time series model with categorical variable

categorical dataforecastingtime series

Let's say I want to forecast sales of several products in a grocery store, which are green apple, red apple, blue berry, strawberry.

My question is, how do I know if I should model this as a multivariate time series (like VAR) or I should go with an univariate time series model with an extra categorical variable indicating the 4 types of fruit?

Best Answer

With sufficiently fancy grouping of the data by the categorical variable, and allowing the error terms the right structure, the two methods could be made to be identical (I think). But it's certainly much easier to think about and to implement as a multivariate time series.

Related Solutions

Forecastability – How to Assess the Forecastability of Time Series Data

Here's a second idea based on stl.

You could fit an stl decomposition to each series, and then compare the standard error of the remainder component to the mean of the original data ignoring any partial years. Series that are easy to forecast should have a small ratio of se(remainder) to mean(data).

The reason I suggest ignoring partial years is that seasonality will affect the mean of the data otherwise. In the example in the question, all series have seven complete years, so it is not an issue. But if the series extended part way into 2012, I suggest the mean is computed only up to the end of 2011 to avoid seasonal contamination of the mean.

This idea assumes that mean(data) makes sense -- that is that the data are mean stationary (apart from seasonality). It probably wouldn't work well for data with strong trends or unit roots.

It also assumes that a good stl fit translates into good forecasts, but I can't think of an example where that wouldn't be true so it is probably an ok assumption.

R – How to Model Multivariate Time Series Count Data with Poisson Regression in R

I found a reference which answered my question: https://arxiv.org/pdf/1405.3738.pdf.

The model is quite complicated, here is the state space representation:

So, let's say I have L different products I'm studying across 1,..,T time periods.

$Y_{l,t} \sim z*\delta_0 + (1-z)NB(exp(\widetilde{\eta}_{l,t}),alpha_l)$ is the distribution for product l at time t

$\widetilde{\eta}_{l,t} = \eta_{l,t} + X_{l,t}\theta_l$ this is the Log of the mean of product l sales at time t, guaranteeing that it is positive.

$\eta_{l,t} = \mu_l + \phi_l(\eta_{l,t-1}-\mu) + \epsilon_{l,t}$

$\epsilon_{l,t} \sim N(0,\frac{1}{\tau_l})$

The other priors and hyperpriors are in the next images:

P.S. Now I'm trying to write the JAGS code and any help would be much appreciated! ( https://stackoverflow.com/questions/40528715/runtime-error-in-jags )

Edit:

Here is the JAGS code:

model{


#hyperpriors 4
alpha_star ~ dunif(0.001,0.1)
tau_mu_star ~ dunif(1,10)
mu_star ~ dnorm(0,0.5)
beta_tau ~ dunif(2,25)
beta_0_tau ~ dunif(1,10)
beta_theta ~ dunif(2,25)
phiminus ~ dunif(1,50)
k_tau ~ dunif(5,10)
k_0_tau ~ dunif(1,5)
pointmass_0 ~ dnorm(0,10000)
k_theta ~ dunif(5,10)
phiplus ~ dunif(1,600)
theta_star ~ dmnorm(b0,B0)
#17
for (l in 1:L){
z[l] ~ dbeta(0.5,0.5)
phi[l] ~ dbeta(phiplus + phiminus, phiminus)
tau[l] ~ dgamma(k_tau,beta_tau)
tau_theta[l] ~ dgamma(k_tau,beta_tau)
mu[l] ~ dnorm(mu_star, tau_mu_star)
alpha[l] ~ dexp(alpha_star)
eps[1,l] ~ dnorm(0,tau[l])
eta[1,l] = mu_star + eps[1,l]
theta[l,1:8] ~ dmnorm(theta_star,thetavar*tau_theta[l])
#y[1,l] ~ inprod(1-z[l],dnegbin(exp(eta[1,l]),alpha[l]))
y[1,l] ~ dnegbin(exp(eta[1,l]),alpha[l])

#y[1,l] ~ dnegbin(exp(eta[1,l]),alpha[l])
ystar[1,l] ~ dnorm(z[l]*pointmass_0 + inprod((1-z[l]),y[1,l]),100000)
}

for (i in 2:N){

for (l in 1:L){
eps[i,l] ~ dnorm(0,tau[l])
}

    for(l in 1:L){
    eta[i,l] = mu[l]+ phi[l]*(eta[i-1,l]-mu[l]) + eps[i,l] 
    eta_star[i,l]= eta[i,l] + inprod(c(x[i,l],xshared[i,]),t(theta[l,]))
    #observations
#kobe[i,l] ~ dnegbin(dexp(eta_star[i,l]),alpha[l])
#   #y[i,l] = inprod(1-z[l],kobe[i,l])
    #y[i,l] ~ inprod(1-z[l],dnegbin(exp(eta_star[i,l]),alpha[l]))
    #y[i,l] ~ dnegbin(exp(eta_star[i,l]),alpha[l])
    y[i,l] ~ dnegbin(exp(eta_star[i,l]),alpha[l])
    ystar[i,l] ~ dnorm(z[l]*pointmass_0 + inprod((1-z[l]),y[i,l]),100000)
}
}

}

Which I call from R using runjags:

 parsamples <- run.jags('jags_model.txt', data=forJags, monitor=c('y','theta'), sample=100, method='rjparallel')

Best Answer

Related Solutions

Forecastability – How to Assess the Forecastability of Time Series Data

R – How to Model Multivariate Time Series Count Data with Poisson Regression in R

Related Question