R ARIMA Models – Definition and Usage of ARIMA Models with Exogenous Regressors in R

arimar

I am wondering about the exact definition of ARIMA model in function arima in R when exogenous regressors are included.

I understand that arima(y, order=c(p,0,q), xreg=x) is equivalent to estimating the following equation (where $\mu_y$ and $\mu_x$ stand for the means of $y$ and $x$, respectively):

(1) $(y_t-\mu_y)=\varphi_0+\phi_1(y_{t-1}-\mu_y)+…+\varphi_p(y_{t-p}-\mu_y)+\varepsilon_t+\theta_1\varepsilon_{t-1}+…+\theta_q\varepsilon_{t-q}+\beta_1x_t$

Or is it

(2) $(y_t-\mu_y)=\varphi_0+\phi_1(y_{t-1}-\mu_y)+…+\varphi_p(y_{t-p}-\mu_y)+\varepsilon_t+\theta_1\varepsilon_{t-1}+…+\theta_q\varepsilon_{t-q}+\beta_1(x_t-\mu_x)$

(only the last term differs between (1) and (2))?

Or perhaps I got both of them wrong?

Edit: I now realize that including both {$\mu_x$ and $\mu_y$} and $\varphi_0$ in (2) was superfluous.

Best Answer

Looking at the documentation and the code of arima, I conclude that the following linear model with ARIMA errors is fitted when exogenous regressors are included:

\begin{eqnarray} \begin{array}{l} (y_t-\mu-X_t^{'}\vec{\beta})&=&\phi_1(y_{t-1}-\mu-X_{t-1}^{'}\vec{\beta})+...+\phi_p(y_{t-p}-\mu-X_{t-p}^{'}\vec{\beta}) \\ &+&\varepsilon_t+\theta_1\varepsilon_{t-1}+...+ \theta_q\varepsilon_{t-q} \,. \end{array} \end{eqnarray}

$X_t^{'}$ is a row vector containing the values of the external regressors at time $t$ and $\vec{\beta}$ is a column vector containing the coefficients related to those regressors.

Thus, there is no $\varphi_0$ term and the mean $\mu_x$ is not removed from the exogenous regressors.

Related Solutions

Solved – Jack-knife with time series models

I don't understand why you think $q>0$ is a problem for prediction. It is easy enough to forecast using an ARIMA model with MA terms and you don't need to use the innovations algorithm of Brockwell and Davis. That algorithm is useful for estimation; in particular, in getting starting values when optimizing the likelihood.

To answer your specific questions:

auto.arima() calls arima() which uses a state space representation for computing the likelihood. Missing values are handled naturally in a state space format. So, yes, they are handled correctly.
Missing historical values are not estimated by arima(). If you want to forecast them (i.e., using only past data), just fit a model up to the start of the missing sequence and then forecast from it. If you want to estimate them (using data before and afterwards), you would need to use a Kalman smoother based on the equivalent state space model. An alternative fudge that gives almost the same results is to average the forecasts using data up to the last non-missing data with the backcasts using data up to the first non-missing data after the missing sequence.

R – How to Model Multivariate Time Series Count Data with Poisson Regression in R

I found a reference which answered my question: https://arxiv.org/pdf/1405.3738.pdf.

The model is quite complicated, here is the state space representation:

So, let's say I have L different products I'm studying across 1,..,T time periods.

$Y_{l,t} \sim z*\delta_0 + (1-z)NB(exp(\widetilde{\eta}_{l,t}),alpha_l)$ is the distribution for product l at time t

$\widetilde{\eta}_{l,t} = \eta_{l,t} + X_{l,t}\theta_l$ this is the Log of the mean of product l sales at time t, guaranteeing that it is positive.

$\eta_{l,t} = \mu_l + \phi_l(\eta_{l,t-1}-\mu) + \epsilon_{l,t}$

$\epsilon_{l,t} \sim N(0,\frac{1}{\tau_l})$

The other priors and hyperpriors are in the next images:

P.S. Now I'm trying to write the JAGS code and any help would be much appreciated! ( https://stackoverflow.com/questions/40528715/runtime-error-in-jags )

Edit:

Here is the JAGS code:

model{


#hyperpriors 4
alpha_star ~ dunif(0.001,0.1)
tau_mu_star ~ dunif(1,10)
mu_star ~ dnorm(0,0.5)
beta_tau ~ dunif(2,25)
beta_0_tau ~ dunif(1,10)
beta_theta ~ dunif(2,25)
phiminus ~ dunif(1,50)
k_tau ~ dunif(5,10)
k_0_tau ~ dunif(1,5)
pointmass_0 ~ dnorm(0,10000)
k_theta ~ dunif(5,10)
phiplus ~ dunif(1,600)
theta_star ~ dmnorm(b0,B0)
#17
for (l in 1:L){
z[l] ~ dbeta(0.5,0.5)
phi[l] ~ dbeta(phiplus + phiminus, phiminus)
tau[l] ~ dgamma(k_tau,beta_tau)
tau_theta[l] ~ dgamma(k_tau,beta_tau)
mu[l] ~ dnorm(mu_star, tau_mu_star)
alpha[l] ~ dexp(alpha_star)
eps[1,l] ~ dnorm(0,tau[l])
eta[1,l] = mu_star + eps[1,l]
theta[l,1:8] ~ dmnorm(theta_star,thetavar*tau_theta[l])
#y[1,l] ~ inprod(1-z[l],dnegbin(exp(eta[1,l]),alpha[l]))
y[1,l] ~ dnegbin(exp(eta[1,l]),alpha[l])

#y[1,l] ~ dnegbin(exp(eta[1,l]),alpha[l])
ystar[1,l] ~ dnorm(z[l]*pointmass_0 + inprod((1-z[l]),y[1,l]),100000)
}

for (i in 2:N){

for (l in 1:L){
eps[i,l] ~ dnorm(0,tau[l])
}

    for(l in 1:L){
    eta[i,l] = mu[l]+ phi[l]*(eta[i-1,l]-mu[l]) + eps[i,l] 
    eta_star[i,l]= eta[i,l] + inprod(c(x[i,l],xshared[i,]),t(theta[l,]))
    #observations
#kobe[i,l] ~ dnegbin(dexp(eta_star[i,l]),alpha[l])
#   #y[i,l] = inprod(1-z[l],kobe[i,l])
    #y[i,l] ~ inprod(1-z[l],dnegbin(exp(eta_star[i,l]),alpha[l]))
    #y[i,l] ~ dnegbin(exp(eta_star[i,l]),alpha[l])
    y[i,l] ~ dnegbin(exp(eta_star[i,l]),alpha[l])
    ystar[i,l] ~ dnorm(z[l]*pointmass_0 + inprod((1-z[l]),y[i,l]),100000)
}
}

}

Which I call from R using runjags:

 parsamples <- run.jags('jags_model.txt', data=forJags, monitor=c('y','theta'), sample=100, method='rjparallel')

Best Answer

Related Solutions

Solved – Jack-knife with time series models

R – How to Model Multivariate Time Series Count Data with Poisson Regression in R

Related Question