Solved – Auto.Arima using xreg giving xreg as insignificant

arimaforecasting

I am trying to forecast volume using ARIMA. I also have a independent variable that I can use in forecasting. When I do a normal linear Regression the Independent variable is very significant. But when I am doing an auto.arima model I got the following model:

ARIMA(3,1,3)
      Estimate Std. Error  z value Pr(>|z|)    
ar1   0.453844   0.043957  10.3248  < 2e-16 ***

ar2  -0.081913   0.046282  -1.7699  0.07675 .  

ar3  -0.510220   0.043070 -11.8463  < 2e-16 ***

ma1  -0.650601   0.033088 -19.6625  < 2e-16 ***

ma2  -0.654464   0.035819 -18.2714  < 2e-16 ***

ma3   0.664126   0.028016  23.7055  < 2e-16 ***

xreg  0.262486   0.355874   0.7376  0.46077

As you can see the independent variable(xreg) is insignificant here. How do I interpret this?

Best Answer

Recall that auto.arima() fits a regression with ARIMA errors.

The key issue is the treatment of the intercept. lm() will always fit an intercept, but auto.arima(..., allowmean=TRUE) may do so (or remove it - there apparently is no way to force auto.arima() to include the intercept, unless you run lm() first, then call auto.arima() on the residuals).

Some playing around will give you different possible combinations (I'm truncating the output):

> set.seed(1)
> foo <- ts(rnorm(100))
> bar <- rnorm(100)
> summary(lm(foo~bar))

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.1088521  0.0903480   1.205    0.231
bar         -0.0009324  0.0947216  -0.010    0.992

> summary(auto.arima(foo,xreg=bar))

Regression with ARIMA(0,0,0) errors 

Coefficients:
         xreg
      -0.0055
s.e.   0.0944

This is your situation: lm () fits an intercept, auto.arima() doesn't, so the coefficients (and the standard errors, and p values) don't match. You can force these to match by excluding the intercept from the lm() model:

> summary(lm(foo~bar-1))

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
bar -0.005456   0.094863  -0.058    0.954

Adding an offset induces auto.arima() to include the intercept:

> summary(lm(foo+1~bar))

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.1088521  0.0903480   12.27   <2e-16 ***
bar         -0.0009324  0.0947216   -0.01    0.992    

> summary(auto.arima(foo+1,xreg=bar))

Regression with ARIMA(0,0,0) errors 

Coefficients:
      intercept     xreg
         1.1089  -0.0009
s.e.     0.0894   0.0938

And the coefficients and (mostly) the standard errors match again.

Related Solutions

Solved – Issues in auto.arima algorithm when using external regressors and outlier correction

The issue is most likely related to the scale of the data. The data take on large values and this may lead to some numerical problems. This is what I get after dividing the data by 10,000 and using the BIC criterion.

require(forecast)
require(tsoutliers)
datats <- structure(c(2948440.74, 7686353.79, 6911493.29, 14735151.86, 23381406.69, 20227482.18, 18903360.7, 15833633.89, 13124710.08, 
  10021578.4, 10543615.37, 5101192.99, 4298189.2, 4003772.82, 5575776.76, 21282851.11, 22256756.35, 33951754.02, 25523787.6, 19104589.15, 
  16589269.34, 22852357.49, 16193397.83, 12323810.99, 10452747.36, 4987688.69, 8046769.04, 29435984.88, 24839254.81, 29105406.04, 
  31800705.19, 13817643.19, 13814943.1, 26481714.12, 17992971.92, 9399541.88, 6543970.31, 5310201.59, 7570275.84, 25988976.96, 
  30725584.28, 27390701.17, 25820703.75, 17954952.49, 15088122.36, 15702960.91, 13609359.16, 7369414.86, 4775041.56, 4034859.17, 
  17040576.77, 16826494.46, 22406996.07, 24773844.66, 31287138.55, 23570837.57, 16026064.28, 15197182.57, 15509137.94, 10241087.07, 
  3493348.32, 1730346.2, 17251774.11, 20148964.98, 19003322.88, 29760265.93, 25481326.36, 25830736.11, 14148186.89, 14368004.55, 
  14952315.95, 6886217.54, 7295155.27, 5944267.08, 14806881.21, 25824636.63, 26767998.82, 38040351.59, 29168699.84, 29433400.4, 
  20307430.63, 16010131.61, 12446879.82, 4818262.85, 3745592.95, 8864976.31, 12814904.34, 30240664.47, 34272201.49, 31058565.62, 
  18227566.3, 17786236.84, 21470722, 16643487.78, 11593014.72, 7215432.3, 3041258.29, 4808895.81, 13848873.64, 20426958.31, 
  51695463.16, 38293297.51, 25750299.14, 19245949.8, 14828108.86, 15736758.69, 11288184.22, 8509989.64, 6560242.4, 4625940.53, 
  22368905.29, 34677860.01, 38790083.19, 33567845.48, 39365091.23, 26318906.99, 20814308.33, 16424344.98, 12170515.08, 7406164.25, 
  4780528.38, 6320645.22, 13635055.52, 39512697.88, 41871133.48, 34342300.49, 27125796.06, 16253723.1, 20046817.98, 13785445.02, 
  14315921.47, 7451878.2, 5707949.58, 3884183.86, 11962756.46), .Tsp = c(2003.83333333333, 2015, 12), class = "ts")
out.ind <- outliers(c("AO"), c(101))
out.df <- outliers.effects(out.ind, 135)
x <- datats/100000
auto.arima(x, ic = "bic", xreg=out.df)
# ARIMA(1,0,0)(2,0,0)[12] with non-zero mean 
# Coefficients:
#          ar1    sar1    sar2  intercept     AO101
#       0.3900  0.6898  0.1101   168.5335  179.8997
# s.e.  0.0925  0.0910  0.0995    25.2476   37.3177
# sigma^2 estimated as 2328:  log likelihood=-720.61
# AIC=1453.22   AICc=1453.88   BIC=1470.66

The chosen model contains two seasonal AR coefficients that capture cycles related to the fundamental seasonal frequency and some of its harmonics.

Note: I don't claim the above is the best model for the data, it was just intended to illustrate a possible problem with the scale of the data.

Edit

Be also aware that by default for series with more than 100 observations, estimation is done by conditional sums of squares and the information criteria used for model selection are approximated (by default approximation=TRUE). This issue along with the large scale of the data may be troublesome in this case.

The plot below shows the forecasts for each chosen model respectively for the following cases:

the original scale and approximation=TRUE is used (as in your example),
the original scale and approximation=FALSE,
the original series divided by 10,000 and approximation=TRUE.

Either rescaling the series or using approximation=FALSE yields graphically a sensible result compared to the original scale and the default options.

fit1 <- auto.arima(datats, ic = "bic", xreg=out.df, approximation=TRUE)
fit2 <- auto.arima(datats, ic = "bic", xreg=out.df, approximation=FALSE)
fit3 <- auto.arima(datats/10000, ic = "bic", xreg=out.df, approximation=TRUE)
par(mfrow=c(3,1), mar=c(2,2,2,2))
plot(forecast(fit1, 24, xreg=rep(0,24)))
plot(forecast(fit2, 24, xreg=rep(0,24)))
plot(forecast(fit3, 24, xreg=rep(0,24)))

comparison of the three models

Solved – time series forecasting using auto.arima and exponential smoothing

Seasonality is probably not very strong. Different algorithms will give different results, unless seasonality is glaringly obvious.
The best measure is always to compare forecast accuracy on a holdout set: hold back the last $n$ observations, fit your models to all other observations, forecast into the last $n$ time periods with both models, then compare forecast accuracy using your error measure of choice (see 5 below).
Yes, this is a common complaint. I don't think there is an easy way to get the in-sample fit. But you can get the residuals: auto.arima(WWWusage)$residuals. Best to look into the code of auto.arima() to see whether you need to add or subtract them from the original series to get the fit. I'd say you have to subtract ("actuals=model+residuals"), but better check.
I recommend a good forecasting textbook. This is a very good start. Otherwise, read through the help pages.
The appropriate error measure will depend on your personal loss function. Is your pain symmetric, and will it increase more strongly with larger errors? Then use MSE. Is your pain proportional to absolute errors? Then use MAE. Best to look at multiple error measures.

One tip: averaging forecasts will usually improve accuracy. Consider taking the average of your two models' forecasts per future time bucket.
auto.arima() apparently fits no drift, even if you allow it.

Best Answer

Related Solutions

Solved – Issues in auto.arima algorithm when using external regressors and outlier correction

Solved – time series forecasting using auto.arima and exponential smoothing

Related Question