Solved – Time Series Forecasting in R

accuracy, arima, forecasting, regression, time-series

I have a dataset with three columns: Date, Cash, and NumberOfAccountsperMonth. The frequency of the data is monthly.

I'd like to forecast Cash for the next 6 months in R, and so far I don't know which method is the best to go with.

On one hand, I just create a time series for Cash with the ts() function, then call auto.arima(), which selects an ARIMA(0,1,1)(1,0,0)[12] model, since the data has seasonality and a trend.

On the other hand, I know that the number of accounts influences Cash, so I've built a linear regression model for my time series with the tslm() function and then produced a forecast from it.

The problem is that I'm getting very different results. Could anyone tell me which way to go?

Here's my code and the results:

tsCash <- ts(Cash, start = c(2008, 1), end = c(2017, 10), frequency = 12)
fit.arima <- auto.arima(tsCash)
fcast.arima <- forecast(fit.arima, h = 6)
summary(fit.arima)
Series: tsCash
ARIMA(0,1,1)(1,0,0)[12] with drift 

Coefficients:
          ma1    sar1     drift
      -0.7296  0.3983  7505.999
s.e.   0.0540  0.0910  2092.337

sigma^2 estimated as 2.804e+09:  log likelihood=-1438.54
AIC=2885.08   AICc=2885.44   BIC=2896.13

Training set error measures:
                    ME     RMSE     MAE       MPE     MAPE      MASE
Training set -237.2423 52052.09 36481.3 -55.44956 66.78608 0.4086746
                    ACF1
Training set -0.06202615

Plot

TS Regression code:

fit.tsreg <- tslm(tsCash ~ NumberAccounts + trend + season)
fcast.tsreg <- forecast(fit.tsreg, newdata = data.frame(NumberAccounts=NumberAccounts))
summary(fit.tsreg)

Call:
tslm(formula = tsCash ~ NumberAccounts + trend + season)

Residuals:
    Min      1Q  Median      3Q     Max 
-135019  -47778   -1129   41334  220754 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.376e+05  3.752e+04  -8.998 1.16e-14 ***
NumberAccounts  4.482e-01  6.249e-02   7.171 1.12e-10 ***
trend        7.786e+03  2.682e+02  29.026  < 2e-16 ***
season2      3.626e+04  3.260e+04   1.112   0.2686    
season3      4.329e+04  3.241e+04   1.336   0.1845    
season4      3.826e+04  3.264e+04   1.172   0.2438    
season5      1.062e+04  3.243e+04   0.327   0.7440    
season6      4.519e+04  3.265e+04   1.384   0.1693    
season7      1.757e+04  3.242e+04   0.542   0.5889    
season8      1.634e+03  3.264e+04   0.050   0.9602    
season9      8.869e+03  3.243e+04   0.273   0.7850    
season10     5.904e+04  3.268e+04   1.806   0.0737 .  
season11    -4.469e+03  3.330e+04  -0.134   0.8935    
season12     8.474e+04  3.362e+04   2.520   0.0132 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 72430 on 104 degrees of freedom
Multiple R-squared:  0.9174,    Adjusted R-squared:  0.9071 
F-statistic: 88.84 on 13 and 104 DF,  p-value: < 2.2e-16

Plot

Here you can see the accuracy() results for both procedures:

accuracy(fcast.arima)
                   ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
Training set 1886.528 48855.13 32553.28 -36.81754 55.17642 0.4766057 0.1147092

accuracy(fcast.tsreg)
                       ME     RMSE      MAE      MPE     MAPE      MASE       ACF1
Training set 3.183231e-12 48007.12 37345.96 10.32393 130.0177 0.5467744 0.07845181

If the ARIMA forecast seems to be more accurate, why is that? Since I'm taking into consideration an independent variable that I know for sure influences my dependent variable, shouldn't the regression forecast be more accurate?

Best Answer

accuracy() in your example only looks at in-sample accuracy. In-sample fit is a notoriously poor guide to out-of-sample accuracy.

Use a holdout sample instead: hold back the last $N$ observations, fit your models to the observations before that, forecast into the holdout period, and evaluate those forecasts.
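A minimal sketch of that procedure with the forecast package (the synthetic series below is only a stand-in for your tsCash, which isn't reproduced here; the dates and series length are assumptions):

```r
library(forecast)

# Synthetic stand-in for tsCash -- substitute your real series
set.seed(1)
tsCash <- ts(cumsum(rnorm(118, mean = 5)) + 10 * sin(2 * pi * (1:118) / 12),
             start = c(2008, 1), frequency = 12)

# Hold back the last 12 months as a test set
train <- window(tsCash, end = c(2016, 10))
test  <- window(tsCash, start = c(2016, 11))

# Fit to the training data only, then forecast over the holdout period
fit <- auto.arima(train)
fc  <- forecast(fit, h = length(test))

# The "Test set" row now gives genuine out-of-sample error measures
accuracy(fc, test)
```

Run the same split for your tslm() model and compare the two models on the Test-set RMSE/MAE rather than the Training-set rows you quoted.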

In addition, you can feed NumberAccounts into the xreg argument of auto.arima(). This fits a regression with ARIMA errors, combining both of your approaches in one model.
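A sketch of that approach, again with placeholder data standing in for your real series. The key points are that xreg takes a numeric matrix and that forecasting requires *future* values of the regressor, which you must supply or forecast separately (the carried-forward values below are just an illustration):

```r
library(forecast)

# Placeholder data standing in for the real Cash / NumberAccounts series
set.seed(1)
n <- 118
NumberAccounts <- 1000 + cumsum(rpois(n, 20))
tsCash <- ts(0.45 * NumberAccounts + rnorm(n, sd = 50),
             start = c(2008, 1), frequency = 12)

# Regression on NumberAccounts with ARIMA errors for the residuals
fit.xreg <- auto.arima(tsCash, xreg = cbind(accounts = NumberAccounts))

# To forecast 6 months ahead you need 6 future regressor values;
# here we naively carry the last observed value forward
future.accounts <- rep(tail(NumberAccounts, 1), 6)
fc <- forecast(fit.xreg, xreg = cbind(accounts = future.accounts))
```

Note that the column names passed to xreg at forecast time must match the ones used when fitting.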