Forecasting airline passengers seasonal time series using auto.arima()
I am trying to model some airline data in an attempt to provide an accurate monthly forecast for June-December this year using monthly data from January 2003 onwards. The data is taken from: http://www.transtats.bts.gov/Data_Elements.aspx?Data=1
Here is the time series plot and ACF
I have used auto.arima to develop two models and checked that they correspond to the autocorrelation functions. Basically I am having trouble deciding whether to use:
- The following seasonal ARIMA model
-
The following non-seasonal ARIMA model of $N_t$ after I first decomposed the model into a trend, seasonal component and random component $X_t = T_t +S_t +N_t $ using a 12-point moving average (basically did the same thing as the
decompose()
function manually)
I have analysed the important properties of both models such as ensuring residuals are close to a white noise process and so on but am unsure which of the above 2 models is most suitable for forecasting purposes and why?
Also I am unsure how to compute forecast for the trend component vector if I use the classical decomposition model $X_t = T_t + S_t +N_t$. Is it even possible to create forecasts using this type of model?
Edit:
Here is the output of dput(IAP)
(the raw data without trend or seasonal component removed)
dput(IAP)
structure(c(9726436L, 8283372L, 9538653L, 8309305L, 8801873L,
10347900L, 11705206L, 11799672L, 9454647L, 9608358L, 9481886L,
10512547L, 10252443L, 9310317L, 10976440L, 10802022L, 10971254L,
12159514L, 13502913L, 13203566L, 10570682L, 10772177L, 10174320L,
11244427L, 11387275L, 9945067L, 12479643L, 11521174L, 12164600L,
13140061L, 14421209L, 13703334L, 11325800L, 11107586L, 10580099L,
11812574L, 11724098L, 10167275L, 12707241L, 12619137L, 12610793L,
13690835L, 14912621L, 14171796L, 12010922L, 11517228L, 11222687L,
12385958L, 12072442L, 10590281L, 13246293L, 12795517L, 12978086L,
14170877L, 15470687L, 15120200L, 12321953L, 12381689L, 12004268L,
13098697L, 12767516L, 11648482L, 14194753L, 12961165L, 13602014L,
14413771L, 15449821L, 15327739L, 11731364L, 11921490L, 11256163L,
12463351L, 12075267L, 10412676L, 12508793L, 12629805L, 11806548L,
13199636L, 14953615L, 14844821L, 11659775L, 11905529L, 11093714L,
12659154L, 12393439L, 10694165L, 13279320L, 12398700L, 13380664L,
14406776L, 16026852L, 15317926L, 12599149L, 12874707L, 11651314L,
12915663L, 12668763L, 10944610L, 13473705L, 13537152L, 13935132L,
14814672L, 16623674L, 15753387L, 13220884L, 13185627L, 12144742L,
13546071L, 13206682L, 11732944L, 14387677L, 13995377L, 14291285L,
15582335L, 16969590L, 16621336L, 13791714L, 13397785L, 12762536L,
14096567L, 13766673L, 12023339L, 15177069L, 14278932L, 15306328L,
16232176L, 17645538L, 17517022L, 14239561L, 14209627L, 13133257L,
15083929L, 14589637L, 12385546L, 15486317L, 14857685L, 15615732L
), .Tsp = c(2003, 2014.33333333333, 12), class = "ts")
Here is the output of dput(IAP.res)
(the random component from the decomposition)
dput(IAP.res)
structure(c(NA, NA, NA, NA, NA, NA, -669127.347569446, -168943.285069446,
225871.456597222, 271337.106597223, 711896.11076389, 284583.435763889,
165401.360763887, 622993.194097221, -268299.21423611, -9406.73506944434,
-233904.910069446, -147124.755902779, -260973.055902776, -163628.243402778,
-43056.7100694457, 121365.814930555, 205106.485763889, -107464.272569445,
247575.569097221, 279399.444097225, 309270.160763888, -166333.068402778,
129823.798263889, 22571.1190972265, -113455.59756944, -384199.160069444,
62061.8315972222, -155858.226736111, 13600.0274305546, -87564.1475694429,
71845.7357638887, 8145.86076388881, 47627.494097226, 442212.72326389,
73639.5065972234, 60882.5774305568, -135204.389236112, -437744.576736112,
203832.581597222, -264145.435069444, 179945.61076389, 15812.1024305553,
-49648.0975694434, -61460.8059027772, 89656.3690972241, 118205.931597224,
-84196.4517361106, 4197.78576389072, -134118.722569442, -87234.4517361117,
-126555.418402776, -57714.9350694417, 293250.152430556, 59462.6857638892,
10340.8190972245, 416646.652430557, 526459.702430556, -135041.068402776,
239767.631597222, 67034.9940972247, -221066.180902774, 207611.839930556,
-424486.00173611, -94779.3517361115, 89796.4857638886, 130285.644097223,
104776.152430555, 16099.8607638888, -317097.047569448, 335867.264930556,
-796342.285069446, -446777.464236111, -93681.7225694442, 242962.798263888,
-143380.293402778, 135423.439930556, 28934.7357638923, 186390.185763891,
116969.777430558, -113617.264236109, -39733.9225694438, -471572.526736109,
130389.423263891, 80446.7857638926, 298895.444097222, 38486.7982638846,
143712.123263886, 419260.898263889, -113385.347569445, -181233.730902779,
-178686.680902779, -412733.597569445, -380106.797569444, 172783.973263888,
220863.173263891, 11443.2440972247, 392297.319097224, -62825.8267361117,
176278.664930556, 139372.439930556, -174159.88923611, -111755.439236109,
-206233.264236111, -197431.097569445, -55065.5892361099, 48314.3065972236,
-6745.32673610683, 193492.494097225, 155009.569097224, 241747.214930556,
209670.99826389, -173438.47673611, -101510.63923611, -128948.689236113,
-222773.597569443, -498474.472569441, 146856.619097224, -275463.026736109,
386273.214930557, 213400.994097223, 171865.11076389, 464391.381597217,
1489.99826388643, -9918.39340277936, -362009.847569447, NA, NA,
NA, NA, NA, NA), .Tsp = c(2003, 2014.33333333333, 12), class = "ts")
Best Answer
As regards the comparison of models, the idea proposed by @forecaster can be helpful since you have a relatively long series.
Regarding your last question about how to obtain forecasts for the trend component rather than for the whole series: as far as I know there is no package on CRAN that decomposes a fitted ARIMA model into a trend and seasonal component. The package ArDec decomposes a series based on autoregressions but I don't think it is straightforward to apply it to an ARIMA model.
Your idea of decomposing the series by means of moving averages could be a solution. As you are interested in forecasting the trend component, it would be better to fit an ARIMA model to the trend component rather than to the noise component $N_t$ and then obtain forecasts based on this model. However, this may not be an efficient approach since you work with a smoothed version of the data instead of using the observed data.
I would try a structural time series model, where a model is explicitly defined for each component. For example, using the function
StructTS
from thestats
core package we obtain the following decomposition:The function
predict
can be used to obtain forecasts,predict(fit1)
, but forecasts are returned for the observed series, not for the components. To obtain forecasts of the components based on a structural model you can use the package stsm.The parameters
var1
,var2
,var3
andvar4
are referred respectively to the variances of the disturbance term in the observation equation and in the level, slope and seasonal components.The components based on the fitted model are returned by
tsSmooth
:Notice that the parameter estimates based on
StructTS
andstsm
are the same, however, the estimated components look better in the latter case compared to those based onStructTS
: the trend component is smoother with no fluctuations at the beginning of the sample and the variance of the seasonal component is more stable throughout time. The reason for this difference in the plots is thatstsm
usesP0cov = FALSE
(a diagonal covariance matrix for the initial state vector, but this is a topic for another post).The forecasts for the components and their $95\%$ confidence intervals can be obtained as follows using the function
predict
from package KFKSDS:Plot of forecasts and confidence intervals: