Determining the order of an ARIMA model using Box-Jenkins: correct approach / argumentation?

Tags: arima, box-jenkins, forecasting, modeling, time series

I obtained a couple of time series from estimating my (mortality) model, which I now aim to forecast with an appropriate ARIMA(p,d,q) model chosen using the Box-Jenkins methodology. As I'm not experienced in time series analysis at all, I wonder whether my argumentation and conclusions are correct. Starting with the two easier ones:

[Image: plots of the two series with their ACF and PACF]

The two series are clearly non-stationary, as can be seen simply by looking at the plot and/or at the ACF, which decays slowly instead of cutting off. The large spike close to 1 at lag 1 of the PACF for both series is an additional indicator. Hence, I difference the time series to obtain:

[Image: plots of the differenced series with their ACF and PACF]

The differenced series now look stationary (thus d=1), which can be seen from the plot or from the ACF, which drops to around zero immediately (in fact the first lag is already non-significant for both series).
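The check described above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the R workflow used in the question; the helper names `difference` and `sample_acf` are made up for this sketch, and the band ±1.96/sqrt(n) is the usual approximate 5% limit for white noise, not an exact test.

```python
import math

# Male series as listed in the question's edit.
male = [
    12.9860268, 12.5362944, 10.9379455, 10.7029227, 9.6421311, 8.1687120,
    7.0846535, 6.7134053, 6.5685634, 5.6701865, 4.2352191, 4.3919294,
    3.1960928, 2.8841746, 2.1974112, 0.5650275, -0.5647561, -1.7419743,
    -2.9294583, -4.4563460, -4.9608364, -5.3176373, -7.8000258, -8.4957238,
    -10.1346795, -10.9322896, -11.4916410, -12.1036813, -13.0225720, -14.5290742,
]


def difference(series):
    """First differences y[t] - y[t-1] (hypothetical helper)."""
    return [b - a for a, b in zip(series, series[1:])]


def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 1..max_lag, mean-corrected."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    c0 = sum(v * v for v in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


diffs = difference(male)
acf = sample_acf(diffs, 10)
bound = 1.96 / math.sqrt(len(diffs))  # approximate 5% white-noise band

for lag, r in enumerate(acf, start=1):
    status = "outside" if abs(r) > bound else "inside"
    print(f"lag {lag:2d}: r = {r:+.3f} ({status} +/-{bound:.3f})")
```

With only 29 differences the band is wide, so a lack of significant spikes is weak evidence on its own.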

As the ACF and PACF of both differenced series already look like those of a white noise process, a random walk with drift seems reasonable. Indeed, using auto.arima() in R, an ARIMA(0,1,0) with drift has the lowest AIC (52.5149) for the Male series, followed by an ARIMA(1,1,0) with drift (AIC=53.1852) and an ARIMA(1,1,1) with drift (AIC=55.62567).

In contrast, using auto.arima() on the Female series, the ARIMA(2,1,0) with drift has the lowest AIC (58.7558), followed by the ARIMA(2,1,1) with drift (AIC=59.68013) and the ARIMA(0,1,0) with drift (AIC=61.73585).

I fitted all chosen ARIMA models to the time series in order to investigate their residuals. All of them behave like a white noise process, since at most one spike slightly exceeds the critical values (one of 20 spikes, i.e. 5%, which is what would be expected by chance):

[Images: residual ACF plots of the fitted models for both series]

Additionally, I conducted a Ljung-Box test on the residuals of every considered ARIMA model. Q* is non-significant for all of them, which supports the white noise conclusion.
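For the random walk with drift, the Ljung-Box statistic can be computed by hand, which makes it clear what the test measures. The sketch below is plain Python, not the R call used in the question. It assumes 10 lags and compares Q* with the 5% chi-square critical value for 10 degrees of freedom (18.307); the function name `ljung_box` is made up here, and software packages may adjust the degrees of freedom differently.

```python
# Male series as listed in the question's edit.
male = [
    12.9860268, 12.5362944, 10.9379455, 10.7029227, 9.6421311, 8.1687120,
    7.0846535, 6.7134053, 6.5685634, 5.6701865, 4.2352191, 4.3919294,
    3.1960928, 2.8841746, 2.1974112, 0.5650275, -0.5647561, -1.7419743,
    -2.9294583, -4.4563460, -4.9608364, -5.3176373, -7.8000258, -8.4957238,
    -10.1346795, -10.9322896, -11.4916410, -12.1036813, -13.0225720, -14.5290742,
]


def ljung_box(residuals, max_lag):
    """Ljung-Box Q* = n (n + 2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    c0 = sum(v * v for v in centered)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


# Residuals of a random walk with drift: differences minus their mean.
diffs = [b - a for a, b in zip(male, male[1:])]
drift = sum(diffs) / len(diffs)
residuals = [d - drift for d in diffs]

MAX_LAG = 10                  # assumption: number of lags tested
CHI2_95_DF10 = 18.307         # 5% critical value, chi-square with 10 df

q_star = ljung_box(residuals, MAX_LAG)
print(f"drift = {drift:.4f}, Q* = {q_star:.3f}, critical value = {CHI2_95_DF10}")
print("reject white noise" if q_star > CHI2_95_DF10 else "no evidence against white noise")
```

Note that a non-significant Q* on 29 observations only says the test could not detect autocorrelation; it does not rule out level shifts or pulses.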

However, as the differences between the AICs are not really big, I would tend to use the random walk with drift as an appropriate model for both time series for reasons of parsimony, if I had to choose without further investigation.

Before dealing with the more difficult time series, I would like to know whether I have misunderstood something up to this point. Did I overlook something? Is the argumentation correct? Could I do something else to support my final choice?

If not, I would probably consider the ARIMA(p,1,q) model with drift that has the highest AIC and the random walk with drift when backtesting my model. I would be very grateful as well for advice concerning backtests.
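One common way to backtest such candidates is a rolling-origin evaluation: refit on the first k observations, forecast observation k+1, then move the origin forward. The following is a rough sketch in plain Python, not a reproduction of the R fits above. The AR(1)-on-differences forecaster is estimated by ordinary least squares as a stand-in for an ARIMA(1,1,0) with drift, the minimum training length of 15 is an arbitrary choice, and all function names are made up for this illustration.

```python
# Male series as listed in the question's edit.
male = [
    12.9860268, 12.5362944, 10.9379455, 10.7029227, 9.6421311, 8.1687120,
    7.0846535, 6.7134053, 6.5685634, 5.6701865, 4.2352191, 4.3919294,
    3.1960928, 2.8841746, 2.1974112, 0.5650275, -0.5647561, -1.7419743,
    -2.9294583, -4.4563460, -4.9608364, -5.3176373, -7.8000258, -8.4957238,
    -10.1346795, -10.9322896, -11.4916410, -12.1036813, -13.0225720, -14.5290742,
]


def rw_drift_forecast(train):
    """One-step forecast of a random walk with drift (drift = mean difference)."""
    drift = (train[-1] - train[0]) / (len(train) - 1)
    return train[-1] + drift


def ar1_on_differences_forecast(train):
    """One-step forecast from an OLS fit of d[t] = c + phi * d[t-1] (approximation)."""
    d = [b - a for a, b in zip(train, train[1:])]
    x, y = d[:-1], d[1:]
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx > 0 else 0.0  # fall back to pure drift if x is constant
    intercept = y_bar - phi * x_bar
    return train[-1] + intercept + phi * d[-1]


def rolling_origin_errors(series, forecaster, min_train):
    """Errors of one-step forecasts made from each origin >= min_train."""
    return [
        series[origin] - forecaster(series[:origin])
        for origin in range(min_train, len(series))
    ]


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


MIN_TRAIN = 15  # assumption: shortest training window
for name, forecaster in [
    ("random walk with drift", rw_drift_forecast),
    ("AR(1) on differences (OLS)", ar1_on_differences_forecast),
]:
    errors = rolling_origin_errors(male, forecaster, MIN_TRAIN)
    print(f"{name}: one-step MAE over {len(errors)} origins = {mae(errors):.4f}")
```

If the extra parameters do not reduce the out-of-sample error, that would support the parsimony argument; longer horizons can be evaluated the same way by forecasting several steps from each origin.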

Edit: My data for Males:
12.9860268 12.5362944 10.9379455 10.7029227 9.6421311 8.1687120
7.0846535 6.7134053 6.5685634 5.6701865 4.2352191 4.3919294
3.1960928 2.8841746 2.1974112 0.5650275 -0.5647561 -1.7419743
-2.9294583 -4.4563460 -4.9608364 -5.3176373 -7.8000258 -8.4957238
-10.1346795 -10.9322896 -11.4916410 -12.1036813 -13.0225720 -14.5290742

and for Females:
11.9915429 11.4046523 9.3884780 9.0869933 8.3635873 6.5017410
5.7362260 5.3628629 5.5744462 4.5275918 3.0051614 3.3005647
2.2039425 1.6202293 1.2855324 -0.3084491 -1.1011587 -1.9107314
-3.0093942 -4.1452598 -4.2212783 -4.1650695 -6.7117571 -7.0414949
-8.0775438 -9.1197481 -8.8365339 -9.1646399 -10.1484378 -11.3920554

Best Answer

Both series can easily be mis-modeled as a random walk plus drift. The preferred equation for the female series is [image: equation], with the following statistical summary: [image: statistical summary]. It has two pulses and an MA(2) coefficient of -0.35. The equation for the male series is [image: equation], reflecting two distinct trends (periods 1-12 and 13-30), a pulse at period 23, and an MA(1) coefficient of 0.504. The statistical summary for the male series is [image: statistical summary]. Is it reasonable that both series were affected at period 23 by a common outside/exogenous factor? In the interest of transparency: I used AUTOBOX, which I have helped develop.
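The suggestion of a common disturbance at period 23 can be checked informally from the posted data. The sketch below is plain Python and is not AUTOBOX's intervention detection. It standardizes the first differences of each series and reports the period with the largest deviation from the average change; the function name `largest_jump_period` is made up for this illustration.

```python
import math

# Both series as listed in the question's edit.
male = [
    12.9860268, 12.5362944, 10.9379455, 10.7029227, 9.6421311, 8.1687120,
    7.0846535, 6.7134053, 6.5685634, 5.6701865, 4.2352191, 4.3919294,
    3.1960928, 2.8841746, 2.1974112, 0.5650275, -0.5647561, -1.7419743,
    -2.9294583, -4.4563460, -4.9608364, -5.3176373, -7.8000258, -8.4957238,
    -10.1346795, -10.9322896, -11.4916410, -12.1036813, -13.0225720, -14.5290742,
]
female = [
    11.9915429, 11.4046523, 9.3884780, 9.0869933, 8.3635873, 6.5017410,
    5.7362260, 5.3628629, 5.5744462, 4.5275918, 3.0051614, 3.3005647,
    2.2039425, 1.6202293, 1.2855324, -0.3084491, -1.1011587, -1.9107314,
    -3.0093942, -4.1452598, -4.2212783, -4.1650695, -6.7117571, -7.0414949,
    -8.0775438, -9.1197481, -8.8365339, -9.1646399, -10.1484378, -11.3920554,
]


def largest_jump_period(series):
    """Return (period, z-score) of the most unusual one-step change.

    Periods are 1-based; the change from period p-1 to p is assigned to p.
    """
    d = [b - a for a, b in zip(series, series[1:])]
    mean = sum(d) / len(d)
    sd = math.sqrt(sum((v - mean) ** 2 for v in d) / (len(d) - 1))
    z = [(v - mean) / sd for v in d]
    i = max(range(len(z)), key=lambda j: abs(z[j]))
    return i + 2, z[i]


for name, series in [("Male", male), ("Female", female)]:
    period, z = largest_jump_period(series)
    print(f"{name}: largest standardized change at period {period} (z = {z:+.2f})")
```

For both series the most unusual change is the drop into period 23, which is consistent with the pulse mentioned in the answer, although a single large difference does not by itself establish an exogenous cause.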