Multiple ARIMA models fit data well. How to determine order?

Tags: arima, autocorrelation, modeling, r, time series

I've got two time series (parameters of a model for males and females) and aim to identify an appropriate ARIMA model in order to make forecasts. My time series look like this:

[Figure: plot of the time series with its ACF]

The plot and the ACF indicate non-stationarity (the spikes of the ACF decay very slowly). I therefore difference the series and obtain:

[Figure: differenced series with ACF and PACF]

This plot indicates that the series might now be stationary, and both the KPSS test and the ADF test support this hypothesis.

Starting with the Male series, we make the following observations:

The empirical autocorrelations at lags 1, 4, 5, 26 and 27 are significantly different from zero.
The ACF cuts off (?), but I'm concerned about the relatively big spikes at lags 26 and 27.
Only the empirical partial autocorrelations at lags 1 and 2 are significantly different from zero.

On the basis of these observations alone, if I had to choose a pure AR or MA model for the differenced time series, I would tend to choose either an AR(2) model, arguing that:

We have no significant partial autocorrelations for lags greater than 2.
The ACF cuts off except for the region around lag 27. (Are these few outliers alone an indicator that a mixed ARMA model would be appropriate?)

or an MA(1) model, arguing that:

The PACF clearly cuts off
For lags greater than 1, only 4 spikes exceed the critical value in magnitude. This is "only" one more than the 3 spikes (5% of 60) that would be expected to lie outside the dotted area by chance.
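
The "3 out of 60" allowance can be made a little more precise. The following is a minimal Python sketch, assuming the differenced series is white noise, that the 59 lags above lag 1 behave roughly independently (only approximately true for sample autocorrelations), and that each lag falls outside the 95% band with probability 0.05. The function name is illustrative.

```python
from math import comb

def prob_at_least(k, n_lags, p=0.05):
    """P(at least k of n_lags sample autocorrelations exceed the band),
    using a Binomial(n_lags, p) approximation for white noise."""
    below = sum(comb(n_lags, i) * p**i * (1 - p)**(n_lags - i)
                for i in range(k))
    return 1.0 - below

# Expected number of spikes outside the band among 60 lags.
expected_outside = 0.05 * 60

# Chance of seeing 4 or more exceedances among lags 2..60 (59 lags).
p_four_or_more = prob_at_least(4, 59)

print(expected_outside, round(p_four_or_more, 3))
```

Under these assumptions, four or more exceedances occur in roughly one white-noise sample out of three, so the count of spikes alone is weak evidence for or against an MA(1).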

I see no characteristics of an ARIMA(1,1,1) model, and choosing the orders p and q of an ARIMA model from the ACF and PACF becomes difficult when p+q > 2.

Using auto.arima() with the AIC criterion (should I use AIC or AICc?) gives:

ARIMA(2,1,1) with Drift; AIC=280.2783
ARIMA(0,1,1) with Drift; AIC=280.2784
ARIMA(2,1,0) with Drift; AIC=281.437
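
On AIC versus AICc: the small-sample correction adds 2k(k+1)/(n-k-1) to the AIC, where k is the number of estimated parameters and n the number of observations used in fitting. The sketch below applies it to the three AIC values above. It assumes n = 64 (65 observations, one lost to differencing) and counts k as AR terms + MA terms + drift + innovation variance; both are assumptions, and the AICc reported by a fitting routine may differ if it counts differently.

```python
def aicc(aic, k, n):
    """Small-sample corrected AIC: AIC + 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return aic + 2.0 * k * (k + 1) / (n - k - 1)

n = 64  # assumed: 65 observations, one lost to differencing

# (label, AIC from the question, assumed parameter count k)
candidates = [
    ("ARIMA(2,1,1) with drift", 280.2783, 5),
    ("ARIMA(0,1,1) with drift", 280.2784, 3),
    ("ARIMA(2,1,0) with drift", 281.4370, 4),
]

# Sort by AICc, smallest (preferred) first.
ranked = sorted((aicc(a, k, n), name) for name, a, k in candidates)
for value, name in ranked:
    print(f"{name}: AICc = {value:.4f}")
```

Because the correction grows with k, it separates models whose AIC values are almost tied: with these counts the ARIMA(0,1,1) ranks first, ahead of the ARIMA(2,1,1) with its two extra parameters.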

All three considered models show white noise residuals:

[Figure: residual diagnostics for the three models]

To sum up, my questions are:

Can you still describe the ACF of the time series as cutting off despite the spikes around lag 26?
Are these outliers an indicator that a mixed ARMA model might be more appropriate?
Which information criterion should I choose? AIC? AICc?
The residuals of the three models with the lowest AIC all show white noise behavior, and the differences in AIC are very small. Should I use the one with the fewest parameters, i.e. an ARIMA(0,1,1)?
Is my reasoning plausible in general?
Are there further ways to determine which model might be better, or should I, for example, take the two with the lowest AIC and perform backtests to check the plausibility of the forecasts?

EDIT: Here is my data:

-5.9112948202 -5.3429985122 -4.7382340534 -3.1129015623 -3.0350910288 -2.3218904871 -1.7926701792 -1.1417358384 -0.6665592055 -0.2907748318 0.2899480865 0.4637205370  0.5826312749  0.3869227286  0.6268379174  0.7439125292 0.7641139207  0.7613140511  3.0143912244 -0.7339255839  2.0109976796 0.8282394650 -2.5668367983  5.9826406394  1.9569198553  2.3860893476 2.0883339390  1.9761894580  2.2601997245  2.2464027995  2.5131158613 3.4564765529  4.2307335557  4.0298688374  3.7626317439  3.1026407174 2.1690168737  1.5617407254  2.6790460788  0.4652054768 -0.0501046517 -1.0157683791 -0.5113698054 -0.0180401353 -1.9471272198 -0.2550365250 -1.1269988523  0.5152074134  0.2362626753 -2.9978337017  1.4924705528 -1.4907767844 -0.5492041416 -0.7313021018 -0.6531515868 -0.4094159299 -0.5525401626 -0.0611454515 -0.5256272882 -1.1235247363 -1.7299848758 -1.3807763611 -1.6999054476 -4.3155973110 -4.7843298990
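
With the data above, the differencing step and the inspection of the ACF against the approximate 95% bound of ±1.96/√n can be reproduced roughly as follows. This is a plain Python sketch using the usual biased sample autocorrelation; the maximum lag of 30 is chosen here only to cover the lags discussed in the question, and the lags it flags need not match the plots exactly.

```python
from math import sqrt

raw = """
-5.9112948202 -5.3429985122 -4.7382340534 -3.1129015623 -3.0350910288
-2.3218904871 -1.7926701792 -1.1417358384 -0.6665592055 -0.2907748318
0.2899480865 0.4637205370 0.5826312749 0.3869227286 0.6268379174
0.7439125292 0.7641139207 0.7613140511 3.0143912244 -0.7339255839
2.0109976796 0.8282394650 -2.5668367983 5.9826406394 1.9569198553
2.3860893476 2.0883339390 1.9761894580 2.2601997245 2.2464027995
2.5131158613 3.4564765529 4.2307335557 4.0298688374 3.7626317439
3.1026407174 2.1690168737 1.5617407254 2.6790460788 0.4652054768
-0.0501046517 -1.0157683791 -0.5113698054 -0.0180401353 -1.9471272198
-0.2550365250 -1.1269988523 0.5152074134 0.2362626753 -2.9978337017
1.4924705528 -1.4907767844 -0.5492041416 -0.7313021018 -0.6531515868
-0.4094159299 -0.5525401626 -0.0611454515 -0.5256272882 -1.1235247363
-1.7299848758 -1.3807763611 -1.6999054476 -4.3155973110 -4.7843298990
"""
series = [float(x) for x in raw.split()]

# First difference, as applied in the question.
diffed = [b - a for a, b in zip(series, series[1:])]

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + h] for t in range(n - h)) / c0
            for h in range(max_lag + 1)]

max_lag = 30
acf = sample_acf(diffed, max_lag)
band = 1.96 / sqrt(len(diffed))  # approximate 95% white-noise bound
flagged = [h for h in range(1, max_lag + 1) if abs(acf[h]) > band]

print("n = %d, band = +/-%.3f" % (len(diffed), band))
print("lags outside the band:", flagged)
```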

Best Answer

1) Can you still describe the ACF of the time series as cutting off despite the spikes around lag 26?

Spikes at lags 26 and 27 suggest to me that the data is weekly, with some sort of annual cycle of order 26 or 52.

Are these outliers an indicator that a mixed ARMA model might be more appropriate?

If there are outliers in the observed series, then the ARIMA model becomes a Transfer Function Model with dummy inputs.

Outliers in the ACF/PACF are usually non-interpretable. Rather, use the ACF/PACF of the residuals from a tentative model suggested by the dominant ACF/PACF, and then ITERATE to a more complex model.

Which information criterion should I choose? AIC? AICc? The residuals of the three models with the lowest AIC all show white noise behavior, and the differences in AIC are very small. Should I use the one with the fewest parameters, i.e. an ARIMA(0,1,1)?

None, as each is based upon a trial set of assumed models.

Is my argumentation in general plausible?

Vague question ... even vaguer response.

Are there further ways to determine which model might be better, or should I, for example, take the two with the lowest AIC and perform backtests to check the plausibility of the forecasts?

Simply ITERATE (slowly!) to more or less complicated models incorporating both autoregressive structure and deterministic structure. See http://www.autobox.com/cms/index.php/blog/entry/build-or-make-your-own-arima-forecasting-mode for a logic flow diagram.
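
The backtest raised in the question can be sketched as a rolling-origin evaluation: forecast from an expanding training window, compare with the next observed value, and average the errors. The Python sketch below is illustrative only. The two forecasters (a random walk with and without drift) are simple stand-ins rather than the fitted ARIMA candidates, the series is hypothetical, and all names and the `min_train` default are assumptions. In practice each candidate model would be refitted on every training window.

```python
def rolling_origin_mae(series, forecaster, min_train=20, horizon=1):
    """Mean absolute error of horizon-step forecasts over expanding windows."""
    errors = []
    for end in range(min_train, len(series) - horizon + 1):
        train = series[:end]
        actual = series[end + horizon - 1]
        errors.append(abs(forecaster(train, horizon) - actual))
    return sum(errors) / len(errors)

def naive(train, h):
    """Random walk without drift: repeat the last value."""
    return train[-1]

def drift(train, h):
    """Random walk with drift: extend the average historical change."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return train[-1] + h * slope

# Hypothetical trending series with a small periodic wobble (not the real data).
demo = [0.5 * t + 0.2 * ((t % 3) - 1) for t in range(40)]

scores = {f.__name__: rolling_origin_mae(demo, f) for f in (naive, drift)}
print(scores)
```

The model with the smaller out-of-sample error is preferred. Unlike an information criterion, this comparison does not depend on the likelihoods being computed on the same transformed data.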

EDIT AFTER RECEIPT OF DATA:

I was misled by your comment: you used the words "lag of 26" and I incorrectly understood that you were talking about the ACF, but you were talking about time point 26. A data set can be non-stationary in a number of ways. If the mean shifts, the remedy for this non-stationarity is de-meaning. In your case the non-stationarity is caused by two separate and distinct trends and one significant increase in error variance. Both of these findings are easily confirmed by eye.

[Figure: plot of the data]

Your data is non-stationary, but the remedy for your data's non-stationarity in the mean is not differencing but de-trending, as two trends (1-29 and 30-65) were found via Intervention Detection. Furthermore, your error variance is non-stationary, increasing significantly at period 28, as found via Tsay's test for non-constant error variance. See this reference for both procedures: http://www.unc.edu/~jbhill/tsay.pdf

After adjusting for the two trends, the error variance change and a few pulses, a simple AR(1) model was found to be adequate. Here is the plot of Actual/Fit/Forecast:

[Figure: Actual/Fit/Forecast]

The equation is here, with estimation results here. The variance change test is here, and the plot of the model's residuals is here. I used AUTOBOX, a piece of software that I have helped develop, to automatically separate signal from noise.

Your data set is the "poster boy" for why simple ARIMA modelling is not widely used: simple methods don't work on complex problems. Note well that the change in error variance is not linkable to the level of the observed series, so power transformations such as logs are not relevant, even though published papers present models using that structure. See "Log or square-root transformation for ARIMA" for a discussion of when to take power transformations.
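
The de-trending idea can be illustrated with a much-simplified Python sketch. It does not reproduce the AUTOBOX analysis: it assumes a separate least-squares line for each of the two periods (1-29 and 30-65), ignores the pulse adjustments and the variance change, and uses the lag-1 autocorrelation of the de-trended residuals as a rough stand-in for an estimated AR(1) coefficient. The helper names are illustrative.

```python
raw = """
-5.9112948202 -5.3429985122 -4.7382340534 -3.1129015623 -3.0350910288
-2.3218904871 -1.7926701792 -1.1417358384 -0.6665592055 -0.2907748318
0.2899480865 0.4637205370 0.5826312749 0.3869227286 0.6268379174
0.7439125292 0.7641139207 0.7613140511 3.0143912244 -0.7339255839
2.0109976796 0.8282394650 -2.5668367983 5.9826406394 1.9569198553
2.3860893476 2.0883339390 1.9761894580 2.2601997245 2.2464027995
2.5131158613 3.4564765529 4.2307335557 4.0298688374 3.7626317439
3.1026407174 2.1690168737 1.5617407254 2.6790460788 0.4652054768
-0.0501046517 -1.0157683791 -0.5113698054 -0.0180401353 -1.9471272198
-0.2550365250 -1.1269988523 0.5152074134 0.2362626753 -2.9978337017
1.4924705528 -1.4907767844 -0.5492041416 -0.7313021018 -0.6531515868
-0.4094159299 -0.5525401626 -0.0611454515 -0.5256272882 -1.1235247363
-1.7299848758 -1.3807763611 -1.6999054476 -4.3155973110 -4.7843298990
"""
y = [float(v) for v in raw.split()]

def ols_line(t, x):
    """Least-squares intercept and slope of x regressed on t."""
    n = len(t)
    tm, xm = sum(t) / n, sum(x) / n
    slope = (sum((a - tm) * (b - xm) for a, b in zip(t, x))
             / sum((a - tm) ** 2 for a in t))
    return xm - slope * tm, slope

def detrend(t, x):
    """Residuals of x after removing its own least-squares line."""
    a, b = ols_line(t, x)
    return [xi - (a + b * ti) for ti, xi in zip(t, x)]

split = 29  # periods 1-29 and 30-65, as stated in the answer
t = list(range(1, len(y) + 1))
slope_1 = ols_line(t[:split], y[:split])[1]
slope_2 = ols_line(t[split:], y[split:])[1]
resid = detrend(t[:split], y[:split]) + detrend(t[split:], y[split:])

# Lag-1 autocorrelation of the residuals as a rough AR(1) coefficient.
phi = (sum(a * b for a, b in zip(resid, resid[1:]))
       / sum(r * r for r in resid))

print("segment slopes:", slope_1, slope_2)
print("rough AR(1) coefficient:", phi)
```

Comparing the spread of `resid` before and after period 28 gives a crude visual check of the variance change, though it is no substitute for a formal test.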