How to choose between ARIMA and ARMA model

arima, python, stationarity, time series

I am doing time-series analysis in Python on the dataset linked below.

Here's the link

The time series looks non-stationary to me, because the plot appears to contain a trend.
The plot of the time series is shown below:
[Figure: plot of the time series]

I converted the series into a stationary one by taking the log and then differencing consecutive values.
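The log-and-difference step described above can be sketched in plain Python. This is only an illustration: the function name `log_diff` and the sample counts are made up for the example and are not taken from the linked dataset.

```python
import math

def log_diff(values):
    """Return the first differences of the natural log of a positive series."""
    logs = [math.log(v) for v in values]
    # Difference each value with its predecessor: log(x[t]) - log(x[t-1]).
    return [b - a for a, b in zip(logs, logs[1:])]

# Hypothetical daily counts, for illustration only.
counts = [35, 32, 30, 31, 44]
returns = log_diff(counts)
```

The result is one element shorter than the input, since the first observation has no predecessor to difference against.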
I have plotted the corresponding ACF and PACF for that non-stationary time series, shown below:
[Figure: ACF and PACF plots]

From the ACF plot, it seems clear to me that the curve cuts off after the first lag, and in the PACF plot only one spike lies outside the significance bounds. But how should I choose the type of model (ARMA or ARIMA) from these two plots? If it is an ARMA model, what is (p,q), and why? If it is an ARIMA model, what is (p,d,q), and why?
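For reference, "cuts off after a lag" is usually judged against an approximate 95% band of ±1.96/sqrt(n) around zero. The sketch below computes a sample ACF and lists the lags outside that band. It is a hand-rolled illustration, not the routine behind the plots above. The function names and the alternating test series are assumptions made for the example.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag (biased estimator, divisor n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        acf.append(ck / c0)
    return acf

def significant_lags(x, max_lag):
    """Lags 1..max_lag whose autocorrelation lies outside +/-1.96/sqrt(n)."""
    band = 1.96 / math.sqrt(len(x))
    acf = sample_acf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > band]

# Hypothetical strongly alternating series, for illustration only.
demo = [1.0, -1.0] * 50
demo_lags = significant_lags(demo, 3)
```

Under the usual Box-Jenkins reading, an ACF that cuts off after lag q points towards an MA(q) term, and a PACF that cuts off after lag p points towards an AR(p) term.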

Best Answer

There are different methods to decide on the order of integration for a nonseasonal AR(I)MA model. Hyndman & Khandakar (2008, section 3.1) give pointers to the most commonly encountered ones. The most common type is the unit root test, especially the Dickey-Fuller test, which Hyndman & Khandakar counsel against, since it is biased towards more rather than fewer differences. Instead, they use a KPSS test (Kwiatkowski et al., 1992), whose null hypothesis is stationarity: you test the series; if the test is significant, you difference and test again, until the test is no longer significant.
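The test-difference-repeat loop can be sketched as follows. This is a simplified, from-scratch level-stationarity KPSS statistic. It is not the code used by auto.arima(), and for real work a library routine such as statsmodels' `kpss` is the better choice. The truncation lag rule, the cap `max_d=2`, and the function names are assumptions made for the example. The value 0.463 is the tabulated 5% critical value for the level-stationarity case.

```python
KPSS_CRIT_5PCT = 0.463  # tabulated 5% critical value, level-stationarity case

def kpss_level_stat(x):
    """Simplified KPSS statistic for the null of stationarity around a constant."""
    n = len(x)
    mean = sum(x) / n
    resid = [v - mean for v in x]
    # Sum of squared partial sums of the residuals.
    partial, sum_sq = 0.0, 0.0
    for e in resid:
        partial += e
        sum_sq += partial * partial
    # Long-run variance with Bartlett weights; this lag rule is an assumption.
    lags = int(4 * (n / 100.0) ** 0.25)
    lrv = sum(e * e for e in resid) / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)
        gamma = sum(resid[t] * resid[t - k] for t in range(k, n)) / n
        lrv += 2.0 * weight * gamma
    if lrv <= 0.0:
        return 0.0  # constant series: treat as stationary
    return sum_sq / (n * n * lrv)

def choose_d(x, max_d=2):
    """Difference until the KPSS statistic is no longer significant at 5%."""
    d, series = 0, list(x)
    while d < max_d and kpss_level_stat(series) > KPSS_CRIT_5PCT:
        series = [b - a for a, b in zip(series, series[1:])]
        d += 1
    return d

# Hypothetical series: linear trend plus a short cycle, for illustration only.
cycle = [0.0, 1.0, 0.0, -1.0]
trend = [t + cycle[t % 4] for t in range(101)]
d = choose_d(trend)
```

The returned `d` is then the order of integration passed on to the AR and MA order search.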

Yes, these are not the most recent papers, but auto.arima() in the forecast package for R still uses this approach, and that is pretty much as close to the gold standard in time series analysis as you can get.

After you have decided on the order of integration, you need to decide on AR and MA orders. Parsing ACF/PACF plots of successive residuals is the older Box-Jenkins approach; the more modern way would be to minimize an information criterion like the AICc. See the fuller description of how auto.arima() decides on a model order and estimates.
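For reference, the AICc is the AIC plus a small-sample correction. The general form is given below, where L is the maximized likelihood, k is the number of estimated parameters, and n is the number of observations used in estimation. How k is counted for a given ARIMA(p,d,q) model (for example, whether a constant and the innovation variance are included) depends on the software, so treat the exact count as an assumption to check.

```latex
\mathrm{AIC} = -2\log L + 2k,
\qquad
\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}
```

The candidate orders are compared at a fixed d, and the model with the smallest AICc is chosen. The correction term shrinks towards zero as n grows relative to k.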

In the present case, auto.arima() would go for an ARIMA(1,1,1) model:

[Figure: births]

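# Read the CSV: first column as Date, second as numeric.
births <- read.table("daily-total-female-births-CA.csv", header=TRUE, sep=",", colClasses=c("Date","numeric"))
# Daily series with a yearly period, starting at the first date in the file.
births_ts <- ts(births$births, frequency=365, start=births$date[1])

library(forecast)
# Exhaustive, non-approximated model search, then a 30-day forecast plot.
plot(forecast(auto.arima(births_ts, stepwise=FALSE, approximation=FALSE), h=30))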

Since you work in Python, you may be interested in pmdarima and in this SO thread: auto.arima() equivalent for python.
