Time-Series – Why is it Difficult to Find a Suitable ARIMA Model for a Dataset?

acf-pacf, arima, model selection, seasonality, time series

I have a monthly dataset. I applied the ADF test and found that the series is stationary. I also applied the Canova-Hansen test to check whether the seasonality is stochastic or deterministic. As the output below shows, the series is stationary and its seasonality is deterministic.

Augmented Dickey-Fuller Test

data:  rainfall
Dickey-Fuller = -5.1491, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary


    Canova and Hansen test for seasonal stability

data:  rainfall

      statistic pvalue  
Jan      0.1378 0.4886  
Feb      0.3668  0.092 .
Mar      0.2485 0.2116  
Apr      0.2401 0.2248  
May      0.1674 0.3885  
Jun      0.1745 0.3677  
Jul      0.0541 0.9246  
Aug      0.1059 0.6304  
Sep      0.1084 0.6179  
Oct      0.1546 0.4286  
Nov      0.0565 0.9118  
Dec      0.0828 0.7567  
joint    1.5638 0.4809  
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Test type: seasonal dummies 
NW covariance matrix lag order: 14 
First order lag: no 
Other regressors: no  
P-values: based on response surface regressions

I could not find a suitable ARIMA model for this dataset. I ran the auto.arima function with xreg=seasonaldummy(dataset) in RStudio, and it suggested ARIMA(0,0,0) as the best model. I tried alternative models based on the ACF and PACF plots, but they have very low R^2 values.

Seasonality cannot be detected from the time series plot or the ACF plot, but the Canova-Hansen test confirmed that it exists. I also think there is an indication of seasonality because the dataset has lower values in summer. Is there any recommendation on how to find a suitable ARIMA model?

Best Answer

There is no general rule saying that at least one of the ARIMA orders must be nonzero. There are time series for which ARIMA(0,0,0) is a suitable model. In your case, it seems there are no ARIMA patterns beyond the deterministic seasonality that you have already modelled using dummy variables. That is in line with the ACF and PACF plots. If you did not include seasonal dummies, you would probably end up with a seasonal AR(1) model, but that need not be a better alternative to seasonal dummies. You are probably fine the way you have done it now.
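To see why ARIMA(0,0,0) with seasonal dummies can be a complete model, here is a minimal sketch on simulated data. The monthly means, the noise level, and the series `y` are made up for illustration; they are not the rainfall data from the question. The dummies are built by hand with `model.matrix()` as a stand-in for the `seasonaldummy()` call in the question, so only base R is needed.

```r
set.seed(1)

# Hypothetical monthly means (low in summer) plus white noise: 15 years of data
n_years <- 15
month_effect <- c(60, 55, 50, 40, 30, 15, 5, 5, 20, 40, 55, 60)
y <- ts(rep(month_effect, n_years) + rnorm(12 * n_years, sd = 10),
        start = c(2000, 1), frequency = 12)

# Eleven monthly dummies; January is absorbed by the intercept
month <- factor(cycle(y))
dummies <- model.matrix(~ month)[, -1]

# ARIMA(0,0,0) with regressors: a regression with white-noise errors
fit <- arima(y, order = c(0, 0, 0), xreg = dummies)

# Check whether any autocorrelation is left in the residuals
lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box")
lb
```

If the Ljung-Box p-value is large, the residuals show no evidence of remaining autocorrelation, so there is nothing left for AR or MA terms to explain and ARIMA(0,0,0) errors are a reasonable choice.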

Related Solutions

Solved – P-value of Augmented Dickey-Fuller test and KPSS test

Two things:

With ADF, you test the null of a unit root against either a stationary alternative or an explosive alternative, i.e., in a model like $y_t=\rho y_{t-1}+\epsilon_t$, you test $\rho=1$ against $|\rho|<1$ or against $\rho>1$.

There is no reason whatsoever that inability to reject a null against an alternative in one direction should automatically imply that we will be able to reject in the opposite direction. This is not specific to unit root tests at all: it is perfectly possible that the data is not sufficiently informative to reject the null that a regression coefficient is zero against a positive or against a negative coefficient.
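The regression analogy can be illustrated with a small sketch. It uses simulated data with an arbitrarily chosen weak effect (the values `n = 30` and `0.1` are assumptions for illustration only) and computes the one-sided p-value in each direction from the same t statistic.

```r
set.seed(42)

# Made-up data: a weak true slope buried in noise
n <- 30
x <- rnorm(n)
y <- 0.1 * x + rnorm(n)

fit <- lm(y ~ x)
t_stat <- coef(summary(fit))["x", "t value"]
df <- df.residual(fit)

# One-sided p-values for H0: beta = 0
p_positive <- pt(t_stat, df, lower.tail = FALSE)  # alternative: beta > 0
p_negative <- pt(t_stat, df, lower.tail = TRUE)   # alternative: beta < 0

c(p_positive = p_positive, p_negative = p_negative)
```

The two p-values always sum to one, so whenever the t statistic is close to zero both are far from any conventional significance level and the null is rejected in neither direction. The same logic applies to the stationary and explosive alternatives of the ADF test.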

With KPSS you are not looking at the same types of alternatives. Instead, you are using two different specifications for the deterministic trend part of the process, level and trend. You first test the null that the process is stationary around some constant mean, and in the second case, that the process is stationary around some time trend.
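The difference between the two KPSS specifications can be sketched by computing the statistic by hand. This is a simplified base R illustration under stated assumptions, not the code of any particular package: the function name `kpss_stat`, the truncation-lag rule, and the simulated series are all choices made here for the example. (The `kpss.test()` function in the tseries package offers both cases through its `null` argument.)

```r
# KPSS statistic computed by hand.
# trend = FALSE: null of stationarity around a constant mean
# trend = TRUE : null of stationarity around a linear time trend
kpss_stat <- function(y, trend = FALSE) {
  n <- length(y)
  tt <- seq_len(n)
  # Residuals after removing the deterministic part
  e <- if (trend) residuals(lm(y ~ tt)) else y - mean(y)
  s <- cumsum(e)                      # partial sums of the residuals
  l <- trunc(4 * (n / 100)^0.25)      # truncation lag (a rule of thumb, assumed here)
  lrv <- sum(e^2) / n                 # long-run variance with Bartlett weights
  for (j in seq_len(l)) {
    w <- 1 - j / (l + 1)
    lrv <- lrv + 2 * w * sum(e[(j + 1):n] * e[1:(n - j)]) / n
  }
  sum(s^2) / (n^2 * lrv)
}

set.seed(123)
n <- 200
y <- 0.05 * seq_len(n) + rnorm(n)     # simulated: linear trend plus white noise

kpss_level <- kpss_stat(y, trend = FALSE)
kpss_trend <- kpss_stat(y, trend = TRUE)
c(level = kpss_level, trend = kpss_trend)
```

For a series like this one, the level statistic is large because a constant mean cannot absorb the trend, while the trend statistic is computed from detrended residuals. The commonly tabulated 5% critical values are 0.463 for the level case and 0.146 for the trend case, so the two specifications can lead to opposite conclusions on the same data.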

Solved – Choosing the right ARIMA model when data are already seasonally adjusted

Modelling seasonally adjusted (SA) data is not generally recommended. Gómez and Maravall (2001) [1] illustrate this with a case where the autocorrelation function of the seasonally adjusted series turns out to be more complex (contains non-zero values at large lags) than that for the original series.

Seasonally adjusted data are not provided as auxiliary data intended to simplify the statistical analysis. Instead, they are provided to simplify the interpretation of the data; they give a clearer picture of the long-term pattern (e.g., for interpretation of the economic situation, etc.) and are helpful even for people not necessarily knowledgeable in statistics.

If you want to carry out a statistical analysis, then it is better to work with data that have not been seasonally adjusted.

[1] Gómez and Maravall (2001). Seasonal Adjustment and Signal Extraction in Economic Time Series. doi:10.1002/9781118032978.ch8.

The software TRAMO and SEATS (used by many statistical offices) returns an ARIMA model for the seasonally adjusted data based on the decomposition of an ARIMA model fitted to the original data. That would be a better approach than fitting a model for the SA data.

As regards the seasonality present in the SA data that you show: the seasonal differencing suggests overdifferencing (negative ACF at seasonal lags).
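The overdifferencing signature is easy to reproduce. The following sketch uses simulated white noise (an assumption for illustration, not the SA series from the question): there is no seasonality to remove, yet a seasonal difference is applied anyway.

```r
set.seed(7)

# White noise with a monthly frequency: nothing seasonal to difference away
e <- ts(rnorm(600), frequency = 12)

# Seasonal differencing applied anyway
d <- diff(e, lag = 12)

# Sample autocorrelations; element 1 is lag 0, so element 13 is lag 12
r <- acf(d, lag.max = 24, plot = FALSE)$acf[, 1, 1]
r[13]
```

For $d_t = e_t - e_{t-12}$ the theoretical autocorrelation at lag 12 is $-0.5$, so a strongly negative spike at the first seasonal lag after differencing is a hint that the differencing was not needed.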

A quick look at the SA data reveals that the variance of a seasonal component based on a LOESS decomposition (smoothing) of the SA series is negligible. Notice also in the plot produced by the code below that the seasonal component obtained by LOESS ranges between -0.02 and 0.03, which is very narrow compared to the range of the SA data (between 3.4 and 10.8).

x <- structure(c(4,3.9,4.2,4,4.3,4.3,4.4,4.1,3.9,3.9,4.3,4.2,4.2,3.9,3.7,3.9,4.1,4.3,4.2,4.1,4.4,4.5,5.1,5.2,5.8,6.4,6.7,7.4,7.4,7.3,7.5,7.4,7.1,6.7,6.2,6.2,6,5.9,5.6,5.2,5.1,5,5.1,5.2,5.5,5.7,5.8,5.3,5.2,4.8,5.4,5.2,5.1,5.4,5.5,5.6,5.5,6.1,6.1,6.6,6.6,6.9,6.9,7,7.1,6.9,7,6.6,6.7,6.5,6.1,6,5.8,5.5,5.6,5.6,5.5,5.5,5.4,5.7,5.6,5.4,5.7,5.5,5.7,5.9,5.7,5.7,5.9,5.6,5.6,5.4,5.5,5.5,5.7,5.5,5.6,5.4,5.4,5.3,5.1,5.2,4.9,5,5.1,5.1,4.8,5,4.9,5.1,4.7,4.8,4.6,4.6,4.4,4.4,4.3,4.2,4.1,4,4,3.8,3.8,3.8,3.9,3.8,3.8,3.8,3.7,3.7,3.6,3.8,3.9,3.8,3.8,3.8,3.8,3.9,3.8,3.8,3.8,4,3.9,3.8,3.7,3.8,3.7,3.5,3.5,3.7,3.7,3.5,3.4,3.4,3.4,3.4,3.4,3.4,3.4,3.4,3.4,3.5,3.5,3.5,3.7,3.7,3.5,3.5,
3.9,4.2,4.4,4.6,4.8,4.9,5,5.1,5.4,5.5,5.9,6.1,5.9,5.9,6,5.9,5.9,5.9,6,6.1,6,5.8,6,6,5.8,5.7,5.8,5.7,5.7,5.7,5.6,5.6,5.5,5.6,5.3,5.2,4.9,5,4.9,5,4.9,4.9,4.8,4.8,4.8,4.6,4.8,4.9,5.1,5.2,5.1,5.1,5.1,5.4,5.5,5.5,5.9,6,6.6,7.2,8.1,8.1,8.6,8.8,9,8.8,8.6,8.4,8.4,8.4,8.3,8.2,7.9,7.7,7.6,7.7,7.4,7.6,7.8,7.8,7.6,7.7,7.8,7.8,7.5,7.6,7.4,7.2,7,7.2,6.9,7,6.8,6.8,6.8,6.4,6.4,6.3,6.3,6.1,6,5.9,6.2,5.9,6,5.8,
5.9,6,5.9,5.9,5.8,5.8,5.6,5.7,5.7,6,5.9,6,5.9,6,6.3,6.3,6.3,6.9,7.5,7.6,7.8,7.7,7.5,7.5,7.5,7.2,7.5,7.4,7.4,7.2,7.5,7.5,7.2,7.4,7.6,7.9,8.3,8.5,8.6,8.9,9,9.3,9.4,9.6,9.8,9.8,10.1,10.4,10.8,10.8,10.4,10.4,10.3,10.2,10.1,10.1,9.4,9.5,9.2,8.8,8.5,8.3,8,7.8,7.8,7.7,7.4,7.2,7.5,7.5,7.3,7.4,7.2,7.3,7.3,7.2,7.2,7.3,7.2,7.4,7.4,7.1,7.1,7.1,7,7,6.7,7.2,7.2,7.1,7.2,7.2,7,6.9,7,7,6.9,6.6,6.6,6.6,6.6,6.3,6.3,6.2,
6.1,6,5.9,6,5.8,5.7,5.7,5.7,5.7,5.4,5.6,5.4,5.4,5.6,5.4,5.4,5.3,5.3,5.4,5.2,5,5.2,5.2,5.3,5.2,5.2,5.3,5.3,5.4,5.4,5.4,5.3,5.2,5.4,5.4,5.2,5.5,5.7,5.9,5.9,6.2,6.3,6.4,6.6,6.8,6.7,6.9,6.9,6.8,6.9,6.9,7,7,7.3,7.3,7.4,7.4,7.4,7.6,7.8,7.7,7.6,7.6,7.3,7.4,7.4,7.3,7.1,7,7.1,7.1,7,6.9,6.8,6.7,6.8,6.6,6.5,6.6,6.6,6.5,6.4,6.1,6.1,6.1,6,5.9,5.8,5.6,5.5,5.6,5.4,5.4,5.8,5.6,5.6,5.7,5.7,5.6,5.5,5.6,5.6,5.6,5.5,
5.5,5.6,5.6,5.3,5.5,5.1,5.2,5.2,5.4,5.4,5.3,5.2,5.2,5.1,4.9,5,4.9,4.8,4.9,4.7,4.6,4.7,4.6,4.6,4.7,4.3,4.4,4.5,4.5,4.5,4.6,4.5,4.4,4.4,4.3,4.4,4.2,4.3,4.2,4.3,4.3,4.2,4.2,4.1,4.1,4,4,4.1,4,3.8,4,4,4,4.1,3.9,3.9,3.9,3.9,4.2,4.2,4.3,4.4,4.3,4.5,4.6,4.9,5,5.3,5.5,5.7,5.7,5.7,5.7,5.9,5.8,5.8,5.8,5.7,5.7,5.7,5.9,6,5.8,5.9,5.9,6,6.1,6.3,6.2,6.1,6.1,6,5.8,5.7,5.7,5.6,5.8,5.6,5.6,5.6,5.5,5.4,5.4,5.5,5.4,5.4,5.3,5.4,5.2,5.2,5.1,5,5,4.9,5,5,5,4.9),.Tsp=c(1956,2005.91666666667,12),class="ts")
res <- stl(x, s.window="periodic")
plot(res)
var(res$time.series[, "seasonal"])
#[1] 0.0001334721
var(x)
#[1] 2.075675
