Solved – Poor fit of an ARIMA model

arima, python, time series

I'm new to time series forecasting methods. I'm trying to fit an ARIMA model to my time series data, and the result is poor. See figure.
[Figure: the time series with the fitted ARIMA model]

The series seems to be stationary, and the Dickey-Fuller test gives me p<0.05, so I tried to find which ARMA order to use with statsmodels' arma_order_select_ic. I don't think this is a transformation problem, because the results are far from good, and when I tried a log (or sqrt) transformation, nothing seemed to change in the fit.

Here is the data and the code I'm using.

import pandas as pd

ts = pd.read_csv('path/data.csv', index_col=0, parse_dates=True)
ts = ts['ts']  # select the 'ts' column as a Series

# # Testing stationarity
# import statsmodels.tsa.stattools as tsa
# dfuller_test = tsa.adfuller(ts, autolag='AIC')
# dfuller_output = pd.Series(dfuller_test[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
# print(dfuller_output)

# # Plotting ACF and PACF
# from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# plot_acf(ts,lags=50)
# plot_pacf(ts,lags=50)

# # Finding best p,q
# import statsmodels.api as sm
# res = sm.tsa.arma_order_select_ic(ts, ic=['aic', 'bic'], trend='nc')
# print(res.aic_min_order)

import pyflux as pf

p, d, q = 4, 0, 1
model = pf.ARIMA(data=pd.DataFrame(ts), ar=p, integ=d, ma=q)
x = model.fit()
model.plot_fit(figsize=(15,4))

# Residuals: observed values minus the in-sample fitted values
mu, Y = model._model(model.latent_variables.get_z_values())
# .ix was removed from pandas; slice the index directly instead
fitted_values = pd.Series(model.link(mu), index=ts.index[-len(mu):])
ts.subtract(fitted_values).plot()

My question is whether I'm missing something in this fitting process, or whether the data needs some transformation or normalization. Do you think another model, such as GARCH, could do better?

Best Answer

From my perspective, the time series is stationary and does not exhibit much time-varying variance, i.e., there are no pronounced volatility clusters. Hence, a GARCH model is unlikely to provide further insights, as already mentioned by Chris Haug.
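One rough way to back up the visual impression about volatility clusters is to look at the autocorrelation of the squared deviations from the mean: pronounced clustering would show up as several lags well outside the approximate 95% band. The sketch below uses only the standard library and a simulated series as a stand-in for the real data; the helper `sample_acf` and all numbers are illustrative assumptions, not part of the original analysis.

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations of x at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]

# Synthetic stand-in for the real series (assumption): white noise around 10.
rng = random.Random(0)
y = [10 + rng.gauss(0, 1) for _ in range(500)]

mu_hat = sum(y) / len(y)
squared_dev = [(v - mu_hat) ** 2 for v in y]

acf_sq = sample_acf(squared_dev, max_lag=10)
bound = 1.96 / len(y) ** 0.5  # approximate 95% band under no autocorrelation
n_outside = sum(abs(r) > bound for r in acf_sq)
print(f"lags outside the band: {n_outside} of {len(acf_sq)}")
```

With the real series in place of `y`, only a handful of lags far outside the band would hint at ARCH-type effects; one or two marginal exceedances are expected by chance.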

Furthermore, the long-term mean is clearly about 10, and the noise does not appear to contain much autocorrelation. From this graphical analysis, I would suggest that the best you can get from the data for forecasting is something like: $$y_t=\mu+u_t, \quad u_t \sim N(0,\sigma^2),$$ with $\mu \approx 10$. This might look too simplistic, but at least it allows you to give a forecast interval around the long-term mean if you estimate $\sigma^2$. On top of that, the ARIMA model (here with $d=0$) is also mean-reverting, i.e., as your forecast horizon increases, your forecast will very quickly converge to $\mu$.
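As a minimal sketch of this constant-mean model, the snippet below estimates $\mu$ and $\sigma$ from a simulated series (a stand-in for the real data) and builds an approximate 95% forecast interval. It also shows the mean reversion in the simplest case, an AR(1), whose $h$-step forecast is $\mu + \phi^h (y_T - \mu)$. The function `ar1_forecast`, the coefficient `phi = 0.5`, and the starting value are hypothetical and chosen only for illustration.

```python
import random
import statistics

# Synthetic stand-in for the real series (assumption): noise around a mean of 10.
rng = random.Random(1)
y = [10 + rng.gauss(0, 2) for _ in range(300)]

mu_hat = statistics.fmean(y)
sigma_hat = statistics.stdev(y)  # sample standard deviation

# Approximate 95% forecast interval for a future observation under
# y_t = mu + u_t; this ignores the (small) uncertainty in mu_hat itself.
lower = mu_hat - 1.96 * sigma_hat
upper = mu_hat + 1.96 * sigma_hat
print(f"forecast: {mu_hat:.2f}, 95% interval: [{lower:.2f}, {upper:.2f}]")

def ar1_forecast(last_value, mu, phi, horizon):
    """h-step-ahead forecasts of a stationary AR(1): mu + phi**h * (y_T - mu)."""
    return [mu + phi ** h * (last_value - mu) for h in range(1, horizon + 1)]

# phi = 0.5 is a hypothetical coefficient, chosen only to show the decay.
path = ar1_forecast(last_value=14.0, mu=10.0, phi=0.5, horizon=8)
print([round(p, 3) for p in path])
```

The forecast path shrinks toward the mean geometrically, so after a few steps it is practically indistinguishable from the constant-mean forecast.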

With four AR terms and one MA term, you may run into overfitting, which leads to an only slightly better in-sample fit (one that still cannot capture the full amplitude) but almost certainly not to a better forecasting model. This may sound disappointing, but at least it prevents you from reading something into the data that is not there.
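A holdout comparison is one way to check whether extra terms actually improve forecasts. The sketch below is an illustration under stated assumptions, not a result about your data: it uses simulated white noise around 10, and a least-squares AR(1) stands in for the larger ARMA(4,1), since fitting the latter needs a library. The helpers `fit_ar1` and `mse` are hypothetical names.

```python
import random

def fit_ar1(train):
    """Least-squares fit of y_t = c + phi * y_{t-1}; returns (c, phi)."""
    x = train[:-1]
    z = train[1:]
    mx = sum(x) / len(x)
    mz = sum(z) / len(z)
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi = sxz / sxx
    return mz - phi * mx, phi

def mse(errors):
    """Mean squared error of a list of forecast errors."""
    return sum(e * e for e in errors) / len(errors)

# Synthetic stand-in for the real series (assumption): white noise around 10.
rng = random.Random(2)
y = [10 + rng.gauss(0, 1) for _ in range(400)]
train, test = y[:300], y[300:]

mu_hat = sum(train) / len(train)
c_hat, phi_hat = fit_ar1(train)

# One-step-ahead forecasts over the holdout, parameters fixed at training values.
prev = [train[-1]] + test[:-1]
mse_mean = mse([obs - mu_hat for obs in test])
mse_ar1 = mse([obs - (c_hat + phi_hat * p) for obs, p in zip(test, prev)])
print(f"phi_hat={phi_hat:.3f}  MSE mean model={mse_mean:.3f}  MSE AR(1)={mse_ar1:.3f}")
```

If the richer model does not clearly beat the constant mean on data it has not seen, its extra parameters are probably fitting noise.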

Is the data already differenced? If you are aiming to predict, and the plot shows the absolute changes, you could try to fit the general trend in the non-differenced data.
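If the series does turn out to be differenced, the levels can be rebuilt as a running sum of the changes, provided the starting level is known. A minimal sketch, where both the changes and the starting level are hypothetical values that would have to come from the raw data:

```python
from itertools import accumulate

changes = [0.5, -0.2, 0.1, 0.4, -0.3]  # hypothetical differenced values
start_level = 100.0                     # hypothetical; must come from the raw data

# Level at each step = starting level plus the running sum of the changes.
levels = list(accumulate(changes, initial=start_level))
print(levels)
```

A trend could then be fitted to `levels` rather than to the changes.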