Solved – Negative values in time series forecast and high fluctuations in input data

arima, python, time series

I am trying to perform univariate time series forecasting in Python, using ARIMA, on a monthly rainfall dataset spanning 136 years.

My dataset is of the form:

YEAR RAINFALL

2000-01-01 0

2000-02-01 128.2

2000-03-01 0

2000-04-01 289.3

.
.
.

I have two issues.

1) My forecast contains negative values, although the training set has none and rainfall logically cannot be negative. A plot of my original data is below.

[Image: plot of the original rainfall data]

Below is the graph of the test data and the predicted values. As you can see, the red curve of forecasted values extends below 0.

[Image: test data and predicted values]

2) Since my data is monthly, the rainfall in some rows jumps from 0 directly to a high value in the following month. In such cases the current value does not depend on the previously observed values, which is the premise of autoregression. Is this what is causing the problem and preventing a good fit? I have tried using yearly data instead, but that does not give a good fit either, and working at quarterly frequency would split the actual monsoon period of the region my dataset covers.

Here is the link to my dataset: https://docs.google.com/spreadsheets/d/1JEj9QZNQagLg-hKhzF2p0yNJsxceMlN1l0LpGDs4eg4/edit?usp=sharing

Best Answer

I took your 1380 monthly values and introduced them to AUTOBOX, and the following useful model (in three parts) was automatically identified: [image], [image] and [image]. The residual plot is here [image], with its ACF here [image]. A significant change (a reduction) in error variance was identified here [image]. Forecasts are here [image]; they were generated using Monte Carlo / bootstrapping procedures.

As it turned out, no expected-value forecast was negative. If one had been, you could simply convert it to zero: the model itself has no such constraint available, so non-negativity is just a logical constraint applied afterwards.
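Converting negative forecasts to zero is a one-line post-processing step. The following is a minimal sketch in plain Python; the function name `clip_negative` and the numbers in `raw_forecast` are made up for illustration and are not taken from the dataset or from any model output.

```python
def clip_negative(forecast):
    """Replace negative forecast values with 0.0, leaving the rest unchanged."""
    return [max(0.0, float(v)) for v in forecast]


# Hypothetical forecast values (made-up numbers, not real model output).
raw_forecast = [12.4, -3.1, 0.0, 87.9, -0.5]
clipped = clip_negative(raw_forecast)
print(clipped)
```

If the forecast is held in a NumPy array rather than a list, `numpy.clip(forecast, 0, None)` should give the same result.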

As for your forecast function, it is based on a model you didn't share, so I can only suggest that better analytics might help, including remedying unusual values and non-constant error variance. The ARIMA model that was developed was (0,0,0)(0,1,1)12: no non-seasonal terms, one seasonal difference at lag 12, and one seasonal moving-average term. The ARIMA model should always be identified using data adjusted for deterministic structure.
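To make the (0,0,0)(0,1,1)12 structure concrete: under the Box-Jenkins sign convention it reads y[t] - y[t-12] = e[t] - theta * e[t-12], which means each calendar month is forecast by exponentially smoothing the past values of that same month, with smoothing weight 1 - theta. The sketch below implements only that recursion in plain Python. It is an illustration under stated assumptions, not what AUTOBOX did: `theta` is a hypothetical value chosen by hand rather than estimated, the first year is used to initialise the levels, and none of the adjustments for unusual values or changing error variance are included.

```python
def seasonal_ma_forecast(y, theta, horizon, period=12):
    """Forecast from y[t] - y[t-period] = e[t] - theta * e[t-period].

    Box-Jenkins sign convention; theta is supplied by the caller, not estimated.
    """
    if len(y) < period:
        raise ValueError("need at least one full season of data")
    # Initialise the level of each seasonal position with its first observation.
    level = list(y[:period])
    for t in range(period, len(y)):
        pos = t % period
        # level[pos] is the one-step forecast for time t; blend in the new value
        # to get the forecast for the same position one season later.
        level[pos] = (1 - theta) * y[t] + theta * level[pos]
    n = len(y)
    # Future errors have expected value zero, so forecasts repeat every season.
    return [level[(n + h) % period] for h in range(horizon)]


# Toy series: two "years" where the second year is 10 units above the first.
toy = list(range(12)) + [v + 10 for v in range(12)]
print(seasonal_ma_forecast(toy, theta=0.5, horizon=3))
```

With theta = 0 this reduces to repeating last year's value for each month, and values of theta closer to 1 average over more years. In Python, a model of this order would normally be estimated with statsmodels' `SARIMAX` using `order=(0, 0, 0)` and `seasonal_order=(0, 1, 1, 12)`; note that statsmodels writes the moving-average term with a plus sign, so its estimated coefficient has the opposite sign to `theta` here.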

You might want to look at "How to improve this time series model?" for a similar case study.
