I am looking for some suggestions for my time series.
I am working with the column "Temperature (C)" from this dataset, and I am trying to make it stationary so that I can do some forecasting on it.
Here's what I have done so far:
- I loaded the dataset and set the date-and-hour column as the index
- I pruned my dataset, keeping only the records with 12:00:00 as the hour
- I ran the ADF test to find out whether the data is stationary. It told me the data is non-stationary.
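The preprocessing steps above can be sketched with pandas. This is a minimal sketch on synthetic hourly data, since the original file isn't included here; the timestamp column name `"Formatted Date"` is a hypothetical stand-in for whatever the dataset actually calls it, and only `"Temperature (C)"` comes from the post.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data standing in for the real file.
# Assumption: the timestamp column is called "Formatted Date" (hypothetical name).
hours = pd.date_range("2006-01-01", periods=24 * 30, freq=pd.offsets.Hour())
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "Formatted Date": hours.astype(str),
    "Temperature (C)": 10 + rng.normal(size=len(hours)),
})

# Parse the timestamps and use them as a sorted DatetimeIndex.
raw["Formatted Date"] = pd.to_datetime(raw["Formatted Date"])
df = raw.set_index("Formatted Date").sort_index()

# Keep only the 12:00:00 readings, i.e. one value per day.
noon = df.at_time("12:00")
ts = noon["Temperature (C)"]
print(len(ts))  # 30
```

`DataFrame.at_time` only works on a `DatetimeIndex`, which is why the timestamps are parsed before filtering.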
So I tried two things:
- Differencing the data → the ADF result was "non-stationary"
- Decomposing the data → again I got "non-stationary"
What should I do? I am stuck…
Here's the code for this last part:
# adf test part
import pandas as pd
from statsmodels.tsa.stattools import adfuller

adftest = adfuller(ts, autolag='AIC')
output = pd.Series(adftest[0:4], index=['Statistic', 'p-value', '# Lags Used', '#Observations Used'])
for key, value in adftest[4].items():
    output['Critical Value (%s)' % key] = value  # one entry per significance level
# result:
# Statistic -2.685398
# p-value 0.076599
# #Lags Used 14.000000
# #Observations Used 1813.000000
# Critical Value (1%) -3.433962
# Critical Value (5%) -2.863136
# Critical Value (10%) -2.567619
I think this indicates that the data is non-stationary, since the p-value is greater than every critical value. Is this correct?
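For reference, the usual decision rule for the ADF test (null hypothesis: the series has a unit root) compares the test statistic with the critical values, or the p-value with a chosen significance level; the p-value and the critical values are on different scales and are not compared with each other. A minimal sketch applying that rule to the numbers reported above (the helper `reject_unit_root` is a hypothetical name for illustration):

```python
# ADF results copied from the output above.
statistic = -2.685398
p_value = 0.076599
critical_values = {"1%": -3.433962, "5%": -2.863136, "10%": -2.567619}

def reject_unit_root(stat, crit):
    """Reject H0 (unit root) when the statistic is more negative than the critical value."""
    return stat < crit

decisions = {level: reject_unit_root(statistic, cv) for level, cv in critical_values.items()}
print(decisions)  # {'1%': False, '5%': False, '10%': True}

# Equivalent check using the p-value at the 5% and 10% levels.
print(p_value < 0.05, p_value < 0.10)  # False True
```

Read this way, the test fails to reject a unit root at the 5% level but rejects it at 10%, so the result is borderline rather than clear evidence of non-stationarity.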
I then tried differencing:
one_diff = ts.diff(periods=1).dropna()  # drop the leading NaN before running adfuller
But, again, the ADF test told me the result is non-stationary (even after differencing up to six times).
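Two details about the differencing step. First, `diff()` leaves a `NaN` in the first position, which has to be dropped before the series goes into `adfuller`. Second, differencing a series that has no unit root tends to introduce strong negative autocorrelation, a symptom of overdifferencing. A minimal sketch on simulated white noise (an assumption for illustration, not the temperature data): for white noise, the first difference has a theoretical lag-1 autocorrelation of -0.5.

```python
import numpy as np
import pandas as pd

# Simulated stationary series: white noise, which needs no differencing.
rng = np.random.default_rng(42)
x = pd.Series(rng.normal(size=20_000))

one_diff = x.diff(periods=1)   # first value is NaN
print(one_diff.isna().sum())   # 1
one_diff = one_diff.dropna()   # drop it before running adfuller

# Lag-1 autocorrelation before and after differencing.
print(round(x.autocorr(lag=1), 2))         # close to 0
print(round(one_diff.autocorr(lag=1), 2))  # close to -0.5
```

If the differenced temperature series shows a lag-1 autocorrelation near -0.5, that is a hint the original series did not need differencing.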
Looking online, I read about decomposition, so I tried it:
result = seasonal_decompose(ts, model='additive')
Here's the result (a four-panel plot: observed, trend, seasonal, and residual):
Nothing seems to work, and I don't know how to do this properly. Can you please help? What am I doing wrong?
Thank you in advance.
Best Answer
Judging by the graph of the raw data (the top one in the four-graph panel), your data does not seem to have a unit root. Therefore, differencing it is unhelpful and could be harmful (keyword: overdifferencing). The data does appear to be seasonal with a period of 1 year, and accounting for that in a model would make sense. You could use Fourier terms; see e.g. Rob Hyndman's blog posts "Forecasting with long seasonal periods" and "Forecasting weekly data" for details.
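A minimal sketch of the Fourier-term idea, assuming daily data with a yearly period of 365.25 days and K = 2 harmonics (both illustrative choices; K would normally be tuned). The sine and cosine columns act as regressors; here they are simply fitted by ordinary least squares with NumPy on synthetic data, whereas in practice they would typically be passed as exogenous regressors to a model with ARIMA errors. The helper `fourier_terms` is a hypothetical name.

```python
import numpy as np

def fourier_terms(t, period, K):
    """Return an (n, 2K) matrix of sin/cos pairs for harmonics 1..K."""
    t = np.asarray(t, dtype=float)
    cols = []
    for k in range(1, K + 1):
        angle = 2 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)

# Synthetic daily temperatures: mean level + yearly cycle + noise.
rng = np.random.default_rng(1)
t = np.arange(5 * 365)
period = 365.25
y = (12 + 8 * np.sin(2 * np.pi * t / period) + 3 * np.cos(2 * np.pi * t / period)
     + rng.normal(scale=2, size=t.size))

# Design matrix: intercept + Fourier terms (columns: 1, sin1, cos1, sin2, cos2).
X = np.column_stack([np.ones(t.size), fourier_terms(t, period, K=2)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 1))  # roughly [12, 8, 3, 0, 0]

# Forecasting only needs the same terms evaluated at future time points.
t_future = np.arange(t.size, t.size + 30)
X_future = np.column_stack([np.ones(t_future.size), fourier_terms(t_future, period, K=2)])
forecast = X_future @ coef
```

The seasonal pattern costs 2K parameters regardless of how long the period is, which is why this approach suits a 365-day cycle better than a seasonal lag of 365.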