Time Series – How to Decompose and Make a Time Series Stationary

augmented-dickey-fuller, differencing, seasonality, stationarity, time series

I am looking for some suggestions for my time series.
I am dealing with the column "Temperature (C)" from this dataset. I am trying to make it stationary in order to do some forecasting on it.

Here's what I have done so far:

  1. I loaded the dataset and set the date and hour as the index
  2. I pruned the dataset: I kept only the records with 12:00:00 as the hour
  3. I ran the ADF test to check whether the data is stationary. It told me the data is non-stationary.

Therefore, I tried two things:

  1. Differenced the data -> the ADF test still reported "non-stationary"
  2. Decomposed the data -> again I got "non-stationary"

What should I do? I am stuck…
Here's my code for this last part:

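# ADF test (ts is the noon "Temperature (C)" series indexed by date)
import pandas as pd
from statsmodels.tsa.stattools import adfuller

adftest = adfuller(ts, autolag='AIC')
output = pd.Series(adftest[0:4], index=['Statistic', 'p-value', '#Lags Used', '#Observations Used'])
for key, value in adftest[4].items():
    output['Critical Value (%s)' % key] = value
print(output)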
# result:
# Statistic                        -2.685398
# p-value                           0.076599
# #Lags Used                       14.000000
# #Observations Used             1813.000000
# Critical Value (1%)              -3.433962
# Critical Value (5%)              -2.863136
# Critical Value (10%)             -2.567619

I think this indicates that I have non-stationary data, as p-value > critical value (any). Is this correct?
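For reference, the two quantities are not compared with each other: the p-value is compared with a chosen significance level (for example 0.05), while the test statistic is compared with the critical values. The ADF test is left-tailed, so the unit-root null is rejected when the statistic falls below the critical value. A minimal sketch using the numbers reported above; `adf_rejections` is a hypothetical helper written for illustration, not part of statsmodels:

```python
def adf_rejections(statistic, critical_values):
    """Map each significance level to True when the unit-root null is rejected."""
    # Left-tailed test: reject when the statistic is below the critical value.
    return {level: statistic < cv for level, cv in critical_values.items()}


# Values copied from the ADF output above.
statistic = -2.685398
p_value = 0.076599
critical_values = {"1%": -3.433962, "5%": -2.863136, "10%": -2.567619}

rejections = adf_rejections(statistic, critical_values)
print(rejections)

# Equivalent reading through the p-value: compare it with the significance level.
print(p_value < 0.05, p_value < 0.10)
```

With these numbers the unit-root null is rejected at the 10% level but not at the 5% or 1% level, which matches a p-value between 0.05 and 0.10. The evidence is therefore borderline rather than a clear sign of non-stationarity.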

I then tried differencing:

one_diff = ts.diff(periods=1).dropna()  # drop the leading NaN before running adfuller

But, again, the ADF test told me the result is non-stationary (even after differencing about 6 times).

Looking online, I read about decomposition. Therefore, I tried it here:

from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(ts, model='additive')

Here's the result:

[Image: result of decomposition]

I think nothing's working, but I don't know how to do it properly. Can you help me please? What am I doing wrong?

Thank you in advance.

Best Answer

Judging by the graph of the raw data (the top one in the four-graph panel), your data does not seem to have a unit root. Therefore, differencing it is unhelpful and could be harmful (keyword: overdifferencing). The data does appear to be seasonal with a period of 1 year, and it would make sense to account for that in the model. You could use Fourier terms; see e.g. Rob Hyndman's blog posts "Forecasting with long seasonal periods" and "Forecasting weekly data" for details.
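To make the Fourier-term idea concrete, here is a minimal sketch, not the method from the cited blog posts. It builds sine/cosine regressors for a yearly cycle and fits them by ordinary least squares. Everything specific in it is an assumption for illustration: the data is synthetic (a stand-in for about five years of daily noon temperatures, not the real dataset), the period is taken as 365.25 days, `K = 2` harmonics is an arbitrary choice, and `fourier_terms` is a hypothetical helper name.

```python
import numpy as np


def fourier_terms(t, period, K):
    """Return an array of shape (len(t), 2*K): sin/cos pairs for harmonics 1..K."""
    t = np.asarray(t, dtype=float)
    cols = []
    for k in range(1, K + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)


# Synthetic stand-in for daily noon temperatures (assumption, NOT the real data).
rng = np.random.default_rng(0)
n, period = 1828, 365.25
t = np.arange(n)
y = 12.0 + 10.0 * np.sin(2 * np.pi * t / period - 1.9) + rng.normal(0.0, 2.0, n)

# Regress the series on an intercept plus K harmonics of the yearly cycle.
K = 2
X = np.column_stack([np.ones(n), fourier_terms(t, period, K)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
seasonal_fit = X @ coef
residual = y - seasonal_fit  # what remains after removing the yearly pattern

# Extend the same regressors past the sample to project the seasonal pattern.
h = 30
t_future = np.arange(n, n + h)
X_future = np.column_stack([np.ones(h), fourier_terms(t_future, period, K)])
seasonal_forecast = X_future @ coef
```

On real data, the residual series is the part one would then check for stationarity and model further, for example by passing the Fourier columns as exogenous regressors to a model with ARMA errors instead of using plain least squares. The number of harmonics `K` is a tuning choice, commonly picked by comparing an information criterion such as AIC across a few values.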
