Solved – How to interpret KPSS test

kpss-test, time series

I have been trying to understand the KPSS test. I have read a related answer and the article "KPSS Test: Definition and Interpretation", but I am still confused about my own results.

I am confused because the critical values I get for the 1%, 5%, 10% (etc.) levels on my own data do not follow the order I would expect.

The null hypothesis is that the data is stationary.

I reject the null hypothesis if the test statistic > the critical value.

I would expect it to be "easier" to reject the null at 1% and "harder" to reject the null at 10%. In other words, I expected the critical values to follow the sequence 1% < 2.5% < 5% < 10%.

I have a series of hourly temperature data, and when I run the KPSS test from statsmodels I get the following output:

kpss(series, lags = 230)
(0.70411510645393605,
 0.013171353958733086,
 230,
 {'1%': 0.739, '10%': 0.347, '2.5%': 0.574, '5%': 0.463})

Do my results mean that I would reject the null hypothesis at the 10%, but not at the 1%?

Also, in the table of critical values in "KPSS Test: Definition and Interpretation", the values are in the order 1% > 5% > 10%, which would imply that for a given test statistic you can reject at 5% or 10% but not at 1%.

Can someone tell me what I am missing?

Best Answer

As already mentioned in the comments, the test statistic has to be more extreme than the chosen critical value. The blog post you linked has a good image illustrating this:

Source: https://onlinecourses.science.psu.edu/statprogram/reviews/statistical-concepts/hypothesis-testing/critical-value-approach

The test statistic determines your position in this probability distribution. As you can see, "more extreme" can mean lower or higher than the critical value, depending on whether the rejection region is the left or the right tail of the distribution.

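The KPSS statistic is never negative and the test rejects in the right tail, so a stricter level needs a larger critical value. That is why the values are ordered 1% > 2.5% > 5% > 10%, and why you can reject at 10% without being able to reject at 1%. Taking absolute values changes nothing for KPSS, but it lets the same rule cover left-tailed tests such as ADF as long as the statistic is negative: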

|kpss_val| > |critical_value| = null rejected
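
As a minimal sketch of this rule applied to the numbers from the question, each critical value can be checked in turn. The helper name `rejected_levels` is made up for illustration and is not part of statsmodels.

```python
# Values copied from the kpss() output in the question.
kpss_stat = 0.70411510645393605
critical_values = {'1%': 0.739, '10%': 0.347, '2.5%': 0.574, '5%': 0.463}

def rejected_levels(stat, crit):
    """Hypothetical helper: map each level to True if the null is rejected there."""
    # KPSS is right-tailed, so the null is rejected when the statistic
    # exceeds the critical value; abs() mirrors the rule in the text.
    return {level: abs(stat) > abs(value) for level, value in crit.items()}

decisions = rejected_levels(kpss_stat, critical_values)
for level in ('10%', '5%', '2.5%', '1%'):
    verdict = 'rejected' if decisions[level] else 'not rejected'
    print(f"{level:>4}: critical value {critical_values[level]:.3f} -> null {verdict}")
```

With these numbers the null of stationarity is rejected at 10%, 5% and 2.5% but not at 1%, because 0.704 lies between 0.574 and 0.739.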

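You may also use the p-value returned by the statsmodels implementation. Note that it is only reported within the range [0.01, 0.1]: a reported 0.01 means the actual p-value is at most 0.01, and a reported 0.1 means it is at least 0.1. Your p-value of about 0.013 means you can reject at the 10%, 5% and 2.5% levels, but not at 1%.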

p < significance_level = null rejected
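
To see where such a bounded p-value can come from, here is a rough sketch that linearly interpolates between the tabulated critical values and clamps outside them. That statsmodels works this way is an assumption on my part, although it does reproduce the p-value from the question. `interpolated_p_value` is a hypothetical name, not a statsmodels function.

```python
# Critical values and their significance levels, taken from the output in the question.
CRIT = [0.347, 0.463, 0.574, 0.739]
PVALS = [0.10, 0.05, 0.025, 0.01]

def interpolated_p_value(stat, crit=CRIT, pvals=PVALS):
    """Hypothetical helper: interpolate a p-value from a critical-value table."""
    # Outside the table nothing can be interpolated, so clamp to the edges.
    if stat <= crit[0]:
        return pvals[0]
    if stat >= crit[-1]:
        return pvals[-1]
    # Find the bracketing pair of critical values and interpolate linearly.
    for (c_lo, p_lo), (c_hi, p_hi) in zip(zip(crit, pvals), zip(crit[1:], pvals[1:])):
        if c_lo <= stat <= c_hi:
            return p_lo + (stat - c_lo) / (c_hi - c_lo) * (p_hi - p_lo)

p = interpolated_p_value(0.70411510645393605)
significance_level = 0.05
print(p, '-> null rejected' if p < significance_level else '-> null not rejected')
```

Any statistic above the 1% critical value maps to 0.01 and any statistic below the 10% critical value maps to 0.1, which is why the reported p-value never leaves that range.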

If it's still not clear, I suggest reading the related chapter in the Wikipedia article. It explains more of the intuition behind the hypothesis test and its rejection.

To conclude, here is the code I came up with for my time series analysis (testing KPSS and ADF):

import statsmodels.tsa.stattools as stats
import numpy as np

# Reject a null hypothesis if the p-value is below the significance level,
# i.e. if the statistic lies beyond the related critical value.
# Weak assumption: if we can't reject a null hypothesis we treat it as true.
# A large p-value does not prove the null hypothesis; it only means the data
# give no evidence against it. Below 5% we reject it.

# Null hypothesis: is stationary
def is_stationary(X):  # = not able to reject null hypothesis
    # Null hypothesis: X is stationary (not trend stationary); Note: test tends to reject too often
    kpss_stat, p_value, lags, critical_values = stats.kpss(X)
    # KPSS rejects in the right tail: statistic above the critical value
    return kpss_stat < critical_values['5%']
    # Roughly the same as: return p_value >= 0.05

# Null hypothesis: has unit root = I(1)
def has_unit_root(X):  # = not able to reject null hypothesis
    # Null hypothesis: X has a unit root (= is not stationary, but might be trend stationary)
    adf_stat, p_value, used_lag, nobs, critical_values, icbest = stats.adfuller(X)
    # ADF rejects in the left tail: statistic below the (negative) critical value
    return adf_stat > critical_values['5%']
    # Roughly the same as: return p_value >= 0.05

a = np.arange(100)
print('Has test #1 (linear function) a unit root ? ->', has_unit_root(a))
print('Is test #1 (linear function) stationary ? ->', is_stationary(a), end='\n\n')
b = np.random.rand(100)
print('Has test #2 (white noise) a unit root ? ->', has_unit_root(b))
print('Is test #2 (white noise) stationary ? ->', is_stationary(b), end='\n\n')
c = np.cumsum(b - 0.5)
print('Has test #3 (random walk) a unit root ? ->', has_unit_root(c))
print('Is test #3 (random walk) stationary ? ->', is_stationary(c), end='\n\n')

Output:

Has test #1 (linear function) a unit root ? -> True
Is test #1 (linear function) stationary ? -> False

Has test #2 (white noise) a unit root ? -> False
Is test #2 (white noise) stationary ? -> True

Has test #3 (random walk) a unit root ? -> True
Is test #3 (random walk) stationary ? -> False
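
Since the two tests have opposite null hypotheses, their verdicts are often read together. The sketch below is one cautious way to do that, not a formal procedure. `combined_verdict` and its labels are my own illustration, and it takes the booleans returned by `is_stationary` and `has_unit_root`.

```python
def combined_verdict(stationary_by_kpss, unit_root_by_adf):
    """Hypothetical helper combining the two 'could not reject' flags."""
    if stationary_by_kpss and not unit_root_by_adf:
        return 'both tests point to a stationary series'
    if not stationary_by_kpss and unit_root_by_adf:
        return 'both tests point to a non-stationary series'
    if stationary_by_kpss and unit_root_by_adf:
        return 'tests disagree: neither null rejected, evidence is inconclusive'
    return 'tests disagree: both nulls rejected, evidence is inconclusive'

# Flags taken from the output of the three test series.
print('linear function:', combined_verdict(False, True))
print('white noise:    ', combined_verdict(True, False))
print('random walk:    ', combined_verdict(False, True))
```

When the tests disagree, it is safer to inspect the series (trend, differencing, lag choice) than to trust either verdict alone.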

I hope I did everything right. Otherwise don't hesitate to give me feedback so we have a final answer to this problem.

Greetings, Thomas
