Solved – Interpretation of the Chi Square test for two time series in R

chi-squared-test, r, time series

I've used R's "chisq.test()" function to calculate the chi-square statistic between two time series. The first is a bank's stock market series of length 4262; the second is a simulation of the first, produced by a model, with the same length. I used chisq.test as a test of independence between the two series. R gives the following output.

    Pearson's Chi-squared test

    data:  as.numeric(banks[, 2]) and as.numeric(banks[, 3])
    X-squared = 16117000, df = 16094000, p-value = 3.434e-05

Regardless of whether this method is correct, I don't understand where the high number of degrees of freedom comes from. I assumed the chi-square test computes the statistic between the two series using a 2 x length contingency table, with one column as observed and the other as expected. Can anyone tell me what is actually being calculated in this test?

Best Answer

The chi-square statistic measures the discrepancy between observed values and expected values. For example:

| Category | Observed | Expected |
|----------|----------|----------|
| True     | 4        | 3        |
| False    | 6        | 7        |

chi2 = (4-3)^2/3 + (6-7)^2/7 ≈ 0.476. The corresponding degrees of freedom are n_cat - 1 = 2 - 1 = 1.
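The arithmetic for this two-category table can be checked in R. The following is a minimal sketch: the variable names (`observed`, `expected`, `res`) are chosen here for illustration, and the call uses the goodness-of-fit form of `chisq.test()`, where `p` holds the expected proportions. R warns that the approximation may be poor with counts this small, so the warning is suppressed.

```r
# Observed and expected counts from the table above
observed <- c(True = 4, False = 6)
expected <- c(True = 3, False = 7)

# Pearson statistic: sum over categories of (O - E)^2 / E
chi2 <- sum((observed - expected)^2 / expected)
dof  <- length(observed) - 1

# Same calculation via chisq.test(); p = expected proportions (must sum to 1)
res <- suppressWarnings(chisq.test(observed, p = expected / sum(expected)))

print(chi2)                    # about 0.476
print(unname(res$statistic))   # same value as chi2
print(unname(res$parameter))   # degrees of freedom: 1
```

The manual sum and the `statistic` element returned by `chisq.test()` should agree, and `parameter` holds the degrees of freedom.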

In your example, the number of categories is not 2 but the number of time series points. Take a daily time series over one year as an example, where the observed values are the consecutively increasing numbers 1 to 365 and the expected values are the consecutively decreasing numbers 365 to 1.

| Category   | Observed | Expected |
|------------|----------|----------|
| 2017-01-01 | 1        | 365      |
| 2017-01-02 | 2        | 364      |
| ...        | ...      | ...      |
| 2017-12-31 | 365      | 1        |

chi2 = (1-365)^2/365 + (2-364)^2/364 + ... + (365-1)^2/1. Here n_cat is 365 (there are 365 days in 2017), so the degrees of freedom are 365 - 1 = 364 rather than 1.
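As a rough sketch, the daily example can be reproduced in R, alongside a check of what `chisq.test(x, y)` does when it is handed two vectors, as in the question. In that two-vector form, R coerces both arguments to factors and cross-tabulates them, so the degrees of freedom are (r - 1)(c - 1), where r and c are the numbers of distinct values in each vector. The toy vectors `x` and `y` below are made up for illustration and are not the bank data.

```r
# Goodness-of-fit form of the daily example: one category per day
obs  <- 1:365                 # observed values
expd <- 365:1                 # expected values
chi2 <- sum((obs - expd)^2 / expd)
dof  <- length(obs) - 1       # 365 - 1 = 364

# Two-vector form, as called in the question: chisq.test(x, y)
# coerces x and y to factors and builds the table(x, y) contingency table.
x <- rep(1:5, times = 4)      # 20 values, 5 distinct levels
y <- rep(1:4, times = 5)      # 20 values, 4 distinct levels
n_x <- length(unique(x))
n_y <- length(unique(y))

# Warning about small expected counts is suppressed for this toy input
res <- suppressWarnings(chisq.test(x, y))

print(dof)                      # 364
print(unname(res$parameter))    # (5 - 1) * (4 - 1) = 12
print((n_x - 1) * (n_y - 1))    # 12
```

If the two-vector form is what produced the output in the question, a df of about 16 million would be consistent with each series of length 4262 having roughly 4,000 distinct values, since (r - 1)(c - 1) grows with the product of the distinct-value counts rather than with the series length.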