Solved – Interpretation of the Chi Square test for two time series in R

chi-squared-test, r, time-series

I've used R's "chisq.test()" function to calculate the chi-square statistic between two time series. The first is the stock market series of a bank, with length 4262, and the second is a simulation of the first using some model, with the same length. I used chisq.test as a measure of independence of the two series. Using R, I get the following output:

    	Pearson's Chi-squared test

    data:  as.numeric(banks[, 2]) and as.numeric(banks[, 3])
    X-squared = 16117000, df = 16094000, p-value = 3.434e-05

Regardless of whether this method is correct, I don't understand where the huge number of degrees of freedom comes from. I assumed the chi-square test computes the statistic between two series using a 2 x length contingency table, with one column as observed and the other as expected. Can anyone tell me what is actually being calculated in this test?

Best Answer

The chi-square statistic measures the difference between observed values and expected values. For example:

+----------+----------+----------+
| Category | Observed | Expected |
+----------+----------+----------+
| True     |        4 |        3 |
| False    |        6 |        7 |
+----------+----------+----------+

chi2 = (4-3)^2/3 + (6-7)^2/7 = 1/3 + 1/7 ≈ 0.476. The corresponding degrees of freedom = n_cat - 1 = 2 - 1 = 1.
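The arithmetic above can be checked in R. This is a minimal sketch using the toy True/False counts; the variable names are illustrative, and passing the expected proportions through chisq.test()'s `p` argument (a goodness-of-fit test) is one way, not the only way, to get the same statistic.

```r
# Toy counts from the True/False table above
observed <- c(True = 4, False = 6)
expected <- c(True = 3, False = 7)

# Pearson's statistic by hand: sum over categories of (O - E)^2 / E
chi2_manual <- sum((observed - expected)^2 / expected)  # 1/3 + 1/7, about 0.476
df_manual <- length(observed) - 1                       # n_cat - 1 = 1

# The same statistic from chisq.test() used as a goodness-of-fit test.
# Expected counts below 5 trigger R's "approximation may be incorrect"
# warning; it is suppressed here only because this is a toy example.
gof <- suppressWarnings(chisq.test(observed, p = expected / sum(expected)))
unname(gof$statistic)  # matches chi2_manual
unname(gof$parameter)  # df = 1
```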

In your example, the number of categories is not 2 (as with True/False above) but the number of time-series points. As an example, take a year of daily data in which the observed values are the consecutively increasing numbers 1 to 365 and the expected values are the consecutively decreasing numbers 365 to 1.

+------------+----------+----------+
| Category   | Observed | Expected |
+------------+----------+----------+
| 2017-01-01 |        1 |      365 |
| 2017-01-02 |        2 |      364 |
| ...        |      ... |      ... |
| 2017-12-31 |      365 |        1 |
+------------+----------+----------+

chi2 = (1-365)^2/365 + (2-364)^2/364 + ... + (365-1)^2/1. Here n_cat is 365 rather than 2 (there are 365 days in 2017), so dof = 365 - 1 = 364.
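The answer above describes the goodness-of-fit form of chisq.test(). When two vectors are passed, as in the question's `chisq.test(x, y)` call, R's documentation says they are coerced to factors and cross-tabulated, so the degrees of freedom become (r - 1)(c - 1), where r and c are the numbers of distinct values in each series. A df of about 16 million is presumably consistent with each series having roughly 4,000 distinct values. The sketch below contrasts the two forms on the 365-day example; it is illustrative only, and the variable names are arbitrary.

```r
observed <- 1:365   # consecutively increasing values
expected <- 365:1   # consecutively decreasing values

# Goodness-of-fit form, as described above: one category per day.
# Some expected counts are below 5, so R's approximation warning is suppressed.
gof <- suppressWarnings(chisq.test(observed, p = expected / sum(expected)))
unname(gof$parameter)    # 365 - 1 = 364

# Two-vector form, as in the question: x and y are coerced to factors
# and cross-tabulated, giving an r x c table of their distinct values.
tab <- table(observed, expected)
dim(tab)                 # 365 365
indep <- suppressWarnings(chisq.test(observed, expected))
unname(indep$parameter)  # (365 - 1) * (365 - 1) = 132496
```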

Related Solutions

Solved – use a chi-squared test to compare two empirical distributions

The chi-square test of homogeneity of proportions can be used to compare two sets of multinomial proportions over the same set of values. In that sense it can be thought of as a two-sample (or indeed k-sample) equivalent of the chi-squared goodness-of-fit test, much as a two-sample Kolmogorov-Smirnov test relates to the one-sample test.

In short, yes, provided the conditions for the test to be suitable all hold to a sufficient-for-your-purposes approximation.
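As a minimal sketch of the homogeneity test described above, assume hypothetical counts for two samples over four shared categories (the names `sampleA`, `sampleB` and the category labels are made up for illustration). Passing the 2 x k matrix of counts to chisq.test() tests whether both rows share the same multinomial proportions.

```r
# Hypothetical counts for two samples over the same four categories
counts <- rbind(
  sampleA = c(a = 12, b = 30, c = 25, d = 8),
  sampleB = c(a = 20, b = 22, c = 18, d = 15)
)

# With a matrix, chisq.test() compares the rows' proportions;
# df = (rows - 1) * (cols - 1)
homog <- chisq.test(counts)
unname(homog$parameter)  # (2 - 1) * (4 - 1) = 3
homog$expected           # expected counts under a common set of proportions
homog$p.value
```

Checking `homog$expected` is a quick way to see whether the usual "expected counts not too small" condition roughly holds.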

R – Using Pearson’s Chi-Square (N-1) in R Programming

According to this page, the N-1 correction is very simple: just multiply $\chi^2$ by (N-1)/N, where N is the total number of observations. You could then use the pchisq function in R to get the corrected p-value. The exact code would be, I believe, something like:

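    # oldchisq, df: statistic and degrees of freedom from chisq.test(); N: total count
    newchisq <- ((N - 1) / N) * oldchisq
    newp <- pchisq(newchisq, df, lower.tail = FALSE)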
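A self-contained version of the snippet above might look like the following. This is a sketch assuming a hypothetical 2 x 2 table of counts and assuming the N-1 statistic is meant to replace Yates' continuity correction, which is why `correct = FALSE` is passed (chisq.test() applies Yates' correction to 2 x 2 tables by default).

```r
# Hypothetical 2 x 2 table of counts
tab <- matrix(c(12, 5, 7, 9), nrow = 2)

# Uncorrected Pearson statistic: correct = FALSE switches off Yates'
# continuity correction, which chisq.test() applies to 2 x 2 tables by default
res <- chisq.test(tab, correct = FALSE)

N <- sum(tab)
oldchisq <- unname(res$statistic)
df <- unname(res$parameter)

# N-1 version: scale the statistic by (N - 1) / N, then look up the p-value
newchisq <- ((N - 1) / N) * oldchisq
newp <- pchisq(newchisq, df, lower.tail = FALSE)
c(original_p = res$p.value, n_minus_1_p = newp)
```

Using `lower.tail = FALSE` gives the same upper-tail probability as `1 - pchisq(...)` but avoids losing precision when the p-value is very small.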
