Chi-Squared Test – How to Apply Chi-Squared Test to Multidimensional Data

chi-squared-test, contingency-tables, cramers-v, log-linear

I want to test whether two observations of nominal data follow the same distribution. I am using the chi-squared statistic to perform a chi-squared homogeneity test, normalizing the result with Cramér's $\phi$.

Unfortunately, all the examples of a chi-squared homogeneity test I could find (e.g. here) perform the test with two one-dimensional observations. The example behind the link above, for instance, compares boys and girls with respect to their viewing preferences. That makes two observations of the form $[x_0, \ldots, x_n]$. However, I want to test observations of the form $[[x_0, \ldots, x_n], \ldots, [z_0, \ldots, z_m]]$. I don't know if "multidimensional" is the right term in this context, but I figured it might be.

Could you explain how a chi-squared homogeneity test can be performed with multi-row observations? I know it must be possible, because scipy returns distinct values for two-dimensional input lists:

import scipy.stats as sps

# two 2x3 contingency tables (actions x outcomes)
observation1 = [[95, 31, 20], [70, 29, 18]]
observation2 = [[21, 69, 98], [54, 35, 11]]
data = [observation1, observation2]  # a 2x2x3 array
print(sps.chi2_contingency(data))

The code above yields

(159.18016188570166, 4.772222443744986e-31, 7,
 array([[[69.44008748, 47.45072645, 42.53205358],
         [45.11526642, 30.82876539, 27.63310068]],
        [[76.04085626, 51.96125177, 46.57502446],
         [49.40378984, 33.75925639, 30.25982128]]]))

where the first value is the $\chi^2$ statistic. Flattening the observations yields different values, so there must be a difference.
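For instance, one way to flatten (my own sketch, reusing data and sps from above; I don't know whether this is the exact flattening meant) is to collapse the array into a single two-way table, which tests a different hypothesis and hence changes the statistic:

import numpy as np

# collapse the 2x2x3 array into a 2x6 two-way table:
# rows are observations, columns are (action, outcome) pairs
flat = np.reshape(data, (2, 6))
print(sps.chi2_contingency(flat))  # a different chi-squared than above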

How do you calculate $\chi^2$ to determine the homogeneity of multidimensional observations? Please note that I know how to calculate $\chi^2$ on a multi-row contingency table. What I want to know is how to perform a test for the homogeneity of two contingency tables.

Example:

Table 1                                   Table 2
        outcome0 outcome1 outcome2 sum                outcome0 outcome1 outcome2
action0       95       31       20 146        action0       21       69       98
action1       70       29       18 117        action1       54       35       11
    sum      165       60       38 263

Question: Do these observations follow the same distribution?

SciPy can determine the expected values (and $\chi^2$) of multidimensional observations. The expected values are returned as the last element of the tuple returned by chi2_contingency:

Expected table 1                      Expected table 2
        outcome0 outcome1 outcome2            outcome0 outcome1 outcome2
action0    69.44    47.45    42.53    action0    76.04    51.96    46.58
action1    45.12    30.83    27.63    action1    49.40    33.76    30.26

Summing the squared differences between observed and expected counts, each divided by the expected count, yields the $\chi^2$ value. Note, however, that the expected tables are not simply the expected values computed from each individual table (quick check for the first cell of table 1: $\frac{146 \cdot 165}{263} = 91.60 \neq 69.44$). Instead, their calculation seems to depend on both tables at once.
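(In symbols, $\chi^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E}$, summed over all twelve cells of the two tables; plugging the observed and expected counts above into this formula reproduces scipy's 159.18.)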

I'd like to know how the expected values depend on each other, so that I can calculate them myself.

Best Answer

To analyze a multi-way contingency table, you use log-linear models. In truth, log-linear models are a special case of the Poisson generalized linear model, so you could use that directly, but log-linear models are more user-friendly. In Python you may need to fall back on the Poisson GLM, as I gather log-linear models may not be implemented (a sketch of that route is at the end of this answer). I will demonstrate the log-linear model using your data with R.

library(MASS)
tab = array(c(95, 31, 20, 70, 29, 18, 21, 69, 98, 54, 35, 11), dim=c(3,2,2))
tab = as.table(tab)
names(dimnames(tab)) = c("outcomes", "actions", "observations")
dimnames(tab)[[1]] = c("0", "1", "2")
dimnames(tab)[[2]] = c("0", "1")
dimnames(tab)[[3]] = c("1", "2") 
tab
# , , observations = 1
#         actions
# outcomes  0  1
#        0 95 70
#        1 31 29
#        2 20 18
# 
# , , observations = 2
#         actions
# outcomes  0  1
#        0 21 54
#        1 69 35
#        2 98 11

Log-linear models are simply a series of goodness-of-fit tests. We can start with a baseline null model that assumes the three factors are mutually independent, so that each cell's expected value is determined by the one-way marginal totals:

summary(tab)
# Number of cases in table: 551 
# Number of factors: 3 
# Test for independence of all factors:
#  Chisq = 159.18, df = 7, p-value = 4.772e-31
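Note that this chi-squared of 159.18 on 7 degrees of freedom is exactly what scipy returned, which also answers where the expected values come from: under mutual independence, the expected count for cell $(i, j, k)$ is built from the three one-way marginal totals,

$$E_{ijk} = \frac{n_{i \cdot \cdot} \; n_{\cdot j \cdot} \; n_{\cdot \cdot k}}{N^2}.$$

For the first cell of your table 1, that is $\frac{263 \cdot 334 \cdot 240}{551^2} \approx 69.44$ (263 counts in observation 1, 334 for action0 overall, 240 for outcome0 overall, $N = 551$), matching your expected table. The two expected tables depend on each other because the marginals are pooled across both observations.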

The null is rejected. Next, we can fit a saturated model:

m.sat = loglm(~observations*actions*outcomes, tab)
m.sat
# Call:
# loglm(formula = ~observations * actions * outcomes, data = tab)
# 
# Statistics:
#                  X^2 df P(> X^2)
# Likelihood Ratio   0  0        1
# Pearson            0  0        1

Naturally, this fits perfectly. At this point, we could build up from the null model, testing whether additional terms improve the fit, or drop terms from the saturated model to see whether the fit gets significantly worse. The latter is more convenient and is conventional. To test whether the distribution of outcomes by actions differs as a function of the observation, we need to drop every interaction involving observations (observations:actions, observations:outcomes, and the three-way term). If we also drop the marginal effect of observations, we are testing whether the mean count differs between the two levels of observations. That may or may not be of interest to you, I don't know.

m1 = loglm(~observations + actions*outcomes, tab)
sum(tab[,,1])  # 263
sum(tab[,,2])  # 288
m2 = loglm(~actions*outcomes, tab)
anova(m2, m1)
# LR tests for hierarchical log-linear models
# 
# Model 1:
#   ~actions * outcomes 
# Model 2:
#   ~observations + actions * outcomes 
# 
#           Deviance df Delta(Dev) Delta(df) P(> Delta(Dev))
# Model 1   126.4172  6                                    
# Model 2   125.2825  5   1.134691         1         0.28678
# Saturated   0.0000  0 125.282534         5         0.00000

Model 1 has one more residual degree of freedom than Model 2, because it also drops the marginal effect of observations (note that, confusingly, Model 1 $\leftrightarrow$ m2 and Model 2 $\leftrightarrow$ m1), but the decrease in model fit is very small and not significant: there is not enough evidence to suggest that the mean counts differ by observation. On the other hand, when Model 2 is compared to the saturated model, the decrease in fit is highly significant. The data are inconsistent with the idea that the distribution of counts is the same at both levels of observations.
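For completeness, here is a minimal Python sketch of the comparison against the saturated model via the Poisson GLM route mentioned at the top, assuming statsmodels is available (the long-format frame df and its column names are my own construction, not from the question):

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats

# one row per cell of the 3-way table, outcomes varying fastest
# (the same order as the R array above)
df = pd.DataFrame({
    "count":   [95, 31, 20, 70, 29, 18, 21, 69, 98, 54, 35, 11],
    "outcome": ["0", "1", "2"] * 4,
    "action":  (["0"] * 3 + ["1"] * 3) * 2,
    "obs":     ["1"] * 6 + ["2"] * 6,
})

# drop every interaction involving obs; for a Poisson GLM the residual
# deviance is the likelihood-ratio statistic against the saturated model
m1 = smf.glm("count ~ obs + action * outcome", data=df,
             family=sm.families.Poisson()).fit()
print(m1.deviance, m1.df_resid)                 # ~125.28 on 5 df
print(stats.chi2.sf(m1.deviance, m1.df_resid))  # essentially 0

The deviance reproduces the 125.28 in the anova table above, so the GLM route gives the same conclusion as loglm.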