I want to test whether two observations of nominal data follow the same distribution. I am using the chi-squared statistic to perform a chi-squared homogeneity test and normalize the result with Cramér's $\phi$.
Unfortunately, all the examples for performing a chi-squared homogeneity test I could find (e.g. here) perform the test with two one-dimensional observations. The example behind the link above, for instance, compares boys and girls by their viewing preferences. This makes two observations of the form $[x_0, \ldots, x_n]$. However, I want to test observations of the form $[[x_0, \ldots, x_n], \ldots, [z_0, \ldots, z_m]]$. I don't know if "multidimensional" is the right term in this context; I figured it might be.
Could you explain how a chi-squared homogeneity test can be performed with multi-row observations? I know it must work somehow, because scipy returns a result for two-dimensional input lists.
import scipy.stats as sps
observation1 = [[95, 31, 20], [70, 29, 18]]
observation2 = [[21, 69, 98], [54, 35, 11]]
data = [observation1, observation2]
print(sps.chi2_contingency(data))
The code above yields

(159.18016188570166, 4.772222443744986e-31, 7, array([[[ 69.44008748, 47.45072645, 42.53205358],
        [ 45.11526642, 30.82876539, 27.63310068]],
       [[ 76.04085626, 51.96125177, 46.57502446],
        [ 49.40378984, 33.75925639, 30.25982128]]]))
where the first value is the $\chi^2$ statistic. Flattening the observations yields different values, so there must be a difference.
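To illustrate the difference (a quick sketch): collapsing each observation's $2 \times 3$ table into a single row of six cells and re-running the test gives a different statistic than the three-dimensional call:

```python
import numpy as np
from scipy.stats import chi2_contingency

data3d = np.array([[[95, 31, 20], [70, 29, 18]],
                   [[21, 69, 98], [54, 35, 11]]])
chi2_3d = chi2_contingency(data3d)[0]   # 159.18...

# Flatten each observation's 2x3 table into one row of 6 cells
flat = data3d.reshape(2, 6)
chi2_2d = chi2_contingency(flat)[0]     # a noticeably different value

print(chi2_3d, chi2_2d)
```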
How do you calculate $\chi^2$ for determining the homogeneity of multidimensional observations? Please note that I know how to calculate $\chi^2$ on a multi-row contingency table. Instead, I want to know how to perform a test for the homogeneity of two contingency tables.
Example:
Table 1

         outcome0  outcome1  outcome2  sum
action0        95        31        20  146
action1        70        29        18  117
sum           165        60        38  263

Table 2

         outcome0  outcome1  outcome2
action0        21        69        98
action1        54        35        11
Question: Are these observations following the same distribution?
Scipy can determine the expected values (and $\chi^2$) of multidimensional observations. The expected values are returned as the last element of the tuple above:
Expected table 1

         outcome0  outcome1  outcome2
action0     69.44     47.45     42.53
action1     45.12     30.83     27.63

Expected table 2

         outcome0  outcome1  outcome2
action0     76.04     51.96     46.58
action1     49.40     33.76     30.26
Summing the squared differences between observed and expected counts, each divided by the expected count, yields the $\chi^2$ value. Note, however, that the expected tables are not simply the expected values computed from each table on its own (quick check for the first cell of table 1: $\frac{146 \cdot 165}{263} = 91.60 \neq 69.44$). Instead, their calculation evidently depends on both tables.
I'd like to know how the expected values depend on each other, so that I can calculate them myself.
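For reference on what scipy is doing here: for an $n$-dimensional table, `chi2_contingency` computes expected frequencies under mutual independence of all axes, i.e. the outer product of the one-way marginal totals divided by $N^{d-1}$. The shared grand total and shared margins are what couple the two tables. A minimal sketch reproducing scipy's numbers:

```python
import numpy as np
from scipy.stats import chi2_contingency

data = np.array([[[95, 31, 20], [70, 29, 18]],
                 [[21, 69, 98], [54, 35, 11]]])
N = data.sum()  # grand total over both tables: 551

# One-way marginal totals along each axis of the 2x2x3 array
m_obs = data.sum(axis=(1, 2))   # per observation (table)
m_act = data.sum(axis=(0, 2))   # per action
m_out = data.sum(axis=(0, 1))   # per outcome

# Expected counts under mutual independence of all three factors:
# outer product of the margins divided by N^(d-1), here N^2
expected = np.einsum('i,j,k->ijk', m_obs, m_act, m_out) / N**2

chi2_stat, p, dof, exp_scipy = chi2_contingency(data)
print(expected[0, 0, 0])   # 69.44..., the first cell of expected table 1
```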
Best Answer
To analyze a multi-way contingency table, you use log-linear models. In truth, log-linear models are a special case of the Poisson generalized linear model, so you could use that instead, but log-linear models are more user-friendly. In Python you may need to use the Poisson GLM, as I gather dedicated log-linear model routines may not be implemented. I will demonstrate the log-linear approach using your data with R.
Log-linear models are simply a series of goodness-of-fit tests. We can start with a (trivial) null model that assumes all cells have the same expected value:
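The original R calls and output are not reproduced in this copy of the answer. As a sketch of what this first test does: under the null, all 12 cells share the expected count $N/12$, and the deviance $G^2 = 2\sum_i o_i \ln(o_i/e_i)$ is referred to a $\chi^2$ distribution with 11 degrees of freedom:

```python
import numpy as np
from scipy.stats import chi2

counts = np.array([95, 31, 20, 70, 29, 18, 21, 69, 98, 54, 35, 11])
expected = counts.sum() / counts.size          # 551 / 12, same for every cell
g2 = 2 * np.sum(counts * np.log(counts / expected))
p = chi2.sf(g2, df=counts.size - 1)            # 11 degrees of freedom
print(g2, p)                                   # large G^2, tiny p: null rejected
```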
The null is rejected. Next, we can fit a saturated model:
Naturally, this fits perfectly. At this point, we could build up from the null model, seeing if additional terms improve the fit, or drop terms from the saturated model to see if the fit gets significantly worse. The latter is more convenient and is conventional. To see if the distribution of `outcomes` by `actions` differs as a function of the `observation`, we need to drop the interactions between the `observations` and the `actions * outcomes`. If we also drop the marginal effect of `observations`, we are testing whether the mean count differs between the two levels of `observations`. That may or may not be of interest to you; I don't know.

`Model 1` has dropped a single degree of freedom from `Model 2` (note that, confusingly, `Model 1` $\leftrightarrow$ `m2` and `Model 2` $\leftrightarrow$ `m1`), but the decrease in model fit is very small and not significant. There is not enough evidence to suggest that the mean counts differ by `observation`. On the other hand, when `Model 2` is compared to the `Saturated` model, the decrease in fit is highly significant. The data are inconsistent with the idea that the distribution of counts is the same in both levels of `observation`.