Chi-Square Test – How to Perform Chi-Square Test When Respondents May Be Assigned to Multiple Categories

chi-squared-test, contingency tables, hypothesis testing

I am referring to this example: http://www.sthda.com/english/wiki/chi-square-test-of-independence-in-r

The data used are frequencies for household tasks:

Task        Wife  Alternating  Husband  Jointly
Laundry      156           14        2        4
Main_meal    124           20        5        4
Dinner        77           11        7       13
Breakfast     82           36       15        7
Tidying       53           11        1       57
Dishes        32           24        4       53
Shopping      33           23        9       55
Official      12           46       23       15
Driving       10           51       75        3
Finances      13           13       21       66
Insurance      8            1       53       77
Repairs        0            3      160        2
Holidays       0            1        6      153

My questions:

  1. Normally, in a contingency table for a chi-square test, the observations in the categories (rows) should be independent. In this data set, I am not sure whether each surveyed household was counted in more than one row. If so, would the chi-square test be inappropriate?

  2. If each household had been asked about only one activity (row), would it then be OK to use the chi-square test as in the example?

Best Answer

To give some context, the table comes from p. 381 of "Nonsymmetric Correspondence Analysis: A Tool for Analysing Contingency Tables With a Dependence Structure" by Kroonenberg & Lombardo (1999), https://doi.org/10.1207/S15327906MBR3403_4. They adapted data collected by other researchers in Germany in the 1970s; their article explains this in detail. The article is also freely available for download, and the part about the household tasks study is pp. 379-384.

223 young, childless married couples were asked, for each task, who primarily performs it in the household. So the same household can be counted in every row, which is indeed problematic for a chi-square test on this dataset (see "Is it okay to run a chi square if each participant is contributing multiple counts?"). In any case, as Peter Flom says in his answer, a chi-square test would be redundant in the first place, given the very obvious differences between rows.
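
To make the problem concrete, here is a minimal sketch in plain Python (no libraries) of the Pearson statistic that the linked tutorial computes in R. The names `table` and `pearson_chi_square` are my own and do not come from the tutorial or the article. The statistic itself matters less than the last line: the table holds far more counts than there were couples, so the counts cannot all be independent observations.

```python
# Counts from the question; columns are Wife, Alternating, Husband, Jointly.
table = {
    "Laundry":   [156, 14, 2, 4],
    "Main_meal": [124, 20, 5, 4],
    "Dinner":    [77, 11, 7, 13],
    "Breakfast": [82, 36, 15, 7],
    "Tidying":   [53, 11, 1, 57],
    "Dishes":    [32, 24, 4, 53],
    "Shopping":  [33, 23, 9, 55],
    "Official":  [12, 46, 23, 15],
    "Driving":   [10, 51, 75, 3],
    "Finances":  [13, 13, 21, 66],
    "Insurance": [8, 1, 53, 77],
    "Repairs":   [0, 3, 160, 2],
    "Holidays":  [0, 1, 6, 153],
}


def pearson_chi_square(rows):
    """Return (statistic, degrees of freedom) for a table given as a list of rows."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for r, row_total in zip(rows, row_totals):
        for observed, col_total in zip(r, col_totals):
            expected = row_total * col_total / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof


rows = list(table.values())
stat, dof = pearson_chi_square(rows)
total_counts = sum(sum(r) for r in rows)
n_couples = 223  # number of couples surveyed, per the article

print(f"chi-square = {stat:.1f}, df = {dof}")
print(f"{total_counts} counts in the table, but only {n_couples} couples")
```

The formula assumes every count is an independent observation. Here each couple can contribute up to 13 counts, one per task, so that assumption fails regardless of how large the statistic is.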

You may have noticed that the row sums do not add up to 223 and differ from row to row. This is because responses were excluded from the table when the husband and the wife disagreed on who performs the task. In their article, Kroonenberg and Lombardo discuss why a "disagreement" column was not included in the table to account for such cases.
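
A quick check of the row totals, as a plain-Python sketch with names of my own choosing. It assumes, following the explanation above, that every couple missing from a row was dropped because of disagreement; the table alone cannot confirm that.

```python
N_COUPLES = 223  # number of couples surveyed, per the article

# Counts from the question; columns are Wife, Alternating, Husband, Jointly.
table = {
    "Laundry":   [156, 14, 2, 4],
    "Main_meal": [124, 20, 5, 4],
    "Dinner":    [77, 11, 7, 13],
    "Breakfast": [82, 36, 15, 7],
    "Tidying":   [53, 11, 1, 57],
    "Dishes":    [32, 24, 4, 53],
    "Shopping":  [33, 23, 9, 55],
    "Official":  [12, 46, 23, 15],
    "Driving":   [10, 51, 75, 3],
    "Finances":  [13, 13, 21, 66],
    "Insurance": [8, 1, 53, 77],
    "Repairs":   [0, 3, 160, 2],
    "Holidays":  [0, 1, 6, 153],
}

# Number of couples retained for each task.
row_totals = {task: sum(counts) for task, counts in table.items()}

for task, total in row_totals.items():
    print(f"{task:<10} kept: {total:>3}   missing: {N_COUPLES - total:>3}")
```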

The core of their article discusses a possible method for analyzing this kind of data, a variant of correspondence analysis called nonsymmetric correspondence analysis. A regression, as suggested by Peter Flom in his answer, may also be a good option, depending on the question you ultimately want to answer. Note that in their analysis, Kroonenberg and Lombardo treated "tasks" as the predictor variable and "who performed the task" as the response variable.
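
To illustrate the asymmetric view (task as predictor, "who performs it" as response), the sketch below computes each task's response profile, i.e. the share of each answer within a row. This is only a descriptive first step, not nonsymmetric correspondence analysis and not a regression, and the variable names are mine.

```python
RESPONSES = ["Wife", "Alternating", "Husband", "Jointly"]

# Counts from the question, one row per task, columns in the order of RESPONSES.
table = {
    "Laundry":   [156, 14, 2, 4],
    "Main_meal": [124, 20, 5, 4],
    "Dinner":    [77, 11, 7, 13],
    "Breakfast": [82, 36, 15, 7],
    "Tidying":   [53, 11, 1, 57],
    "Dishes":    [32, 24, 4, 53],
    "Shopping":  [33, 23, 9, 55],
    "Official":  [12, 46, 23, 15],
    "Driving":   [10, 51, 75, 3],
    "Finances":  [13, 13, 21, 66],
    "Insurance": [8, 1, 53, 77],
    "Repairs":   [0, 3, 160, 2],
    "Holidays":  [0, 1, 6, 153],
}

# Conditional distribution of the response given the task (each row sums to 1).
profiles = {
    task: [count / sum(counts) for count in counts]
    for task, counts in table.items()
}

# Most frequent response for each task.
most_common = {
    task: RESPONSES[shares.index(max(shares))]
    for task, shares in profiles.items()
}

for task, shares in profiles.items():
    formatted = "  ".join(f"{share:5.2f}" for share in shares)
    print(f"{task:<10} {formatted}   most common: {most_common[task]}")
```

Conditioning on the row rather than on the grand total is what makes the comparison asymmetric: each task is described by how its own responses are distributed, whatever the number of couples retained for it.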


As a side note, it looks like the author of the webpage you link to is unaware of the original source of the data and its study design, as they don't mention either. That probably explains why they thought a chi-square test was suitable for this table.
