Solved – Multiple testing correction for chi squared residuals

chi-squared-test, clustering, multiple-comparisons, residuals

I am running a cluster analysis (using mclust in R) and then checking whether various known groupings of the data (based on metadata) reveal associations between clusters and metadata groups. For each cluster-by-grouping table I am using the chi-squared test, and since there are multiple ways to group the data I simply apply a Bonferroni correction to the resulting p-values.

I've had a hard time finding good references on how to interpret the individual associations, but my current understanding is that I can determine the significance of a given association by converting each cell's standardized residual to a p-value using the normal distribution (p = 2*pnorm(-abs(stdres))). Is this correct, and further, is it proper not to adjust these values based on the size of the given table? I've seen nothing to suggest I should perform such a correction, so I'm assuming it is unnecessary.
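For concreteness, here is a minimal sketch of that residual-to-p-value conversion; the table `tab` is a made-up toy example, not your data:

```r
# Toy cluster-by-group contingency table (hypothetical counts)
tab <- matrix(c(20, 10, 5, 25), nrow = 2)

test   <- chisq.test(tab)
stdres <- test$stdres              # standardized Pearson residuals per cell
p_cell <- 2 * pnorm(-abs(stdres))  # two-sided p-value for each cell
```

For a 2x2 table all four standardized residuals have the same absolute value, so all four cell p-values coincide; in larger tables they differ cell by cell.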

If so, then given that I am testing multiple tables, should I simply multiply each cell's p-value by the number of tables to achieve the right multiple-testing correction?

Best Answer

This looks correct to me, but I am not so sure about your view that no adjustment is necessary. Obviously, if you examine many cells, you will have an inflated type I error rate. I see no particular reason why this should not matter at all, such that we could happily ignore multiple-testing issues in this particular setting but not in others.

At the same time, a Bonferroni correction would be very conservative here: since we only look at the residuals after rejecting the null hypothesis of independence over the whole table, we already know that there must be some pattern of association somewhere.

One way out of this dilemma is to give up rigid binary interpretations, but if you insist on formal testing, then I would think that you also need to adjust those $p$-values for the size of the table.
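If you do go the formal route, the adjustment for the number of cells can be done with `p.adjust` over the vector of cell p-values; this is a sketch on a made-up table, and Holm (the `p.adjust` default) would be a slightly less conservative alternative to Bonferroni:

```r
# Hypothetical table; in practice this would be one of your cluster-by-group tables
tab    <- matrix(c(20, 10, 5, 25), nrow = 2)
stdres <- chisq.test(tab)$stdres
p_cell <- 2 * pnorm(-abs(stdres))

# Adjust across all cells of this table for the number of cells tested
p_adj <- matrix(p.adjust(as.vector(p_cell), method = "bonferroni"),
                nrow = nrow(tab))
```

To additionally account for testing several tables, you could pool the cell p-values from all tables into one vector before calling `p.adjust`, which is equivalent to your multiply-by-the-number-of-tests idea but capped at 1.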

Agresti (Categorical Data Analysis) discusses standardized residuals as a follow-up to the chi-squared test of independence in two-way contingency tables on pp. 80-81 and has this to say:

A standardized Pearson residual that exceeds about 2 or 3 in absolute value indicates lack of fit of $H_0$ in that cell. Larger values are more relevant when df is larger and it becomes more likely that at least one is large simply by chance.

I take this to mean that, strictly speaking, adjusting these $p$-values would in fact be necessary (as intuition would suggest), but that he prefers an informal approach to the problem (as a standard normal deviate, 2 corresponds roughly to $p$ = .05, two-tailed, but he clearly implies that this particular threshold should not be interpreted too literally).
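Agresti's "about 2 or 3" heuristic lines up with what a per-cell Bonferroni cutoff would give; a quick sketch of the implied $|$residual$|$ threshold at a familywise $\alpha$ of .05 for a couple of table sizes:

```r
# |standardized residual| cutoff for a two-sided test at familywise alpha,
# Bonferroni-corrected for the number of cells in the table
cutoff <- function(n_cells, alpha = 0.05) qnorm(1 - alpha / (2 * n_cells))

cutoff(4)   # 2x2 table: about 2.50
cutoff(25)  # 5x5 table: about 3.09
```

So the corrected thresholds for small-to-moderate tables fall in exactly the 2-to-3 band he mentions, which may be why he treats the formal correction as unnecessary in practice.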
