Confounding – Understanding Simpson’s Paradox and Confounding


Consider a scenario where a two-way contingency table is analyzed with a chi-squared test of independence and the result is significant. It then turns out that this table aggregates data from two heterogeneous subgroups, one much larger than the other. When each subgroup is analyzed separately, neither gives a significant result. How is this best explained?

Edit: I managed to find the paper where I read about this. The authors explain it as an instance of Simpson's paradox.
http://www.amstat.org/publications/jse/secure/v7n3/datasets.morrell.cfm
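The scenario in the question is easy to reproduce with made-up numbers. The sketch below uses hypothetical counts (rows: exposed/unexposed; columns: outcome yes/no) in which exposure and outcome are exactly independent within each subgroup, but the large subgroup has both a higher exposure rate and a higher outcome rate than the small one. The helper `chi2_2x2` is an illustrative function, not a library call; it computes Pearson's statistic without a continuity correction and is written by hand to keep the example dependency-free. SciPy's `scipy.stats.chi2_contingency` applies Yates' correction to 2x2 tables by default, so its numbers would differ slightly.

```python
import math

def chi2_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table (no continuity correction).

    table = [[a, b], [c, d]]; returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-squared variable with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Hypothetical counts: rows = exposed / unexposed, columns = outcome yes / no.
large = [[400, 400], [100, 100]]  # 1000 subjects, 80% exposed, 50% outcome rate
small = [[2, 18], [8, 72]]        # 100 subjects, 20% exposed, 10% outcome rate

# Aggregate the two subgroups cell by cell.
pooled = [[large[i][j] + small[i][j] for j in range(2)] for i in range(2)]

for name, t in [("large", large), ("small", small), ("pooled", pooled)]:
    stat, p = chi2_2x2(t)
    print(f"{name:>6}: chi2 = {stat:.3f}, p = {p:.4f}")
```

Both subgroup statistics are exactly 0 (p = 1), yet the pooled table gives a statistic of about 9.17 with p below 0.01: the association in the pooled table comes entirely from the differences between the subgroups.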

Best Answer

Simpson's paradox is an extreme form of confounding in which the apparent sign of the correlation is reversed between the subgroup-level and the aggregated analyses; you haven't said that this is what happens here.
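To see the sign reversal that distinguishes Simpson's paradox, here is a minimal sketch with hypothetical success counts for two treatments in two subgroups of differing severity. Treatment A does better within each subgroup, but because it is given mostly to the severe cases, it looks worse once the subgroups are pooled. The names `data` and `success_rates` are illustrative only.

```python
# Hypothetical (successes, trials) for two treatments, split by case severity.
data = {
    "mild":   {"A": (90, 100),   "B": (800, 1000)},
    "severe": {"A": (300, 1000), "B": (20, 100)},
}

def success_rates(groups):
    """Return {group: {arm: rate}} plus a 'pooled' entry that sums counts over groups."""
    rates = {g: {arm: s / n for arm, (s, n) in arms.items()} for g, arms in groups.items()}
    pooled = {}
    for arm in ("A", "B"):
        s = sum(groups[g][arm][0] for g in groups)  # total successes for this arm
        n = sum(groups[g][arm][1] for g in groups)  # total trials for this arm
        pooled[arm] = s / n
    rates["pooled"] = pooled
    return rates

for group, r in success_rates(data).items():
    better = "A" if r["A"] > r["B"] else "B"
    print(f"{group:>7}: A = {r['A']:.3f}, B = {r['B']:.3f} -> {better} looks better")
```

Here severity is the confounder: it is associated both with which treatment is given and with the chance of success, which is what flips the pooled comparison.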

I can see at least three possibilities here: the heterogeneity between the subgroups, the reduction in sample size within each subgroup, and a poor definition of the subgroups that presupposes the results. Setting the third aside, both of the first two can have an impact. In my experience, it is often the small sample size that leads to non-significance in the smaller subgroup, and the heterogeneity that causes the whole group to produce a significant result while the large subgroup does not.

That is an over-generalisation, though: each case will have its own issues.
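The sample-size point can be made concrete. For a 2x2 table with fixed cell proportions, Pearson's chi-squared statistic grows in proportion to the total count n, so the same strength of association can be non-significant in a small subgroup and highly significant in a large one. Below is a minimal sketch with a hypothetical table (60% vs 40% outcome rates) scaled by different factors; `chi2_2x2` is an illustrative helper, not a library function.

```python
import math

def chi2_2x2(table):
    """Pearson chi-squared statistic and p-value (1 df, no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # p-value: upper tail of chi-squared with 1 df, P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))

base = [[6, 4], [4, 6]]  # hypothetical: outcome rate 60% in one row, 40% in the other

for k in (1, 5, 50):
    table = [[k * x for x in row] for row in base]  # same proportions, k times the counts
    n = sum(map(sum, table))
    stat, p = chi2_2x2(table)
    print(f"n = {n:4d}: chi2 = {stat:6.2f}, p = {p:.4f}")
```

The statistic goes from 0.8 (n = 20, p of roughly 0.37) to 4.0 (n = 100, p just under 0.05) to 40 (n = 1000), although the proportions in every table are identical.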