Statistical Significance – How to Calculate Chi-Squared Values for Percentages

chi-squared-test, scipy, statistical significance

I'm using scipy and I'd like to calculate the chi-squared value of a contingency table of percentages.

This is my table of relapse rates. I'd like to know whether any values are unexpected, i.e. groups where relapse rates are particularly high:

        18-25    25-34    35-44    ...
Men     37%      36%      64%      ...
Women   24%      25%      32%      ...

The underlying data looks like this:

        18-25           25-34           35-44          ...
Men     667 of 1802     759 of 2108     1073 of 1677   ...

Should I just use the raw values, so have a contingency table like this, and run a chi-squared test on the raw values?

        18-25        25-34        35-44      ...
Men     667          759          1073       ...

That doesn't seem quite right, because it doesn't capture the relative underlying size of each group.

I have been Googling, but haven't been able to find an explanation I understand of what I should do. How should I find unexpected values in data like this?

Best Answer

As long as the percentages all add to 100 (not the case in your illustration) and reflect mutually exclusive and exhaustive outcomes (not the case either), you can compute $X^2$ from the percentages and then multiply it by $N/100$, where $N$ is the total number of observations.
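To make the $N/100$ scaling concrete, here is a minimal sketch using only the three men's groups quoted in the question. It assumes the table is completed with a "did not relapse" row so that the cells are mutually exclusive and exhaustive, and that the percentages are percentages of the grand total (not the per-group relapse rates). The helper `pearson_x2` and the variable names are my own, not anything from scipy.

```python
import numpy as np

def pearson_x2(table):
    """Pearson X^2 for a two-way table (counts or percentages of the total)."""
    table = np.asarray(table, dtype=float)
    row = table.sum(axis=1, keepdims=True)   # row totals, shape (r, 1)
    col = table.sum(axis=0, keepdims=True)   # column totals, shape (1, c)
    expected = row @ col / table.sum()       # expected cells under independence
    return ((table - expected) ** 2 / expected).sum()

# Men's figures from the question: relapses and group sizes per age band.
relapsed = np.array([667, 759, 1073])
group_n = np.array([1802, 2108, 1677])

# Rows: relapsed / did not relapse. Columns: 18-25, 25-34, 35-44.
counts = np.vstack([relapsed, group_n - relapsed])
N = counts.sum()

# Each cell as a percentage of all N observations (sums to 100).
percent = 100 * counts / N

x2_counts = pearson_x2(counts)
x2_scaled = pearson_x2(percent) * N / 100

print(x2_counts, x2_scaled)  # the two values agree up to rounding error
```

Including the "did not relapse" row is also what lets the table reflect the size of each group, which a table of relapse counts alone cannot do.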

In your case, you really have a 3-way table. It appears that what you'd really like to know is how age and sex affect relapse rates. So I think you're better off forgetting the chi-square stuff, and instead using the actual frequencies for each cell:

relapse     n    Age   Sex
    667  1802  18-25     M
    759  2108  25-34     M
    ...

Then run a logistic regression model with Age, Sex, and Age:Sex as the predictors. You can then see what the effects of those factors are, compare predictions, and so on. It would be a lot more informative than a chi-square statistic for some independence hypothesis.
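As a rough illustration of what such a model reports, the sketch below works out by hand the quantities a logistic regression with Age as the only (categorical) predictor would estimate. It covers only the men's rows, because the women's counts are not shown in the question. With a single categorical predictor the model is saturated, so each coefficient is just a log odds ratio against the reference group and its Wald standard error has a closed form. For the full Age, Sex, and Age:Sex model you would fit it with a statistics package rather than by hand. The choice of 18-25 as the reference group and all variable names are my assumptions.

```python
import numpy as np

# Men's figures from the question (women's counts are not given there).
ages = ["18-25", "25-34", "35-44"]
relapsed = np.array([667, 759, 1073], dtype=float)
group_n = np.array([1802, 2108, 1677], dtype=float)
not_relapsed = group_n - relapsed

# Log-odds of relapse in each age group.
log_odds = np.log(relapsed / not_relapsed)

# Coefficients relative to the reference group (18-25, an arbitrary choice):
# each one is the log odds ratio of that group versus the reference.
coef = log_odds[1:] - log_odds[0]

# Wald standard error of a log odds ratio: sqrt of summed reciprocal cell counts.
se = np.sqrt(1 / relapsed[1:] + 1 / not_relapsed[1:]
             + 1 / relapsed[0] + 1 / not_relapsed[0])

z = coef / se                # Wald z statistics
odds_ratio = np.exp(coef)    # odds ratios versus the 18-25 group

for age, b, s, zval, oratio in zip(ages[1:], coef, se, z, odds_ratio):
    print(f"{age} vs {ages[0]}: coef={b:.3f}  se={s:.3f}  z={zval:.2f}  OR={oratio:.2f}")
```

A large |z| flags a group whose relapse odds differ from the reference by more than sampling noise would explain, which is closer to the "which groups are unexpected" question than a single overall chi-square value.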