I have two groups (group 1: $n=100$, and group 2: $n=200$), and multiple proportions for each group (where the number represents the proportion of individuals in the group with each disease). Example:
                        group1  group2
high cholesterol          0.20    0.28
high blood pressure       0.18    0.16
cardiovascular disease    0.13    0.20
diabetes                  0.25    0.20
vitamin d deficiency      0.05    0.15
I want to test whether the two groups differ significantly overall, across all disease categories. Since these data do not form a contingency table, I cannot directly use a chi-squared test or Fisher's exact test. I know how to compare a single proportion across two groups, but is there a way to compare multiple proportions across the two groups simultaneously and get a single p-value? Of course, I could test each disease individually with a two-proportion z-test and then adjust for multiple comparisons, but can I test everything at once (in the flavor of Fisher's exact test)?
UPDATE: The categories are not disjoint as an individual can have 0 diseases or $>1$ diseases (seen by the fact that the proportions for either group do not add up to 1), which is why we cannot use the usual strategies that are used for a contingency table. In essence, instead of comparing the proportions $A$ and $B$ across two groups, I am trying to compare the vectors of proportions, $A = [a_1, a_2, a_3, a_4, a_5]$ and $B = [b_1, b_2, b_3, b_4, b_5]$ across two groups.
Best Answer
Continuing from my comment: On the assumption that disease categories are mutually exclusive, and using an additional category `None` so that groups total $n_1 = 100, n_2 = 200,$ as stated, here is a chi-squared test of homogeneity (in R) of disease category across groups. The null hypothesis of homogeneity is rejected (P-value $0.0023$).
Observed counts $X_{ij}$ echo the input; expected counts $E_{ij}$ are computed from the row and column totals of the table (assuming homogeneity). For example, $E_{11} = 100(76/300) = 25.33333.$
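As a rough illustration of the expected-count rule, here is a minimal Python sketch using only the standard library. The counts are reconstructed from the proportions in the question (proportion times group size, with `None` filling each group up to its total). That reconstruction is an assumption, since the R input is not shown here, so other figures quoted in this answer may not be reproduced exactly by this table. The name `expected_counts` is made up for this example.

```python
# Counts reconstructed from the question's proportions (an assumption;
# the original R input is not shown). "None" fills each group up to n.
categories = ["high cholesterol", "high blood pressure",
              "cardiovascular disease", "diabetes",
              "vitamin d deficiency", "None"]
observed = [
    [20, 18, 13, 25, 5, 19],   # group 1, n = 100
    [56, 32, 40, 40, 30, 2],   # group 2, n = 200
]


def expected_counts(table):
    """E_ij = (row i total) * (column j total) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for name, e1, e2 in zip(categories, expected[0], expected[1]):
    print(f"{name:24s} {e1:9.3f} {e2:9.3f}")
```

The first entry, `expected[0][0]`, is $100(76/300),$ matching $E_{11}$ above.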
The chi-squared statistic (`X-squared` in output) is $$ Q = \sum_{i=1}^2\sum_{j=1}^6 \frac{(X_{ij}-E_{ij})^2}{E_{ij}}=18.593,$$ which is distributed approximately as $\mathsf{Chisq}(\nu),$ where the number of degrees of freedom is $\nu = (2-1)(6-1) = 5.$ The P-value is the probability $0.0023$ under the density curve of $\mathsf{Chisq}(5)$ to the right of $18.593.$ In order for $Q$ to have this chi-squared distribution the $E_{ij}$s should exceed $5,$ which is true for your data.
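To see where the P-value comes from, the upper-tail area of $\mathsf{Chisq}(5)$ beyond $18.593$ can be checked without a statistics package. The sketch below is one possible way to do it, using the known closed-form tail expressions for integer degrees of freedom; the function name `chi2_sf` is made up for this example and is not taken from any library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ Chisq(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal-tail term plus a finite correction sum,
    # erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2)
    #   * sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1 * 3 * ... * (2r - 1))
    term = math.sqrt(x)          # r = 1 term
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 / math.pi) * math.exp(-half) * total)


df = (2 - 1) * (6 - 1)           # (rows - 1) * (columns - 1) = 5
p_value = chi2_sf(18.593, df)
print(f"df = {df}, P-value = {p_value:.4f}")
```

The statistic $18.593$ is taken as given from the output described above; only the tail area is recomputed here.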
The Pearson residuals are the square roots of the $rc = 12$ contributions $C_{ij} = \frac{(X_{ij}-E_{ij})^2}{E_{ij}},$ taken with the signs of the differences $D_{ij} = X_{ij}-E_{ij};$ that is, $R_{ij} = D_{ij}/\sqrt{E_{ij}}.$
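A short Python sketch of that definition follows, computing the signed residual $(X_{ij}-E_{ij})/\sqrt{E_{ij}}$ for every cell. As a loud caveat, the counts are reconstructed from the proportions in the question (with `None` filling each group up to its total); the original R input is not shown, so these residuals are illustrative and need not match the original output. `pearson_residuals` is a name invented for this example.

```python
import math

# Counts reconstructed from the question's proportions (an assumption).
categories = ["high cholesterol", "high blood pressure",
              "cardiovascular disease", "diabetes",
              "vitamin d deficiency", "None"]
observed = [
    [20, 18, 13, 25, 5, 19],   # group 1, n = 100
    [56, 32, 40, 40, 30, 2],   # group 2, n = 200
]


def pearson_residuals(table):
    """Signed residuals (X_ij - E_ij) / sqrt(E_ij) under homogeneity."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, x in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            res_row.append((x - e) / math.sqrt(e))
        residuals.append(res_row)
    return residuals


residuals = pearson_residuals(observed)
for name, r1, r2 in zip(categories, residuals[0], residuals[1]):
    print(f"{name:24s} {r1:8.3f} {r2:8.3f}")
```

Squaring each residual recovers the corresponding contribution $C_{ij},$ and the sign shows whether the observed count is above or below its expected value.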
Residuals with the largest absolute values point the way to the contributions most responsible for a value of $Q$ large enough to lead to rejection. Here the key residuals are for the category `None`, so the number of G1 subjects not having one of the five diseases is larger than expected if categories were homogeneous across groups. Otherwise, disease categories 1 and 5 seem to differ between the groups. Separate ad hoc tests (perhaps at the 1% level to avoid 'false discovery' according to the Bonferroni method) would show which differences are significant.
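One way such ad hoc tests might look is a pooled two-proportion z-test per disease, compared against a Bonferroni threshold of $0.05/5 = 0.01.$ The sketch below is only an illustration under stated assumptions: counts are obtained by multiplying the question's proportions by the group sizes, the overall level $0.05$ is assumed, and `two_prop_z_test` is a helper written for this example rather than a library function.

```python
import math

n1, n2 = 100, 200
proportions = {                      # (group 1, group 2), from the question
    "high cholesterol": (0.20, 0.28),
    "high blood pressure": (0.18, 0.16),
    "cardiovascular disease": (0.13, 0.20),
    "diabetes": (0.25, 0.20),
    "vitamin d deficiency": (0.05, 0.15),
}
alpha = 0.05                         # assumed overall level
threshold = alpha / len(proportions) # Bonferroni: 0.05 / 5 = 0.01


def two_prop_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test of H0: p1 == p2; returns (z, p_value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


results = {}
for name, (p1, p2) in proportions.items():
    x1, x2 = round(p1 * n1), round(p2 * n2)     # counts implied by proportions
    z, p = two_prop_z_test(x1, n1, x2, n2)
    results[name] = (z, p, p < threshold)
    print(f"{name:24s} z = {z:7.3f}  p = {p:.4f}  reject at 1%: {p < threshold}")
```

Each disease is tested on its own here, so this approach does not rely on the categories being mutually exclusive.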