What test to use to find which proportion is highest in multiple groups of different sample sizes

chi-squared-test, proportion

I have 4 samples of different sizes. I used a chi-squared test to check whether the male-female proportions are the same across these samples, and the result was statistically significant (p < 0.01). From the percentages of males and females in each sample, I can see that the percentage of males is higher in Sample A than in the other groups. Is there a test that lets me state "the proportion of males is higher in Sample A than in the other groups" with a p-value?

Best Answer

After the significant chi-squared test on the four groups, you may want to do ad hoc tests comparing Group A with other groups. Because you did not show your actual data, I will show how to do this for fictitious data, which may be somewhat similar to yours.


Suppose you have a table of numbers of sales as follows:

     A    B    C    D
    ------------------  
 M  201  145  143  130
 F  170  152  148  211

In R:

m = c(201, 145, 143, 130)
f = c(170, 152, 148, 211)
TBL = rbind(m, f);  TBL

  [,1] [,2] [,3] [,4]
m  201  145  143  130
f  170  152  148  211

In R, a chi-squared test of homogeneity rejects the null hypothesis that the proportions of sales by gender are the same across groups at the $0.1\%$ significance level, with P-value $0.0003 < 0.001 = 0.1\%.$ The Yates continuity correction is declined (parameter 'cor=F', which R partially matches to 'correct=FALSE') because the counts are reasonably large.

chisq.test(TBL, cor=F)

        Pearson's Chi-squared test

data:  TBL
X-squared = 19.168, df = 3, p-value = 0.0002523

The chi-squared test compares the observed counts in TBL with counts (based on marginal totals) that would be expected under the null hypothesis of homogeneity. The Pearson residuals can show where disagreement of observed and expected counts is greatest; look especially for residuals with largest absolute values. Here it seems that the greatest contribution to the relatively large chi-squared statistic comes from groups A and D.

chisq.test(TBL, cor=F)$resi
       [,1]       [,2]      [,3]      [,4]
m  1.831823  0.3012389  0.377127 -2.540219
f -1.746446 -0.2871989 -0.359550  2.421826
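To make the link between marginal totals, expected counts, and residuals concrete, here is a minimal sketch that rebuilds the Pearson residuals by hand for the same fictitious table. The names 'expected' and 'pearson' are introduced here for illustration and are not part of the output above.

```r
# Observed counts from the fictitious table above
m <- c(201, 145, 143, 130)
f <- c(170, 152, 148, 211)
TBL <- rbind(m, f)

# Expected counts under homogeneity:
# (row total * column total) / grand total
expected <- outer(rowSums(TBL), colSums(TBL)) / sum(TBL)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson <- (TBL - expected) / sqrt(expected)
round(pearson, 3)

# The squared Pearson residuals add up to the chi-squared statistic
sum(pearson^2)
```

The sum on the last line should agree with the X-squared value reported by 'chisq.test', so each squared residual can be read as that cell's contribution to the statistic.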

You can do an ad hoc test to compare these two groups by selecting only columns 1 and 4 of TBL. In order to avoid 'false discovery' from repeated analyses on the same data, ad hoc tests should be conducted at a smaller significance level than the main chi-squared test. Here it is clear that Groups A and D differ. Specifically, numbers of sales by males are larger in A and sales by females are larger in D.

chisq.test(TBL[,c(1,4)], cor=F)

        Pearson's Chi-squared test

data:  TBL[, c(1, 4)]
X-squared = 18.41, df = 1, p-value = 1.781e-05
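If the goal is a statement about Group A versus each of the other groups, one possible approach is to run the three 2x2 tests and adjust the P-values for multiplicity. The sketch below uses a Bonferroni adjustment via 'p.adjust'; the choice of Bonferroni, the column labels A to D, and the names 'p_raw' and 'p_adj' are assumptions made for illustration, and other adjustment methods could be used instead.

```r
# Same fictitious counts, with column labels added for readability
m <- c(A = 201, B = 145, C = 143, D = 130)
f <- c(A = 170, B = 152, C = 148, D = 211)
TBL <- rbind(m, f)

# Compare Group A with each of B, C, and D in turn (three 2x2 tests)
others <- c("B", "C", "D")
p_raw <- sapply(others, function(g) {
  chisq.test(TBL[, c("A", g)], correct = FALSE)$p.value
})

# Bonferroni adjustment: each P-value is multiplied by the
# number of comparisons (and capped at 1)
p_adj <- p.adjust(p_raw, method = "bonferroni")
cbind(p_raw, p_adj)
```

Only comparisons whose adjusted P-value stays below the chosen significance level would support a claim that Group A differs from that group. A significant chi-squared test alone does not give the direction, so the sample proportions are still needed to say which group is higher.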

Note: Another, essentially equivalent, version of the chi-squared test in R is 'prop.test', shown below. It also reports the proportion of sales by males in each group. The proportions $0.542$ (Group A) and $0.381$ (Group D) were shown to be significantly different by the ad hoc chi-squared test above.

t = m+f
prop.test(m, t, cor=F)

        4-sample test for equality of proportions without continuity correction

data:  m out of t
X-squared = 19.168, df = 3, p-value = 0.0002523
alternative hypothesis: two.sided
sample estimates:
    prop 1    prop 2    prop 3    prop 4 
 0.5417790 0.4882155 0.4914089 0.3812317
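For a two-group comparison, 'prop.test' also returns a confidence interval for the difference in proportions, which indicates direction as well as significance. The following is a minimal sketch for Groups A and D; the names 'n' and 'res' are introduced here for illustration, and the default 95% level is not adjusted for multiple comparisons.

```r
# Same fictitious counts as above
m <- c(201, 145, 143, 130)
f <- c(170, 152, 148, 211)
n <- m + f   # group totals (called t in the code above)

# Two-sample comparison of Groups A and D (positions 1 and 4)
res <- prop.test(m[c(1, 4)], n[c(1, 4)], correct = FALSE)

res$statistic   # matches the X-squared of the 2x2 chi-squared test
res$estimate    # sample proportions of sales by males in A and D
res$conf.int    # 95% CI for (proportion in A) - (proportion in D)
```

An interval that lies entirely above zero is consistent with a higher proportion of sales by males in Group A than in Group D.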