What test to use to find which proportion is highest in multiple groups of different sample sizes

chi-squared-test, proportion

I have 4 samples of different sizes. I used a chi-squared test to check whether the male-female proportions are the same in these samples, and the difference came out statistically significant (p < 0.01). From the percentages of males and females in each sample, I can see that the percentage of males is higher in Sample A than in the other groups. Is there a test that would let me make the statement "Males are higher in Sample A compared to the other groups (p-value ???)"?

Best Answer

After the significant chi-squared test on the four groups, you may want to do post hoc tests comparing Group A with the other groups. Because you did not show your actual data, I will show how to do this for fictitious data, which may be somewhat similar to yours.

Suppose you have the following table of numbers of sales by males (M) and females (F) in Groups A to D:

     A    B    C    D
    ------------------  
 M  201  145  143  130
 F  170  152  148  211

In R:

m = c(201, 145, 143, 130)
f = c(170, 152, 148, 211)
TBL = rbind(m, f);  TBL

TBL
  [,1] [,2] [,3] [,4]
m  201  145  143  130
f  170  152  148  211

In R, a chi-squared test of homogeneity rejects the null hypothesis that the proportions of sales by each gender are homogeneous across groups at the $0.1\%$ significance level, with P-value $0.0003 < 0.001 = 0.1\%.$ The Yates continuity correction is declined (argument 'cor=F', short for 'correct=FALSE') because the counts are reasonably large; in any case, 'chisq.test' applies that correction only to 2-by-2 tables.

chisq.test(TBL, cor=F)

        Pearson's Chi-squared test

data:  TBL
X-squared = 19.168, df = 3, p-value = 0.0002523

The chi-squared test compares the observed counts in TBL with the counts (based on marginal totals) that would be expected under the null hypothesis of homogeneity. The Pearson residuals show where the disagreement between observed and expected counts is greatest; look especially for residuals with the largest absolute values. Here it seems that the greatest contributions to the relatively large chi-squared statistic come from Groups A and D.

chisq.test(TBL, cor=F)$residuals
       [,1]       [,2]      [,3]      [,4]
m  1.831823  0.3012389  0.377127 -2.540219
f -1.746446 -0.2871989 -0.359550  2.421826
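The expected counts and Pearson residuals can also be computed directly from their definitions. In the sketch below (the names E, resid, X2 and dof are my own, not from the answer), each residual is (observed - expected)/sqrt(expected), and the chi-squared statistic is the sum of the squared residuals; it should reproduce the residuals printed above and X-squared = 19.168 on 3 degrees of freedom.

```r
# Fictitious counts from the table above (rows: m, f; columns: Groups A to D)
m = c(201, 145, 143, 130)
f = c(170, 152, 148, 211)
TBL = rbind(m, f)

N = sum(TBL)
# Expected counts under homogeneity: row total * column total / grand total
E = outer(rowSums(TBL), colSums(TBL)) / N
# Pearson residuals: (observed - expected) / sqrt(expected)
resid = (TBL - E) / sqrt(E)
# The chi-squared statistic is the sum of the squared Pearson residuals
X2 = sum(resid^2)
dof = (nrow(TBL) - 1) * (ncol(TBL) - 1)
p_value = pchisq(X2, dof, lower.tail = FALSE)

round(E, 2)
round(resid, 3)
c(X2 = X2, df = dof, p = p_value)
```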

You can do a post hoc test to compare these two groups by selecting only columns 1 and 4 of TBL. To avoid 'false discovery' from repeated analyses of the same data, post hoc tests should be conducted at a smaller significance level than the main chi-squared test (for example, with a Bonferroni adjustment, which divides the significance level by the number of comparisons). Here it is clear that Groups A and D differ: the share of sales made by males is larger in A, and the share made by females is larger in D.

chisq.test(TBL[,c(1,4)], cor=F)

        Pearson's Chi-squared test

data:  TBL[, c(1, 4)]
X-squared = 18.41, df = 1, p-value = 1.781e-05
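The question asks about Group A versus all of the other groups, not just D. A minimal sketch of one way to do that, assuming a Bonferroni adjustment over the three comparisons A vs B, A vs C and A vs D (other adjustments, such as Holm's, are available through the same 'p.adjust' function); the column labels A to D are added here for readability:

```r
m = c(201, 145, 143, 130)
f = c(170, 152, 148, 211)
TBL = rbind(m, f)
colnames(TBL) = c("A", "B", "C", "D")

# 2-by-2 chi-squared test of Group A against each other group
others = c("B", "C", "D")
p_raw = sapply(others, function(g)
  chisq.test(TBL[, c("A", g)], correct = FALSE)$p.value)

# Bonferroni adjustment for the three comparisons
p_adj = p.adjust(p_raw, method = "bonferroni")
round(cbind(p_raw, p_adj), 5)
```

With these fictitious counts, only the A vs D comparison should remain significant after adjustment, so they would not support the blanket statement that males are higher in A than in every other group.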

Note: Another, essentially equivalent, version of the chi-squared test in R is 'prop.test', as follows. It also shows the proportion of sales made by males in each group. The proportions $0.542$ (Group A) and $0.381$ (Group D) were shown to be significantly different by the post hoc chi-squared test above.

t = m+f   # total number of sales in each group
prop.test(m, t, cor=F)

        4-sample test for equality of proportions without continuity correction

data:  m out of t
X-squared = 19.168, df = 3, p-value = 0.0002523
alternative hypothesis: two.sided
sample estimates:
    prop 1    prop 2    prop 3    prop 4 
 0.5417790 0.4882155 0.4914089 0.3812317
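If the claim of interest is literally "males are a higher proportion in Sample A than in the other groups combined", one alternative (not part of the answer above) is to pool Groups B, C and D and run a one-sided 'prop.test'. This is only a sketch: it answers a different question from the pairwise comparisons, and choosing the direction of the alternative after looking at the data overstates the evidence, so it should ideally be planned in advance.

```r
m = c(201, 145, 143, 130)
f = c(170, 152, 148, 211)
t = m + f                    # total sales in each group

# Group A versus Groups B, C and D pooled into a single group
x = c(m[1], sum(m[2:4]))     # sales by males
n = c(t[1], sum(t[2:4]))     # all sales
# One-sided alternative: the proportion of males is greater in A
res = prop.test(x, n, alternative = "greater", correct = FALSE)
res
```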

Related Solutions

Solved – Transforming data for chi square — squaring negative value difference scores

You say you did paired t-tests on the original data, before dichotomizing it, and that males increased significantly from the old form to the new but the female change was not significant. Unfortunately, that cannot be taken as showing that the male change was bigger than the female change. You need to do an independent-groups t-test on the two sets of change scores. (Better yet, you could replace all the t-tests by confidence intervals for the corresponding means and mean differences, which would give you more information.)
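In R, the independent-groups comparison of change scores could look like the sketch below. The change scores are made-up numbers for illustration only. With two samples, 't.test' performs Welch's version of the independent-groups t-test by default, and its 'conf.int' component gives the confidence interval for the difference in mean change.

```r
# Hypothetical change scores (new form minus old form), for illustration only
male_change   = c(3, 5, 2, 4, 6, 1, 4)
female_change = c(1, 0, 2, -1, 3, 1, 0)

# Independent-groups (Welch) t-test comparing the mean change in the two groups
res = t.test(male_change, female_change)
res$conf.int     # CI for the difference in mean change (males minus females)
res$p.value
```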

For the dichotomized data, the situation is similar.
You have two contingency tables, one for males and one for females.

Males      
        Yes   No     Total
  Yes   Myy   Myn    My.
   No   Mny   Mnn    Mn.

Total   M.y   M.n    M.. = M = total number of Males

Females      
        Yes   No     Total
  Yes   Fyy   Fyn    Fy.
   No   Fny   Fnn    Fn.

Total   F.y   F.n    F.. = F = total number of Females

For each table, the analog of the paired t-test is the McNemar test,
http://en.wikipedia.org/wiki/McNemar%27s_test
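For example, McNemar's test for the males' table could be run with R's 'mcnemar.test'. The counts below are hypothetical. Note that 'matrix' fills by column, so the values are listed as Myy, Mny, Myn, Mnn to reproduce the layout above; the test uses only the discordant cells Myn and Mny.

```r
# Hypothetical counts for the males' table (illustration only)
Myy = 40; Myn = 5; Mny = 20; Mnn = 35

# matrix() fills by column: row 1 = (Myy, Myn), row 2 = (Mny, Mnn)
Mtab = matrix(c(Myy, Mny, Myn, Mnn), nrow = 2,
              dimnames = list(c("Yes", "No"), c("Yes", "No")))
Mtab
# McNemar's test (with R's default continuity correction) uses only Myn and Mny
mcnemar.test(Mtab)
```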

I know of no simple standard test of the difference between the changes in endorsement rates, but if all of Myn, Mny, Myy+Mnn, Fyn, Fny, Fyy+Fnn are "large" then an asymptotic test might be justified.
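One possible asymptotic test, offered here as an assumption rather than a standard recipe, is a Wald-type z-test. For each table, the change in the "Yes" rate is d = (M.y - My.)/M = (Mny - Myn)/M (column "Yes" rate minus row "Yes" rate), with estimated variance ((Myn + Mny)/M - d^2)/M; because males and females are independent samples, the two variances add. The sketch assumes rows are the old form and columns the new form, the function name 'paired_change' is hypothetical, and the counts are made up.

```r
# Change in the "Yes" rate for one table and its large-sample variance.
# yn, ny = discordant counts (e.g. Myn, Mny); n = table total (e.g. M).
paired_change = function(yn, ny, n) {
  d = (ny - yn) / n                  # column "Yes" rate minus row "Yes" rate
  v = ((yn + ny) / n - d^2) / n      # estimated variance of d
  c(d = d, v = v)
}

# Hypothetical counts, for illustration only
male   = paired_change(yn = 5,  ny = 20, n = 100)   # Myn, Mny, M
female = paired_change(yn = 10, ny = 12, n = 100)   # Fyn, Fny, F

# Males and females are independent samples, so the variances add
z = unname((male["d"] - female["d"]) / sqrt(male["v"] + female["v"]))
p = 2 * pnorm(-abs(z))               # two-sided p-value
c(z = z, p = p)
```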

Solved – Chi square test when sample sizes are different

You can use a chi-squared test in your example with different sample sizes. Your "another verb type" would be the verbs that are not oral verbs, i.e., all the other verbs.

Suppose in your example, $10$ of the $82$ verbs in sample one were oral verbs and $72$ were not, while $20$ of the $89$ verbs in sample two were oral verbs and $69$ were not. Then the table for your four cell chi-squared test could look like

          Oral  Other  |  Total
Sample 1    10     72  |     82
Sample 2    20     69  |     89
------------------------------
Total       30    141  |    171

and in R you might get

chisq.test(rbind(c(10, 72), c(20, 69)))

#     Pearson's Chi-squared test with Yates' continuity correction
#
# data:  rbind(c(10, 72), c(20, 69))
# X-squared = 2.4459, df = 1, p-value = 0.1178

so in this example the difference between the two samples would not be statistically significant (p = 0.12).
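The same table can be built from the oral-verb counts and the sample sizes, which makes the role of the different sample sizes explicit. 'prop.test' takes the counts and sample sizes directly and, for two groups with its default Yates correction, should give the same X-squared value as 'chisq.test' on the 2-by-2 table.

```r
# Oral-verb counts and sample sizes from the example above
oral = c(10, 20)
n = c(82, 89)

# 2-by-2 table: oral verbs and all other verbs in each sample
tab = cbind(oral = oral, other = n - oral)
chisq.test(tab)

# prop.test() works directly from the counts and sample sizes
prop.test(oral, n)
```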
