# Statistical Comparison – Comparing Percentages from Two Groups with Different Sample Sizes

I'm trying to compare the increase in the percentage of the $$X$$ variable between 2000 and 2012, so I calculated the percentage of the $$X$$ variable in 2000 and in 2012 and subtracted one from the other. My question is: if the sample sizes are different, is it still correct to compare the percentages? For example:

| Variable X | Count in 2000 | Count in 2012 |
|---|---|---|
| A | 89 | 114 |
| B | 9 | 33 |
| Total sample size | 98 | 147 |

So is it correct to say that there is an increase in $$A$$ even though the sample size in 2012 is larger?
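
One way to see whether a larger sample by itself can create an apparent change is a small simulation. The sketch below is an illustration, not part of the original post: it assumes a hypothetical true proportion of A of 0.83 that is the same in both years, repeatedly draws cohorts of 98 and 147 patients, and records the difference in sample proportions together with the `prop.test` p-value for each replicate.

```r
# Illustration only: both cohorts share the SAME (hypothetical) true
# proportion of A; only the cohort sizes differ, as in the table.
set.seed(1)
true_p <- 0.83   # hypothetical value chosen for illustration
n_2000 <- 98
n_2012 <- 147
n_sims <- 2000

diffs <- numeric(n_sims)
p_values <- numeric(n_sims)
for (i in seq_len(n_sims)) {
  # Number of A's drawn in each simulated cohort
  a_2000 <- rbinom(1, n_2000, true_p)
  a_2012 <- rbinom(1, n_2012, true_p)
  # Difference in sample proportions (2012 minus 2000)
  diffs[i] <- a_2012 / n_2012 - a_2000 / n_2000
  # Two-sample proportion test without continuity correction
  p_values[i] <- prop.test(c(a_2000, a_2012), c(n_2000, n_2012),
                           correct = FALSE)$p.value
}

mean(diffs)            # near 0: a bigger cohort does not shift the proportion
mean(p_values < 0.05)  # near 0.05: rejections occur at about the nominal rate
```

If sample size alone pushed the proportion up, the average difference would drift away from zero; instead the larger cohort only makes its own estimate less variable.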

Edit:
The two data sets are for two different cohorts, one before 2000 and the other in 2012, so the totals differ because each represents all the patients in that period. To get the percentage for 2000, I count how many values of variable $$X$$ are $$A$$, divide by the total count of $$X$$, and multiply by 100. Then I do the same for $$A$$ in 2012 and subtract one percentage from the other. Note: variable $$X$$ is categorical (for example: $$X$$ = A,A,A,A,A,B,A,T,A,B).

$$\%\ \text{of A in 2000} = \frac{\text{count of A in 2000}}{\text{total count of } X \text{ in 2000 } (A + B)} \times 100$$

$$\%\ \text{of A in 2012} = \frac{\text{count of A in 2012}}{\text{total count of } X \text{ in 2012 } (A + B)} \times 100$$



My concern is whether the difference in percentage that I'm getting reflects a real increase in $$A$$ in 2012, or whether it is just because the 2012 data set is larger. How can I correct for this difference? Can I use a proportion test to check whether this increase is real? The hypotheses for the proportion test would be:

- Null: there is no increase in $$A$$.
- Alternative: there is an increase in $$A$$.

Is it correct to use the proportion test to check whether the difference is real and not caused by the difference in sample size?
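
Written formally, with $$x$$ the count of A and $$n$$ the cohort total in each year, $$\hat p = x/n$$ the sample proportion and $$p$$ the true proportion, the hypotheses in the question and the standard pooled two-proportion $$z$$ statistic are as follows (a textbook form, shown here as a sketch; squaring $$z$$ gives the chi-squared statistic that `prop.test` reports when the continuity correction is turned off):

$$
H_0:\ p_{2012} \le p_{2000} \qquad \text{vs.} \qquad H_1:\ p_{2012} > p_{2000}
$$

$$
z = \frac{\hat p_{2012} - \hat p_{2000}}{\sqrt{\bar p\,(1 - \bar p)\left(\dfrac{1}{n_{2000}} + \dfrac{1}{n_{2012}}\right)}},
\qquad
\bar p = \frac{x_{2000} + x_{2012}}{n_{2000} + n_{2012}}
$$

The sample sizes enter only through the standard error in the denominator, so a larger 2012 cohort makes $$\hat p_{2012}$$ more precise; it does not by itself push the proportion up or down.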

Any help would be very much appreciated.

It also seems appropriate to use `prop.test` in R, as below. The proportions of A in the two years are about $$0.978$$ and $$0.776$$, which are judged to be significantly different because the P-value is near $$0$$.

```
prop.test(c(89,114), c(91, 147))

2-sample test for equality of proportions
with continuity correction

data:  c(89, 114) out of c(91, 147)
X-squared = 16.798, df = 1, p-value = 4.158e-05
alternative hypothesis: two.sided
95 percent confidence interval:
0.1197452 0.2852783
sample estimates:
prop 1    prop 2
0.9780220 0.7755102
```
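
The call above passes 91 as the 2000 total, whereas the table in the question gives 89 + 9 = 98. A sketch of the same test with the table's totals is below, together with a one-sided version matching the question's alternative (an increase in A). In `prop.test`, `alternative = "greater"` tests whether the first proportion exceeds the second, so the 2012 counts are listed first here.

```r
# Counts of A and cohort totals from the table in the question,
# with 2012 first so that "greater" means "higher in 2012"
x_A   <- c("2012" = 114, "2000" = 89)
n_tot <- c("2012" = 147, "2000" = 98)

# Two-sided test (default, with continuity correction), table totals
two_sided <- prop.test(x_A, n_tot)

# One-sided test of H1: proportion of A in 2012 > proportion in 2000
one_sided <- prop.test(x_A, n_tot, alternative = "greater")

two_sided$estimate  # 114/147 and 89/98
two_sided$p.value
one_sided$p.value
```

With these totals the estimated proportion of A is lower in 2012 (about 0.776 against 0.908), so the two-sided test still finds a significant difference at the 5% level, while the one-sided test for an increase gives a large p-value.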


This is essentially the same as a chi-squared test of homogeneity on the $$2\times 2$$ table of counts, with rows for A and B and columns for 2000 and 2012. [Because the sample sizes are moderately large, one might omit the continuity correction, using `correct = FALSE` both in `prop.test` and in `chisq.test` on that $$2 \times 2$$ table.]
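
A quick check of that equivalence, using the counts from the table in the question (totals 98 and 147) and dropping the continuity correction in both functions. The object names are illustrative only.

```r
# 2 x 2 table: rows A and B, columns 2000 and 2012
tab <- matrix(c(89, 9,
                114, 33),
              nrow = 2,
              dimnames = list(X = c("A", "B"), year = c("2000", "2012")))

# prop.test: counts of A and the column totals for each year
pt <- prop.test(tab["A", ], colSums(tab), correct = FALSE)

# chisq.test: homogeneity test on the whole 2 x 2 table
ct <- chisq.test(tab, correct = FALSE)

# Without the continuity correction the statistics and p-values coincide
c(prop_test = unname(pt$statistic), chisq_test = unname(ct$statistic))
c(prop_test = pt$p.value, chisq_test = ct$p.value)
```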