I have a dataset on the completeness of data entry in a survey across two populations. Each category is a variable coded 1 (present) or 0 (not present). I was wondering how to compute, if it is possible at all, a 95% confidence interval for the difference in the proportion of completeness between the two populations. This would help determine whether the difference in completeness is statistically significant.
For example, the first population has 70% completeness for category 1 (7 present and 3 not present); the second population has 80% completeness (8 present and 2 not present). The difference in proportions is 10 percentage points (80 - 70), but what is the confidence interval (if one can be computed)?
Best Answer
In Stata, this can be done with prtesti, the immediate version of prtest. The immediate version takes its data from arguments typed on the command line, either group sizes and proportions or group sizes and success counts, rather than from the data stored in memory. Here are the two ways of doing this:
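A minimal sketch of the two calls, assuming 10 observations per group and the proportions given in the question. The first form passes proportions; the second passes success counts and needs the count option.

```stata
* Form 1: group size and proportion for each group
prtesti 10 0.7 10 0.8

* Form 2: group size and number of successes for each group
prtesti 10 7 10 8, count
```

Both forms should report the same test along with a 95% confidence interval for the difference in proportions. Worked by hand with the usual unpooled standard error, sqrt(0.7*0.3/10 + 0.8*0.2/10), which is about 0.192, the 95% margin of error is about 0.38, so the interval around the 0.10 difference comfortably spans zero.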
In this case, you cannot reject the null hypothesis that there is no difference in proportions between the two groups. One caveat is that this is a large-sample test, and your samples are very small. Given these proportions, you would need a total sample of almost 600 observations (compared to your 20) to detect the difference with a two-sided alternative at the 5% level and equal-sized groups:
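A sketch of the sample size calculation, assuming Stata's power command with its defaults (two-sided test, 5% significance level, 80% power, equal allocation). The 80% power figure is an assumption, since it is not stated here.

```stata
* Total sample size needed to detect proportions of 0.7 versus 0.8
power twoproportions 0.7 0.8
```

Under those assumptions, the standard normal-approximation formula gives roughly 294 observations per group, which is consistent with the figure of almost 600 in total.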
One alternative would be to use Fisher's exact test, where the null is that the rows do not affect column outcomes, or that the two are independent. Here's how you can do that in Stata, though you have to specify the number of observations in each cell:
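A minimal sketch using the immediate form of tabulate, with the cell counts taken from the question. The layout (populations as rows, present and not present as columns) is an assumption; transposing the table gives the same test.

```stata
* Row 1: population 1 (7 present, 3 not present)
* Row 2: population 2 (8 present, 2 not present)
tabi 7 3 \ 8 2, exact
```

The backslash separates the rows of the table, and the exact option requests Fisher's exact test. For a 2x2 table, Stata reports both the two-sided and the one-sided p-value.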
The p-values are very large for both the one-sided and two-sided tests, so we still cannot reject the null that the proportions are the same.