Determining 95% confidence interval of difference for proportion (Stata, or other)

confidence interval, stata, survey

I have a dataset on the completeness of data entry in a survey across two populations. Each variable represents a category and is coded 1 (present) or 0 (not present). Is it possible to compute a 95% confidence interval for the difference in the proportion complete between the two populations, and if so, how? This would help determine whether the difference in completeness is statistically significant.

For example, in the first population category 1 is 70% complete (7 present, 3 not present); in the second population it is 80% complete (8 present, 2 not present). The difference in proportions is 10 percentage points (80 - 70), but what is its confidence interval, if one can be computed?

Best Answer

In Stata, this can be done with prtesti, the immediate form of prtest. An immediate command takes its data from arguments typed on the command line (here, either group sizes and proportions, or group sizes and success counts) rather than from the data stored in memory.

Here are the two ways of doing this:

. prtesti 10 .7 10 .8

Two-sample test of proportions                     x: Number of obs =       10
                                                   y: Number of obs =       10
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |         .7   .1449138                      .4159742    .9840258
           y |         .8   .1264911                       .552082    1.047918
-------------+----------------------------------------------------------------
        diff |        -.1   .1923538                     -.4770066    .2770066
             |  under Ho:   .1936492    -0.52   0.606
------------------------------------------------------------------------------
        diff = prop(x) - prop(y)                                  z =  -0.5164
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.3028         Pr(|Z| > |z|) = 0.6056          Pr(Z > z) = 0.6972

. prtesti 10 7 10 8, count

Two-sample test of proportions                     x: Number of obs =       10
                                                   y: Number of obs =       10
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |         .7   .1449138                      .4159742    .9840258
           y |         .8   .1264911                       .552082    1.047918
-------------+----------------------------------------------------------------
        diff |        -.1   .1923538                     -.4770066    .2770066
             |  under Ho:   .1936492    -0.52   0.606
------------------------------------------------------------------------------
        diff = prop(x) - prop(y)                                  z =  -0.5164
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.3028         Pr(|Z| > |z|) = 0.6056          Pr(Z > z) = 0.6972
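
Since the question allows tools other than Stata, here is a minimal Python sketch of the large-sample (Wald) arithmetic behind that table. It assumes the usual textbook formulas: an unpooled standard error for the confidence interval and a pooled one for the test of Ho: diff = 0. The function name two_prop_wald is made up for illustration, and this is not Stata's source code.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_wald(n1, p1, n2, p2, level=0.95):
    """Wald CI and pooled z-test for p1 - p2 (large-sample approximation)."""
    diff = p1 - p2
    # Unpooled standard error: used for the confidence interval.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    # Pooled standard error: used for the test of Ho: diff = 0.
    p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)
    se0 = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se0
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return diff, se, ci, z, p_two_sided


# Same inputs as: prtesti 10 .7 10 .8
diff, se, ci, z, p = two_prop_wald(10, 0.7, 10, 0.8)
print(f"diff = {diff:.4f}, se = {se:.7f}")
print(f"95% CI = ({ci[0]:.7f}, {ci[1]:.7f})")
print(f"z = {z:.4f}, two-sided p = {p:.4f}")
```

The printed values should agree with the diff row and the "under Ho" row of the prtesti output to the displayed precision. The same small-sample caveat applies to this sketch as to prtesti.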

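In this case, you cannot reject the null hypothesis of no difference in proportions between the two groups: the 95% confidence interval for the difference, (-0.477, 0.277), includes zero. Note that Stata reports diff as prop(x) - prop(y), so its sign is the reverse of the 80 - 70 in the question. One caveat is that this is a large-sample test, and your samples are very small; one symptom is that the upper confidence limit for y (1.047918) exceeds 1. Given these proportions, you would need a total sample of almost 600 observations (compared with 20) to detect the difference with 80% power (the default used by power) in a two-sided test at the 5% level with equal-sized groups: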

. power twoproportions 0.7 0.8, nratio(1) alpha(.05) test(chi2)

Performing iteration ...

Estimated sample sizes for a two-sample proportions test
Pearson's chi-squared test 
Ho: p2 = p1  versus  Ha: p2 != p1

Study parameters:

        alpha =    0.0500
        power =    0.8000
        delta =    0.1000  (difference)
           p1 =    0.7000
           p2 =    0.8000

Estimated sample sizes:

            N =       588
  N per group =       294
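
For readers without Stata, the sample size above can be approximated with a common closed-form formula for comparing two proportions. The sketch below is an assumption about the arithmetic, not a description of the iterative routine Stata runs, and n_per_group is a hypothetical helper name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, equal-allocation test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_b = NormalDist().inv_cdf(power)          # quantile for the target power
    p_bar = (p1 + p2) / 2                      # pooled proportion under Ho
    numerator = (
        z_a * sqrt(2 * p_bar * (1 - p_bar))
        + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    # Round up, since a fractional observation cannot be sampled.
    return ceil(numerator / (p2 - p1) ** 2)


n = n_per_group(0.7, 0.8)
print("N per group =", n, " N =", 2 * n)
```

With these inputs the formula gives the same figures as the power output (294 per group, 588 in total). Agreement for other inputs is not guaranteed.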

One alternative that does not rely on a large-sample approximation is Fisher's exact test, whose null hypothesis is that the row and column classifications are independent (here, that completeness does not depend on population). In Stata, the immediate command tabi does this; you supply the count in each cell, with rows separated by a backslash:

. tabi 7 3 \ 8 2, exact

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |         7          3 |        10 
         2 |         8          2 |        10 
-----------+----------------------+----------
     Total |        15          5 |        20 

           Fisher's exact =                 1.000
   1-sided Fisher's exact =                 0.500

The p-values are very large for both the one-sided and two-sided tests, so we still cannot reject the null that the proportions are the same.
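
The Fisher p-values can also be checked by enumerating the hypergeometric distribution directly. This is a minimal Python sketch under two assumptions: the two-sided p-value sums every table no more probable than the observed one, and the one-sided p-value is the smaller of the two tails. The function name fisher_exact_2x2 is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact p-values for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Unnormalised hypergeometric weight for each feasible top-left cell,
    # holding the row and column totals fixed. Integers avoid float ties.
    w = {k: comb(r1, k) * comb(r2, c1 - k) for k in range(lo, hi + 1)}
    total = sum(w.values())  # equals comb(r1 + r2, c1)
    # Two-sided: all tables no more likely than the observed one.
    two_sided = sum(v for v in w.values() if v <= w[a]) / total
    # One-sided: the smaller tail, an assumption about the direction.
    lower = sum(v for k, v in w.items() if k <= a) / total
    upper = sum(v for k, v in w.items() if k >= a) / total
    return two_sided, min(lower, upper)


# Same table as: tabi 7 3 \ 8 2, exact
two_sided, one_sided = fisher_exact_2x2(7, 3, 8, 2)
print(f"two-sided = {two_sided:.3f}, one-sided = {one_sided:.3f}")
```

For this table the enumeration matches the 1.000 and 0.500 reported by tabi.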
