In Stata, this can be done with the immediate version of prtest, prtesti. The immediate version obtains its data from arguments you type (either group sizes and proportions, or group sizes and success counts) rather than from the dataset in memory. Here are the two ways of doing this:
. prtesti 10 .7 10 .8

Two-sample test of proportions                     x: Number of obs =       10
                                                   y: Number of obs =       10
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |         .7   .1449138                      .4159742    .9840258
           y |         .8   .1264911                       .552082    1.047918
-------------+----------------------------------------------------------------
        diff |        -.1   .1923538                     -.4770066    .2770066
             |  under Ho:   .1936492    -0.52   0.606
------------------------------------------------------------------------------
        diff = prop(x) - prop(y)                                  z =  -0.5164
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.3028         Pr(|Z| > |z|) = 0.6056          Pr(Z > z) = 0.6972
. prtesti 10 7 10 8, count

Two-sample test of proportions                     x: Number of obs =       10
                                                   y: Number of obs =       10
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |         .7   .1449138                      .4159742    .9840258
           y |         .8   .1264911                       .552082    1.047918
-------------+----------------------------------------------------------------
        diff |        -.1   .1923538                     -.4770066    .2770066
             |  under Ho:   .1936492    -0.52   0.606
------------------------------------------------------------------------------
        diff = prop(x) - prop(y)                                  z =  -0.5164
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.3028         Pr(|Z| > |z|) = 0.6056          Pr(Z > z) = 0.6972
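If you want to check these numbers outside Stata, the pooled two-sample z-test is easy to reproduce by hand. Here is a sketch in Python (scipy assumed), using the same group sizes and proportions as the prtesti calls:

```python
from math import sqrt
from scipy.stats import norm

# Group sizes and sample proportions, as passed to prtesti
n1, p1 = 10, 0.7
n2, p2 = 10, 0.8

# Pooled proportion and standard error under Ho: p1 = p2
p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)
se_ho = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z = (p1 - p2) / se_ho                 # test statistic
p_two_sided = 2 * norm.cdf(-abs(z))   # Pr(|Z| > |z|)
```

The z statistic (-0.5164) and the two-sided p-value (0.6056) agree with the Stata output above.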
In this case, you cannot reject the null hypothesis that the proportions are the same in the two groups. One caveat is that this is a large-sample test, but your sample sizes are very small. Given these proportions, you would need almost 600 observations (compared with your 20) to detect the difference with 80% power, for a two-sided alternative at the 5% level and equal group sizes:
. power twoproportions 0.7 0.8, nratio(1) alpha(.05) test(chi2)

Performing iteration ...

Estimated sample sizes for a two-sample proportions test
Pearson's chi-squared test
Ho: p2 = p1  versus  Ha: p2 != p1

Study parameters:

        alpha =    0.0500
        power =    0.8000
        delta =    0.1000  (difference)
           p1 =    0.7000
           p2 =    0.8000

Estimated sample sizes:

            N =       588
  N per group =       294
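As a cross-check, the textbook normal-approximation sample-size formula for two proportions happens to land on the same answer here (a Python sketch; this is the standard formula, not necessarily Stata's internal algorithm):

```python
from math import ceil, sqrt
from scipy.stats import norm

p1, p2 = 0.7, 0.8
alpha, power = 0.05, 0.80

z_a = norm.ppf(1 - alpha / 2)   # two-sided critical value
z_b = norm.ppf(power)

# Pooled proportion under Ho (equal group sizes assumed)
p_bar = (p1 + p2) / 2
num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
       + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
n_per_group = ceil(num / (p1 - p2) ** 2)
```

With these inputs the formula gives about 293.15, which rounds up to 294 per group, matching Stata.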
One alternative would be to use Fisher's exact test, whose null hypothesis is that the row variable does not affect the column outcomes, i.e. that the two are independent. Here's how you can do that in Stata, though you have to specify the count in each cell:
. tabi 7 3 \ 8 2, exact

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |         7          3 |        10
         2 |         8          2 |        10
-----------+----------------------+----------
     Total |        15          5 |        20

           Fisher's exact =                 1.000
   1-sided Fisher's exact =                 0.500
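The same exact test is available outside Stata as well, for example in Python's scipy, with the cell counts from the table above:

```python
from scipy.stats import fisher_exact

# Rows are the two groups, columns are successes and failures
table = [[7, 3], [8, 2]]

odds_ratio, p_two_sided = fisher_exact(table)              # two-sided by default
_, p_one_sided = fisher_exact(table, alternative="less")   # one-sided alternative
```

Both p-values (1.000 two-sided, 0.500 one-sided) agree with the Stata output.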
The p-values are very large for both the one-sided and two-sided tests, so we still cannot reject the null that the proportions are the same.
First, let's see if there are differences in the proportion working across
the four groups A, B, C, D. (Data similar to yours.)
w  = c(90, 32, 9, 3)   # number working in groups A, B, C, D
nw = c(46,  7, 8, 5)   # number not working
TBL = rbind(w, nw)
chisq.test(TBL)
Pearson's Chi-squared test
data: TBL
X-squared = 8.7062, df = 3, p-value = 0.03346
Warning message:
In chisq.test(TBL) :
Chi-squared approximation may be incorrect
The low cell counts in groups C and D trigger a warning message, putting the validity of the p-value in doubt. The version of 'chisq.test' implemented in R allows simulating a more accurate p-value, which still shows a significant effect at the 5% level.
chisq.test(TBL, sim=T)$p.val
[1] 0.03098451
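For completeness, here is a Python counterpart (a sketch: scipy's chi2_contingency mirrors chisq.test, and the label-shuffling loop below is one way to mimic sim=T, since permuting the working/not-working labels across individuals keeps both margins fixed; the seed and number of shuffles are arbitrary choices):

```python
import numpy as np
from scipy.stats import chi2_contingency

w  = [90, 32, 9, 3]   # number working in groups A, B, C, D
nw = [46, 7,  8, 5]   # number not working
tbl = np.array([w, nw])

# Asymptotic test; no Yates correction is applied for tables larger than 2x2
chi2, p, dof, expected = chi2_contingency(tbl)

# Monte Carlo p-value: shuffle the working/not-working labels across the
# 200 individuals and recompute the statistic; margins stay fixed, so the
# expected counts are the same in every shuffle
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(4), tbl.sum(axis=0))   # each person's group
status = np.repeat([0, 1], tbl.sum(axis=1))         # 0 = working, 1 = not
exp = np.outer(tbl.sum(axis=1), tbl.sum(axis=0)) / tbl.sum()
reps, hits = 10_000, 0
for _ in range(reps):
    rng.shuffle(status)
    sim = np.zeros((2, 4))
    np.add.at(sim, (status, groups), 1)             # rebuild the 2x4 table
    hits += ((sim - exp) ** 2 / exp).sum() >= chi2
p_sim = (hits + 1) / (reps + 1)
```

Up to Monte Carlo noise, p_sim should land close to R's simulated value of about 0.031.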
Significance barely at the 5% level does not invite extensive ad hoc tests; to avoid false discovery, such tests would need to show significance at lower levels. Furthermore, it is not clear just which confidence intervals would be of interest. A look at the Pearson residuals, to see if there are groups that are strikingly different, possibly suggests comparing groups A and B. However, the level of significance there is unimpressive, especially if we protect against false discovery.
chisq.test(TBL)$resi
[,1] [,2] [,3] [,4]
w -0.1173306 1.148334 -0.7081676 -1.019365
nw 0.1671828 -1.636247 1.0090588 1.452480
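The residuals are just (observed - expected)/sqrt(expected); a Python sketch using scipy's expected counts:

```python
import numpy as np
from scipy.stats import chi2_contingency

tbl = np.array([[90, 32, 9, 3],    # working, groups A..D
                [46, 7,  8, 5]])   # not working

# chi2_contingency returns the table of expected counts as its 4th element
_, _, _, expected = chi2_contingency(tbl)
pearson_resid = (tbl - expected) / np.sqrt(expected)
```

The entries match R's output above, e.g. 1.148 for working in group B and -1.636 for not working in group B.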
chisq.test(TBL[,c(1,2)], cor=F)
Pearson's Chi-squared test
data: TBL[, c(1, 2)]
X-squared = 3.6176, df = 1, p-value = 0.05717
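The same A-versus-B subtable test in Python (correction=False corresponds to cor=F, i.e. no Yates continuity correction):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Columns A and B of the full table: working on top, not working below
sub = np.array([[90, 32],
                [46, 7]])
chi2, p, dof, _ = chi2_contingency(sub, correction=False)
```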
You have already said you know how to use 'prop.test' to get a 95% confidence interval for the difference of proportions in A and B. I don't see a point in looking at other pairs of groups, especially in view of the low counts there. Maybe you would like to compare group A with the other three groups combined; 'prop.test' can handle that too.
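For reference, a plain Wald interval for the A-versus-B difference can be sketched in Python as follows. (Note that prop.test's default interval also applies a continuity correction, so its bounds will differ slightly, and this unpooled interval can disagree with the pooled chi-squared test near the 5% boundary, as it does here.)

```python
from math import sqrt
from scipy.stats import norm

# Group A: 90 of 136 working; group B: 32 of 39 (counts from the table above)
x1, n1 = 90, 136
x2, n2 = 32, 39

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2
# Unpooled (Wald) standard error of the difference
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = norm.ppf(0.975)
ci = (diff - z * se, diff + z * se)
```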
If you had additional kinds of analyses in mind using confidence intervals, please be more specific, and maybe one of us can help.
Best Answer
Note that the first article you mentioned is about two "independent" populations. In polling, only a "single" population is sampled, so the sample proportions for candidates A and B are related (a respondent counted for A is not counted for B, which makes the two estimates negatively correlated). In two-independent-population tests, as suggested in the first article, the two samples, and hence the sample means, are assumed to be fully independent. That is probably why comparing the CIs for the yes/no question in a single population (as in presidential voting) works fine.