Solved – R power and sample size estimation

Tags: r, sample-size, statistical-power

I am tasked with estimating an appropriate sample size for a sales call center experiment. Two groups, A and B, will be taking calls. Group A (2/3 of the calls) will follow their normal procedure in selling the product. Group B (1/3 of the calls) will sell using a different strategy. I need to estimate how many calls we will need to observe in order to detect differences of 0%, 1%, 5%, and 10% in success rates between groups A and B. I have explored the pwr package using the pwr.2p2n.test() function, but am not quite sure how to apply it to my example.

Total calls across both groups will be between 35–50k per month. My thought was to make calls per month and p1 − p2 variable inputs into pwr.2p2n.test() to get a range of power estimates, then choose the design that maximizes power.

Is this a flawed method?
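For reference, the grid idea described above can be sketched roughly as follows (the 30% baseline success rate for group A is made up for illustration; note that a 0% difference gives an effect size of zero, so "power" there is just the significance level):

```r
library(pwr)

p_A   <- 0.30                         # assumed baseline success rate for group A
diffs <- c(0.01, 0.05, 0.10)          # differences in success rate to detect
calls <- seq(35000, 50000, by = 5000) # range of monthly call volumes

grid <- expand.grid(n_total = calls, diff = diffs)
grid$power <- mapply(function(n_total, diff) {
  h <- ES.h(p_A + diff, p_A)          # Cohen's effect size for two proportions
  pwr.2p2n.test(h  = h,
                n1 = round(2/3 * n_total),  # group A takes 2/3 of calls
                n2 = round(1/3 * n_total))$power
}, grid$n_total, grid$diff)

grid
```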

Best Answer

Given my comments under your post above:

It sounds to me like you are analyzing a 2 x 2 contingency table: Group A vs. Group B x Success vs. Failure. From such a table, you can easily calculate an odds ratio (OR); see metafor::escalc() for good documentation on computing an OR from a 2 x 2 contingency table.
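A minimal sketch of that calculation (the counts here are made up: a 30% success rate for group A and 35% for group B):

```r
library(metafor)

# 2 x 2 contingency table counts: successes and failures per group
a_succ <- 300; a_fail <- 700   # group A: 30% success
b_succ <- 175; b_fail <- 325   # group B: 35% success

# odds ratio by hand: odds of success in B over odds of success in A
(b_succ / b_fail) / (a_succ / a_fail)

# escalc() returns the log odds ratio (yi) and its sampling variance (vi)
escalc(measure = "OR", ai = b_succ, bi = b_fail, ci = a_succ, di = a_fail)
```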

I have used epiR::epi.ccsize() to do power analyses for odds ratios before in working with epidemiologists. It is geared toward epidemiologists, but the statistics are the same, and the code is very simple.

Let's say we are expecting an odds ratio of 1.5, where there is a 30% success rate in the control group and there is a 2:1 ratio of participants in the control versus experimental group (i.e., what you describe in your post), and we want 95% power:

library(epiR)
epi.ccsize(OR = 1.50, p0 = 0.30, n = NA, power = 0.95, r = 2)

Which gives us a list:

$n.total
[1] 1578

$n.case
[1] 526

$n.control
[1] 1052

Translating from epidemiologist-centric language, you need 526 experimental and 1052 controls to get 95% power in that situation.
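To connect this back to the 1%, 5%, and 10% differences in the question, one hypothetical approach (assuming a 30% baseline success rate) is to convert each target success rate into the corresponding odds ratio and run epi.ccsize() over them:

```r
library(epiR)

p0 <- 0.30                                # assumed baseline success rate (group A)
p1 <- p0 + c(0.01, 0.05, 0.10)            # target success rates in group B
ORs <- (p1 / (1 - p1)) / (p0 / (1 - p0))  # odds ratios implied by each difference

# required sample sizes at 95% power with a 2:1 allocation ratio
lapply(ORs, function(or)
  epi.ccsize(OR = or, p0 = p0, n = NA, power = 0.95, r = 2))
```

Smaller differences imply odds ratios closer to 1 and therefore much larger required samples.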


It might also be tempting to try stats::power.prop.test(), but I'm not sure how to handle your 2:1 ratio using that function. For example, this response says that you just need to make sure your smallest group hits the threshold given by power.prop.test(), but I find that estimate unnecessarily high:

power.prop.test(p1=.30, p2=.391304, power=.95) # these values for p1 and p2 give OR of 1.50

     Two-sample comparison of proportions power calculation 

              n = 702.1545
             p1 = 0.3
             p2 = 0.391304
      sig.level = 0.05
          power = 0.95
    alternative = two.sided

NOTE: n is number in *each* group

This overestimate jibes well with the comment on the post I linked above, where user Underminer says:

"If you do a 95/5 split, then it'll just take longer to hit the minimum sample size for the variation that is getting the 5%." - while this is a conservative approach that at least satisfies the specified power of the test, you will in actuality exceed the power entered into power.prop.test if you have one "small" and one "large" group (e.g., n1 = 19746, n2 = 375174). A more exact method of meeting power requirements for unequal sample sizes would likely be desirable.
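As a quick check of that point (assuming the pwr package), if the smaller group just meets the ~703-per-group threshold from power.prop.test() while the other group is much larger, the achieved power exceeds the 95% that was requested:

```r
library(pwr)

h <- ES.h(0.391304, 0.30)   # same effect size as the power.prop.test() call above

pwr.2p2n.test(h = h, n1 = 703,  n2 = 703)$power  # balanced groups: roughly 0.95
pwr.2p2n.test(h = h, n1 = 2109, n2 = 703)$power  # one group 3x larger: above 0.95
```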

Here's a relevant RPubs link using the pwr package that discusses unequal sample sizes. However, I find the epiR approach the most intuitive.
