Solved – Intuition – Impact of baseline conversion rate on sample size

hypothesis-testing, sample-size

In an A/B test we calculate the needed sample size before we run the test. The required sample size depends on the significance level, the power, the minimum detectable effect (MDE), and the baseline conversion rate. The baseline conversion rate is the percentage of visitors to your website who complete a desired goal when you are not changing anything; in other words, the conversion rate of the control group.

Let's say we set those values to

  • Significance level: 5 %
  • Power: 80 %
  • Relative MDE: 2 %

And plug them into a sample size calculator.

For different baseline conversion rates we get different sample sizes. The higher the baseline, the lower the sample size.

  • Baseline 10 %: 354,139 per variant
  • Baseline 20 %: 157,328 per variant
  • Baseline 30 %: 91,725 per variant

The relative change we are trying to detect stays the same. I am trying to get an intuition for why we need bigger samples when the baseline is lower.

Best Answer

Thanks to khol for supplying the JS function in the comments to the OP; it helps to see exactly what this function is doing. Note that it appears to be the version for an absolute change rather than a relative one, unless delta is converted before being passed in. As an intuitive answer was requested I will not delve too deeply, but I have provided an explanation of what the function is doing if you want to go that far.

Intuitive Explanation

It might be expected that, since higher conversion rates come with higher variance, a higher baseline would require more samples to counter the extra variance. But this is only one side of the calculation.

At a 10% background rate and a relative 2% change, the signal you want to detect is 0.47% of the expected standard deviation (noise) under the null. At a 30% background rate it is 0.93% of the standard deviation under the null, approximately twice the signal-to-noise ratio. Since noise only averages out with the square root of N, you would expect a 10% background rate to require about 4x the numbers required for 30% to maintain the same confidence in the result. You will see that this approximates your figures well (354,139 / 91,725 ≈ 3.9).
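To make that concrete, here is a minimal sketch of the arithmetic, taking the noise under the null to be sqrt(2p(1 - p)), the sd1 term from the function below (the helper name snr is mine, not from the original):

function snr(p, relativeMde) {
    var delta = p * relativeMde;  // relative MDE converted to an absolute change
    return delta / Math.sqrt(2 * p * (1 - p));
}
console.log(snr(0.10, 0.02));  // ~0.0047: the signal is 0.47 % of the noise
console.log(snr(0.30, 0.02));  // ~0.0093: the signal is 0.93 % of the noise
// The ratio is ~1.96; squared (because noise shrinks with sqrt(N)) that is
// ~3.86, matching 354,139 / 91,725.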

So the intuitive explanation is that a 2% relative change at a 10% background rate is much smaller relative to the noise than it would be at a 30% background rate. Because the signal-to-noise ratio is smaller, you need larger numbers to average out the noise and attain the same level of confidence in your result.

Explanation of the calculation based on code

function num_subjects(alpha, power_level, p, delta) {
    // ppnd is assumed to be the inverse standard normal CDF (quantile function).
    if (p > 0.5) { p = 1.0 - p; }          // mirror p about 0.5; direction of change is irrelevant
    var t_alpha2 = ppnd(1.0 - alpha / 2);  // quantile for the two-sided significance level
    var t_beta = ppnd(power_level);        // quantile for the power
    var sd1 = Math.sqrt(2 * p * (1.0 - p));                               // SD of the difference under the null
    var sd2 = Math.sqrt(p * (1.0 - p) + (p + delta) * (1.0 - p - delta)); // SD under the alternative
    return (t_alpha2 * sd1 + t_beta * sd2) * (t_alpha2 * sd1 + t_beta * sd2) / (delta * delta);
}

p appears to be the background conversion rate. The p > 0.5 check simply mirrors the probability about 0.5, since the two cases are mirror images (the direction of change is not important).

The line var t_alpha2 computes the standard normal quantile (critical z-value) associated with the specified significance level (alpha).

The line var t_beta is the standard normal quantile associated with the specified power (power_level).

The line var sd1 is the standard deviation of the difference under the null, i.e. the noise associated with the background rate. Since the background rate (p) is involved, this changes as p changes: the closer p is to 0.5, the bigger the resulting sd1. For your values of the background rate this gives sd1 = 0.424264069, 0.565685425 and 0.64807407 respectively.

The line var sd2 is the standard deviation under the alternative, i.e. it combines the variance at the background rate (p) and at the background rate plus the change (p + delta). With a relative change, the absolute value of the change is smaller for smaller background rates: 2% of 10% is 0.2%, of 20% it is 0.4%, and of 30% it is 0.6%. This gives values for sd2 of 0.426140822, 0.567788693 and 0.649895376 respectively.
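These values can be checked directly; a quick sketch, assuming the OP's relative MDE of 2%:

// Reproduce the sd1/sd2 values quoted above.
[0.10, 0.20, 0.30].forEach(function (p) {
    var delta = 0.02 * p;  // relative 2 % as an absolute change
    var sd1 = Math.sqrt(2 * p * (1 - p));
    var sd2 = Math.sqrt(p * (1 - p) + (p + delta) * (1 - p - delta));
    console.log(p, sd1, sd2);
});
// Prints (rounded):
// 0.1 0.424264069 0.426140822
// 0.2 0.565685425 0.567788693
// 0.3 0.64807407  0.649895376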

This means the last line returns n = (t_alpha2 * sd1 + t_beta * sd2)^2 / delta^2: the square of the sum of (the null SD times the significance quantile) and (the alternative SD times the power quantile), divided by the square of the absolute change.

The background rate describes your null hypothesis, which is why its standard deviation (sd1) is multiplied by the quantile associated with your significance (which is conditional on no change).

The changed rate describes your alternative hypothesis, which is why its standard deviation (sd2) is multiplied by the quantile associated with your power (which is conditional on the expected change).

All in all this means that if you make the denominator (delta squared) really small, the result will be a big number. At a 30% baseline versus 10%, the squared change is 9 times bigger (0.000036 compared to 0.000004), while sd1 and sd2 only grow by factors of 1.527525232 and 1.525071861 respectively. The numerator therefore grows by roughly 1.53^2 ≈ 2.33 while the denominator grows by 9, so the required sample size shrinks by about 9 / 2.33 ≈ 3.86x, matching the ratio of your figures (354,139 / 91,725 ≈ 3.86).
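Putting it all together, here is a sketch that reproduces the OP's figures with the function above. ppnd is stubbed with the only two quantiles this example needs (in the original it is presumably a full inverse normal CDF):

function ppnd(q) {
    // Stub for this example only; a real implementation inverts the normal CDF.
    if (Math.abs(q - 0.975) < 1e-9) { return 1.959964; }  // z for two-sided alpha = 0.05
    if (Math.abs(q - 0.80) < 1e-9) { return 0.841621; }   // z for power = 80 %
    throw new Error("ppnd stub: quantile not tabulated");
}

// A relative MDE of 2 % means delta = 0.02 * p in absolute terms.
[0.10, 0.20, 0.30].forEach(function (p) {
    console.log(p, Math.round(num_subjects(0.05, 0.80, p, 0.02 * p)));
});
// Prints ~354139, ~157328 and ~91725 per variant, matching the quoted
// figures up to rounding.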
