Sample-Size – Determining Sample Size Proportion for Control vs. Experiment Group

hypothesis testing, sample-size, sampling, statistical significance

I am taking an online statistics course and I understand how to calculate the necessary sample size for a hypothesis test.

I am using an online calculator such as http://www.evanmiller.org/ab-testing/sample-size.html or Python, as in https://stackoverflow.com/questions/15204070/is-there-a-python-scipy-function-to-determine-parameters-needed-to-obtain-a-ta

From what I understand, this gives me the minimum sample size for each group – control and treatment.

However, if I am designing a test and have a total sample size of 30,000, how do I calculate how large the control and treatment groups should be?

I understand that the treatment group needs to be at least the minimum sample size I calculated before, and I have read that a 50/50 split generally leads to the highest statistical power. But how can I show this with a calculation? I have been googling it unsuccessfully, so even a link to the correct approach would be greatly appreciated.

This is the closest I found https://janhove.github.io/design/2015/11/02/unequal-sample-sized, but I wasn't able to extract the correct formula.

I found this helpful Cross Validated answer, "Is a large control sample better than a balanced sample size when the treatment group is small?", but I am still unsure how to calculate the best ratio between the control and treatment groups for a given total sample size (or how to prove that the 50/50 split has the highest statistical power).

I also found this great answer, "Treatment and Control group, the sample size", but it applies to a different industry. The hypothesis test I am designing is in the field of online user behavior psychology.

Thank you very much in advance for any hint in the right direction (even just the correct terminology I can search for).

Best Answer

First of all, your formula for the necessary sample size looks suspicious. The term StdDev*(1-StdDev) doesn't make much sense; perhaps it is supposed to be proportion*(1-proportion), which applies when you have a binomial distribution with a sample proportion of successes.

But that formula is an aside from your main question: why does a 50/50 split of samples produce the highest power?

The hypothesis you are trying to test is that the mean of the experiment group $\mu_E$ is the same as the mean of the control group $\mu_C$. Essentially you are testing whether $\mu_E - \mu_C = 0$.

Suppose that the true variance (not the sample variance) of the experiment group is $\sigma^2_E$ and that you have a sample size $n_E$. Likewise, the control group variance and sample size are $\sigma^2_C$ and $n_C$.

From your samples you will be examining $\bar{X_E} - \bar{X_C}$ to test the hypothesis $\mu_E - \mu_C = 0$. For independent observations, the variance of the sample mean $\bar{X_E}$ is $\frac{\sigma^2_E}{n_E}$. Likewise, for the control group the variance of the sample mean is $\frac{\sigma^2_C}{n_C}$.

When you subtract one random variable from another that is independent of it, the resulting variable has a variance equal to the sum of the two variances. Since the two groups are sampled independently, the variance of $\bar{X_E} - \bar{X_C}$ is $\frac{\sigma^2_E}{n_E} + \frac{\sigma^2_C}{n_C}$.

Since you have no a priori reason to suspect that the variance of either the control or the experiment group is larger, we will just assume that they are equal. Therefore we assume $\sigma^2_E=\sigma^2_C=\sigma^2$, and the variance of $\bar{X_E} - \bar{X_C}$ is now $\frac{\sigma^2}{n_E} + \frac{\sigma^2}{n_C} = \sigma^2\left(\frac{1}{n_E} + \frac{1}{n_C}\right)$.
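As a quick sanity check of this variance formula, here is a minimal simulation sketch using only the Python standard library. The function names, the normal distribution for the observations, the group sizes, and the seed are all illustrative assumptions, not part of the question.

```python
import random
import statistics


def theoretical_variance_of_difference(n_e, n_c, sigma=1.0):
    """sigma^2 * (1/n_E + 1/n_C), the formula derived above."""
    return sigma ** 2 * (1.0 / n_e + 1.0 / n_c)


def simulated_variance_of_difference(n_e, n_c, sigma=1.0, reps=4000, seed=0):
    """Empirical variance of (experiment sample mean - control sample mean).

    Assumption for illustration: both groups are drawn independently from a
    normal distribution with mean 0 and the same standard deviation sigma.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        mean_e = statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n_e)])
        mean_c = statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n_c)])
        diffs.append(mean_e - mean_c)
    return statistics.pvariance(diffs)


# Example with deliberately unequal (hypothetical) group sizes.
print(theoretical_variance_of_difference(20, 80))
print(simulated_variance_of_difference(20, 80))
```

The two printed values should be close to each other; the simulated one fluctuates with the seed and the number of repetitions.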

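To get the most powerful test we want to minimize this variance, because a smaller variance of $\bar{X_E} - \bar{X_C}$ makes a true difference of a given size easier to detect. If you have a total number of samples $N$ and a proportion $p$ of them are in the experiment group, then $n_E=Np$ and $n_C=N(1-p)$.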

The variance is $\sigma^2\left(\frac{1}{Np} + \frac{1}{N(1-p)}\right)= \frac{\sigma^2}{N} \left(\frac{1}{p} + \frac{1}{1-p}\right)$. Plotting $f(p) = \frac{1}{p} + \frac{1}{1-p}$ shows that the minimum occurs at $p=0.5$. Alternatively, calculus proves this more rigorously: $f'(p) = -\frac{1}{p^2} + \frac{1}{(1-p)^2}$ is zero only at $p=0.5$ on $0<p<1$, and $f''(p) = \frac{2}{p^3} + \frac{2}{(1-p)^3} > 0$ there, so $p=0.5$ is the minimum.
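To see what this means for power with a fixed total of 30,000, here is a rough sketch that evaluates the factor $\frac{1}{p} + \frac{1}{1-p}$ and the approximate power of a two-sided z-test for several splits. It assumes a known common standard deviation and a normal approximation; the effect size of 0.03, `sigma=1.0`, and the function names are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist


def variance_factor(p):
    """The term 1/p + 1/(1-p) that multiplies sigma^2 / N."""
    return 1.0 / p + 1.0 / (1.0 - p)


def power_two_sided_z(total_n, p, effect, sigma=1.0, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in means.

    Assumes a known, equal standard deviation sigma in both groups and a
    true difference in means equal to `effect`.
    """
    std_normal = NormalDist()
    standard_error = sigma * (variance_factor(p) / total_n) ** 0.5
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(effect) / standard_error
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def best_split(total_n, effect, sigma=1.0, alpha=0.05):
    """Search p = 0.01, 0.02, ..., 0.99 for the split with the highest power."""
    grid = [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda p: power_two_sided_z(total_n, p, effect, sigma, alpha))


total_n = 30_000          # total sample size from the question
hypothetical_effect = 0.03  # assumed true difference, in units of sigma
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    power = power_two_sided_z(total_n, p, hypothetical_effect)
    print(f"p={p:.1f}  factor={variance_factor(p):.3f}  power={power:.3f}")
print("best split on the grid:", best_split(total_n, hypothetical_effect))
```

Under these assumptions the factor is smallest and the power largest at $p=0.5$, and the results are symmetric around that point. If the two groups had different variances, the equal-variance step above would no longer hold and the best split would shift accordingly.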
