Monte Carlo Simulation – Power Analysis When Effect Size, Means, and Std Are Unknown

monte-carlo, statistical-power

Say you want to run an experiment to see whether a treatment is better than a control, and you want to power the experiment properly. However, you have no a priori beliefs about the treatment's efficacy, so you don't know what the effect (difference of means) could be.

Is there a Monte Carlo method to determine the minimum sample size if you don't know how big the effect will be? (And if so, is there a Python implementation? This is less important, though.)

Edit: The desired output might be a matrix in which each minimum effect size is mapped to a minimum sample size (e.g., 0.01 effect -> 200 sample size).

Best Answer

I agree with @AndyW that a key consideration in a power and sample size determination is to decide how big a difference between two means would be a result of practical importance.

A more difficult ingredient in such a determination is to estimate the variances of the responses. Maybe you can get a clue about the variances by looking at prior studies using similar methods.

You also need to know the significance level of the test you will use: often 5%, sometimes in medical studies 1% or smaller. And you need to have a good idea what power the test needs to have. Often people want at least 80% or 90% probability of detecting an effect of the desired size, if real.
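As a rough guide to how these ingredients combine, the usual large-sample approximation for a balanced, two-sided, two-sample comparison with a common standard deviation is shown here. The symbols are my own notation, not from any particular software: $\delta$ is the difference in means worth detecting, $\sigma$ the guessed standard deviation, $\alpha$ the significance level, $1-\beta$ the desired power, and $z_p$ the $p$-quantile of the standard normal distribution.

$$n \approx \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \sigma^2}{\delta^2} \quad \text{observations per group.}$$

Because $n$ is proportional to $(\sigma/\delta)^2$, halving the difference you want to detect roughly quadruples the required sample size. Software based on the t distribution will typically report a slightly larger $n$ than this normal approximation.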

Do not be dismayed that some of the necessary inputs into a power and sample size determination are inevitably guesses. Such a determination based on reasonable guesses is almost always much better than none at all.

Suppose you want 85% power for a two-sample t test at the 5% level to detect a difference in means that is 5 units, when the standard deviation of the observations may be 10 units. Many statistical software programs have 'power and sample size' procedures for balanced studies (equal sample sizes in the two groups).

Below is a printout from a recent release of Minitab that includes the situation I described in the previous paragraph.

Power and Sample Size 

2-Sample t Test

Testing mean 1 = mean 2 (versus ≠)
Calculating power for mean 1 = mean 2 + difference
α = 0.05  Assumed standard deviation = 10


            Sample  Target
Difference    Size   Power  Actual Power
         5      64    0.80      0.801460
         5      73    0.85      0.850968
         5      86    0.90      0.903230
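Since the question asks about Python and about a table mapping effect sizes to sample sizes, here is a minimal sketch using only the standard library. It relies on the normal approximation rather than the t-based calculation Minitab uses, so for the three rows above it gives 63, 72 and 85 per group, one fewer than the printout in each case. The function names `n_per_group` and `sample_size_table` are hypothetical, chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(diff, sd, alpha=0.05, power=0.80):
    """Approximate per-group n for a balanced, two-sided two-sample test.

    Normal approximation (an assumption of this sketch, not Minitab's method):
        n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / diff) ** 2
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / diff) ** 2)


def sample_size_table(diffs, sd, alpha=0.05, power=0.80):
    """Map each candidate difference in means to its approximate per-group n."""
    return {d: n_per_group(d, sd, alpha, power) for d in diffs}


if __name__ == "__main__":
    # Candidate differences are illustrative; sd=10 matches the example above.
    table = sample_size_table([2.5, 5, 7.5, 10], sd=10, power=0.85)
    for diff, n in table.items():
        print(f"difference {diff:>5}: about {n} per group")
```

When the effect size is unknown, a table like this lets you read off the smallest difference your budget can detect, rather than committing to a single guess.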


Power and sample size procedures are available in many statistical software programs for many of the most common procedures.

Some situations require simulation. If you know ahead of time that one of the two groups has a larger variance, so that you'll need a Welch t test, you may need simulation for that. Also, most software assumes equal sample sizes in the two groups. If financial constraints require one group to be smaller than the other, then you'd probably need simulation.

The simulation in R below addresses the situation where one of the two groups in a two-sample t test has SD $9$ and the other has SD $11$ (the group means are $50$ and $55$, a difference of $5$). With $70$ observations in each group, the power is about 83%, even using a Welch test to accommodate the slightly different group standard deviations.

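set.seed(2022)
# p-values from 10^5 simulated experiments; t.test() defaults to the Welch test
pv = replicate(10^5, 
       t.test(rnorm(70, 50, 9),
              rnorm(70, 55, 11))$p.value)
mean(pv <= 0.05)     # proportion of simulated experiments rejecting at the 5% level
[1] 0.83177      # approx. power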
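For the Python side of the question, the same idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not a port of R's `t.test`: the critical value of the t distribution is approximated by a series expansion in 1/df (reasonable for moderate to large df), and all function names here are hypothetical. It also allows unequal group sizes and scans a list of candidate sample sizes for the first one that reaches a target power.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def t_critical(df, alpha=0.05):
    """Approximate two-sided critical value of Student's t.

    Assumption: uses a series expansion around the normal quantile,
    which is adequate for moderate to large df but not exact.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / df + g2 / df**2


def welch_rejects(x, y, alpha=0.05):
    """Return True if a two-sided Welch t test rejects equal means."""
    mx, my = fmean(x), fmean(y)
    # Squared standard errors of the two sample means.
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1) / len(x)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1) / len(y)
    t = (mx - my) / sqrt(vx + vy)
    # Welch-Satterthwaite degrees of freedom.
    df = (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))
    return abs(t) > t_critical(df, alpha)


def simulated_power(n1, n2, mu1, mu2, sd1, sd2, reps=5000, alpha=0.05, seed=2022):
    """Estimate power as the fraction of simulated experiments that reject."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(mu1, sd1) for _ in range(n1)]
        y = [rng.gauss(mu2, sd2) for _ in range(n2)]
        hits += welch_rejects(x, y, alpha)
    return hits / reps


def first_n_reaching(target, candidates, mu1, mu2, sd1, sd2, reps=5000):
    """Return the first balanced per-group n in `candidates` whose simulated
    power is at least `target`, or None if none of them reaches it."""
    for n in candidates:
        if simulated_power(n, n, mu1, mu2, sd1, sd2, reps=reps) >= target:
            return n
    return None


if __name__ == "__main__":
    # Same scenario as the R code: means 50 and 55, SDs 9 and 11, n = 70 each.
    print(simulated_power(70, 70, 50, 55, 9, 11, reps=2000))
    # Unequal group sizes, e.g. if one arm is more expensive (illustrative).
    print(simulated_power(50, 90, 50, 55, 9, 11, reps=2000))
```

Pure-Python loops are slow, so the default number of replications is far smaller than the $10^5$ used in R; the estimate is correspondingly noisier. Repeating the scan over several assumed differences gives a simulated version of the effect-size-to-sample-size table.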