The short answer is that I don't think you can. Pandas doesn't provide anything like the nstart option, or many of the other options that SAS provides. Without looking at the source, it's not entirely clear to me exactly what pandas is doing here, since the docs don't follow the literature I'm familiar with.
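For comparison, the pandas documentation describes two weighting schemes for `ewm(...).mean()`. With `adjust=False` it uses the recursion y_0 = x_0, y_t = (1 - alpha) * y_{t-1} + alpha * x_t, so the first observation is effectively the starting value. With `adjust=True` (the default) the weights (1 - alpha)**i are normalized over the observations seen so far. The pure-Python sketch below writes out both formulas. It ignores missing values (pandas' `ignore_na` handling) and is only an illustration of the documented formulas, not pandas' own code.

```python
def ewm_recursive(xs, alpha):
    """Like pandas' ewm(alpha=alpha, adjust=False).mean(): starts at xs[0]."""
    out = [xs[0]]
    for x in xs[1:]:
        # y_t = (1 - alpha) * y_{t-1} + alpha * x_t
        out.append((1 - alpha) * out[-1] + alpha * x)
    return out


def ewm_adjusted(xs, alpha):
    """Like pandas' ewm(alpha=alpha, adjust=True).mean(): normalized weights."""
    out = []
    for t in range(len(xs)):
        # weight (1 - alpha)**i on the observation i steps back
        weights = [(1 - alpha) ** i for i in range(t + 1)]
        num = sum(w * xs[t - i] for i, w in enumerate(weights))
        out.append(num / sum(weights))
    return out


series = [3.0, 5.0, 4.0, 6.0]
print(ewm_recursive(series, 0.5))  # [3.0, 4.0, 4.0, 5.0]
print(ewm_adjusted(series, 0.5))
```

In both schemes the start is tied to the first observation; there is no separate argument for choosing it.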
You'll find more fully featured exponential smoothing in this PR for statsmodels. We still don't have anything like an nstart parameter. We estimate the starting values heuristically from suggestions in the literature, but you could compute them yourself and pass them in. It's not done yet; I'm still implementing the optimization for starting values and parameters.
https://github.com/statsmodels/statsmodels/pull/1489
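Computing the starting values yourself can be as simple as averaging the first few observations. Below is a minimal pure-Python sketch of simple exponential smoothing that accepts either an explicit initial level or a hypothetical `n_start` argument (start from the mean of the first `n_start` observations). The name only echoes the idea of an nstart-style option; it is not SAS's exact definition and not the API of the statsmodels PR.

```python
def simple_exp_smoothing(xs, alpha, initial_level=None, n_start=1):
    """Simple exponential smoothing with a user-chosen starting level.

    initial_level: starting value computed however you like.
    n_start: hypothetical option; if initial_level is None, start from the
             mean of the first n_start observations.
    Returns (one-step-ahead fitted values, final level).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if initial_level is None:
        if not 1 <= n_start <= len(xs):
            raise ValueError("n_start must be between 1 and len(xs)")
        initial_level = sum(xs[:n_start]) / n_start
    level = initial_level
    fitted = []
    for x in xs:
        fitted.append(level)                       # forecast made before seeing x
        level = alpha * x + (1 - alpha) * level    # update the level with x
    return fitted, level


data = [3.0, 5.0, 4.0, 6.0]
print(simple_exp_smoothing(data, 0.5, initial_level=3.0))  # ([3.0, 3.0, 4.0, 4.0], 5.0)
print(simple_exp_smoothing(data, 0.5, n_start=2))
```

Because every fitted value depends on the chosen start, so do the one-step errors and any alpha estimated from them.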
The problems come from how the functions are defined.
The implementation of the power functions in statsmodels, including FTestAnovaPower, initially followed the design of the G*Power package and the R package pwr.
The effect size for ANOVA is defined as Cohen's f. Also, nobs is the total number of observations summed over all groups, not the number per group.
The functions for oneway ANOVA generally use squared Cohen's f as the effect size.
The default is based on the unequal variance assumption, as in Welch ANOVA. The alternative variance assumptions are equal variances and an approximation for the Brown-Forsythe (1971) mean ANOVA.
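As I understand the definition used by `effectsize_oneway`, f-squared is a weighted sum of squared deviations of the group means divided by the total number of observations N, with weights n_i / var_i under unequal variances and n_i / pooled variance under equal variances. The plain-Python sketch below writes that out as a cross-check of the formula. It assumes the default `ddof_between=0`, leaves out the Brown-Forsythe option, and is not the statsmodels implementation.

```python
def cohens_f_squared(means, variances, nobs, use_var="unequal"):
    """Squared Cohen's f for one-way ANOVA (sketch, ddof_between=0 assumed)."""
    k = len(means)
    n_total = sum(nobs)
    if use_var == "equal":
        # pooled within-group variance replaces the individual variances
        pooled = sum((n - 1) * v for n, v in zip(nobs, variances)) / (n_total - k)
        variances = [pooled] * k
    elif use_var != "unequal":
        raise ValueError("use_var must be 'equal' or 'unequal'")
    weights = [n / v for n, v in zip(nobs, variances)]
    w_sum = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / w_sum
    between = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means))
    return between / n_total


f2 = cohens_f_squared([0.0, 1.0], [1.0, 1.0], [10, 10], use_var="equal")
print(f2, f2 ** 0.5)  # 0.25 0.5
```

For two equal-size groups with a standardized mean difference of d = 1, this gives f = 0.5 = d / 2, the usual relation between Cohen's d and f.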
Following the examples in https://www.statsmodels.org/dev/generated/statsmodels.stats.oneway.effectsize_oneway.html we can replicate the R results with three changes:
- use use_var="equal" in the effect size computation
- use the square root of the returned effect size, to get f instead of f-squared
- use the total nobs in the power computation
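Then:
```python
import numpy as np
from statsmodels.stats.oneway import effectsize_oneway
from statsmodels.stats.power import FTestAnovaPower

# assumption holds the group means, variances and per-group sample size
# from the question
ese = effectsize_oneway(means=assumption.means,
                        vars_=assumption.variances,
                        nobs=assumption.n, use_var="equal")
ese, np.sqrt(ese)
# (0.04272062956717256, 0.20668969390652395)

FTestAnovaPower().power(effect_size=np.sqrt(ese), nobs=3*26,
                        alpha=0.05, k_groups=3)
# 0.34077463829487
```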
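The power computation itself can be written with scipy's noncentral F distribution. The sketch below assumes that FTestAnovaPower uses noncentrality f**2 * nobs with k_groups - 1 and nobs - k_groups degrees of freedom, where nobs is the total sample size; under that assumption it should reproduce the power printed above. `anova_power` is a hypothetical helper name, not a statsmodels function.

```python
from scipy import stats


def anova_power(f, nobs_total, k_groups, alpha=0.05):
    """Power of the one-way ANOVA F test for Cohen's f (not f-squared)."""
    df_num = k_groups - 1
    df_denom = nobs_total - k_groups
    crit = stats.f.isf(alpha, df_num, df_denom)       # rejection threshold under H0
    noncentrality = f ** 2 * nobs_total               # uses the total nobs
    return stats.ncf.sf(crit, df_num, df_denom, noncentrality)


ese = 0.04272062956717256  # equal-variance f-squared from the output above
print(anova_power(ese ** 0.5, nobs_total=3 * 26, k_groups=3))
```

Passing the per-group size (26) instead of the total (78) here would shrink the noncentrality threefold, which is one way to end up with a power that doesn't match R.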
Welch, unequal variance, ANOVA
To illustrate the power of the Welch ANOVA that BruceET used in his answer, we can compute power under both variance assumptions. BruceET obtained a simulated power of 0.80204.
In the following, es is the effect size under the unequal variance assumption for Welch ANOVA. The resulting power for 70 observations per group is 0.799, very close to the simulated power.
The second computation uses the effect size ese under the equal variance assumption, intended for the standard equal variance ANOVA. The computed power is lower, at 0.763, and underestimates the Welch ANOVA power in this case.
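```python
# es: f-squared under the unequal variance assumption (use_var="unequal")
FTestAnovaPower().power(effect_size=np.sqrt(es), nobs=3*70,
                        alpha=0.05, k_groups=3)
# 0.7994674969056634

# ese: f-squared under the equal variance assumption (use_var="equal")
FTestAnovaPower().power(effect_size=np.sqrt(ese), nobs=3*70,
                        alpha=0.05, k_groups=3)
# 0.7628744855163619
```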
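A simulation like the one BruceET describes can be sketched with numpy and scipy. So that it does not depend on a particular scipy version, the sketch writes out Welch's (1951) F statistic by hand. The means, standard deviations and group size passed to `simulated_power` below are hypothetical placeholders, not the values from the question, and this is not BruceET's code.

```python
import numpy as np
from scipy import stats


def welch_anova_pvalue(groups):
    """p-value of Welch's F test for equal means with unequal variances."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])
    w = n / variances                                   # precision weights
    w_sum = w.sum()
    grand_mean = np.sum(w * means) / w_sum
    numerator = np.sum(w * (means - grand_mean) ** 2) / (k - 1)
    tmp = np.sum((1 - w / w_sum) ** 2 / (n - 1))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * tmp
    df_denom = (k ** 2 - 1) / (3 * tmp)
    return stats.f.sf(numerator / denominator, k - 1, df_denom)


def simulated_power(means, sds, n_per_group, alpha=0.05, n_sim=2000, seed=0):
    """Share of simulated normal samples where Welch's test rejects at alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        groups = [rng.normal(mu, sd, n_per_group) for mu, sd in zip(means, sds)]
        rejections += int(welch_anova_pvalue(groups) < alpha)
    return rejections / n_sim


# Hypothetical group means and standard deviations, 70 observations per group
print(simulated_power(means=[10.0, 11.0, 12.0], sds=[2.0, 3.0, 4.0], n_per_group=70))
```

With two groups the statistic reduces to the square of Welch's t, so `welch_anova_pvalue` should agree with `scipy.stats.ttest_ind(a, b, equal_var=False)`.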
I think what's confusing you here is that the power functions in statsmodels take Cohen's d as the effect size input. Cohen's d scales the difference in means by the pooled standard deviation. For your problem, add the variances of the control and sample groups, divide by two, and take the square root to get the pooled standard deviation; dividing the difference between the group means by it gives Cohen's d.
I've modified the code from here to address your problem. You can change the alternative to 'two-sided' instead of 'larger' to see the required sample size for the two-sided test.
Again, you'll see that the required sample size is 501771.
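The effect size and sample size calculation described above can be sketched in plain Python using the normal-approximation formula n = 2 * ((z_alpha + z_beta) / d)**2 per group for two equal-size groups. The helper names are hypothetical, the inputs are placeholders rather than the question's numbers, and results may differ slightly from statsmodels' `solve_power` (for example, the two-sided case here ignores the negligible opposite tail, and a t-based power class differs a little for small samples).

```python
from math import ceil, sqrt
from statistics import NormalDist


def cohens_d(mean_a, mean_b, var_a, var_b):
    """Difference in means scaled by the pooled SD sqrt((var_a + var_b) / 2)."""
    pooled_sd = sqrt((var_a + var_b) / 2)
    return (mean_a - mean_b) / pooled_sd


def n_per_group(d, alpha=0.05, power=0.8, alternative="larger"):
    """Normal-approximation sample size per group for two equal-size groups."""
    z = NormalDist()
    if alternative == "two-sided":
        z_alpha = z.inv_cdf(1 - alpha / 2)
    elif alternative in ("larger", "smaller"):
        z_alpha = z.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'larger' or 'smaller'")
    z_beta = z.inv_cdf(power)
    # abs(d) so a negative d (the 'smaller' direction) gives the same size
    return ceil(2 * ((z_alpha + z_beta) / abs(d)) ** 2)


d = cohens_d(mean_a=1.0, mean_b=0.0, var_a=4.0, var_b=4.0)  # hypothetical inputs
print(d, n_per_group(d), n_per_group(d, alternative="two-sided"))  # 0.5 50 63
```

Because n grows with 1 / d**2, a very small standardized difference is what drives a required sample size into the hundreds of thousands.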