Logistic – Conducting Power Analysis for Factorial Logistic Regression Without Estimated Proportions

experiment-design, logistic, statistical-power

Suppose we have a balanced factorial designed experiment in which each factor is run at 2 levels (+1, -1), and we don't have estimates of the proportion at each factor level combination, as we did in this question: Simulation of logistic regression power analysis – designed experiments. What is the best approach for determining sample size? Can we take the baseline proportion (say 0.0005) and the smallest deviation we care to detect (say 0.00005), and then simply run a two-sample difference-in-proportions power analysis? That is:

power.prop.test(n=NULL, p1=0.0005, p2=0.00055, power=0.8)

which in this case suggests a total required sample size of 6589596 (2 * 3294798, the per-group n in the output of the R function), and then divide this total equally among the design points? So if there are 4 factors at 2 levels each, each "cell" gets 6589596 / 16 observations?

The thinking: in a factorial design, each effect (main effects as well as interactions) is estimated using half of the total sample at its + level and the other half at its - level. If we say we only care about effects of at least a certain size, this approach should work regardless of the number of 2-level factors, and for both main effects and interactions.

Is this correct?

Best Answer

As I discuss in my answer to your linked question, there are different kinds of power when you want to test multiple hypotheses. For example, you can talk about the all-effects power, the power to detect a specific effect, or the any-effect power (listed here from most to least stringent, so the power itself weakly increases along the list). If you only care about the one effect, and the other effects are nuisances, you can do what you suggest. (Technically, you should add a few extra observations to account for the degrees of freedom lost to the nuisance parameters, but that is inconsequential in your case, with so much data.)
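As a rough sketch of the single-effect calculation described in the question, the following R code reuses the question's numbers and splits the total evenly over the cells. The variable names and the choice of 4 factors are illustrative only, and rounding up per cell is an assumption about how one would allocate in practice.

```r
# Two-sample power calculation using the proportions from the question
res <- power.prop.test(n = NULL, p1 = 0.0005, p2 = 0.00055, power = 0.8)

n_per_level <- ceiling(res$n)   # observations needed at each level (+1 / -1) of one effect
n_total     <- 2 * n_per_level  # both levels together

n_factors  <- 4                 # illustrative: 4 two-level factors, as in the question
n_cells    <- 2^n_factors       # 16 design points in the full factorial
n_per_cell <- ceiling(n_total / n_cells)  # assumption: round up so no cell is short

print(c(n_per_level = n_per_level, n_total = n_total, n_per_cell = n_per_cell))
```

Because the design is balanced, every main effect and interaction contrast compares the same two halves of the data, so the same per-level n applies to each effect considered on its own.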

On the other hand, if you care about all of these effects, and they are orthogonal (as suggested by "a balanced factorial designed experiment"), then you could do what you suggest for each effect. The all-effects power would then be the product of the single-effect powers. For instance, let's say you care about three effects and that, at a given N, their prespecified single-effect powers are .82, .80, and .67. Then the power to detect all three would be .82 * .80 * .67 = 0.44.
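The product rule can be checked in a couple of lines of R. This is only a sketch that treats the three tests as independent, which is the assumption behind multiplying the powers. The any-effect figure is not computed in the answer; it is included for comparison and follows from the same independence assumption.

```r
# Single-effect powers from the example above
powers <- c(0.82, 0.80, 0.67)

# Power to detect all three effects, assuming independent tests
all_effects_power <- prod(powers)

# Power to detect at least one effect, under the same assumption
any_effect_power <- 1 - prod(1 - powers)

round(c(all = all_effects_power, any = any_effect_power), 2)
```

The all-effects power is no larger than the smallest single-effect power, and the any-effect power is no smaller than the largest one.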
