**Update 2014-02-06:** changed text to be more emphatic that the fpc should not be used in a causal analysis.

**Update 2014-02-04:** impact of the randomized experimental design
This question has raised some fundamental issues.
You stated in your update that a researcher can control the make-up of the experimental groups. Not so. Even if one randomized an entire population, there would be imbalance, perhaps trivial, in every variable. Even with some kind of balancing algorithm, which would destroy the randomization, one can never arrange for identity of the means of the outcome variable, yet unmeasured.
You also asked Tom Lumley:
> Are you saying it is legitimate to estimate the confidence interval of, say, the difference between the proportion of men and women answering 'Yes', but not a p-value to determine if it is zero (i.e. to reject the null)?
I think that's what Tom meant, and I agree with its application to descriptive statistics; it does not apply to causal analyses, including those generated by an experiment. Your particular example is a borderline case, as you intend the results to apply to a single population at a particular time. If someone asked you to project your findings to another setting or to another time period, the confidence interval calculation probably should not include the fpc.
Some additional insight can be gained by considering the experimental design as part of the sample design. If the initial random sample is of size $n$, randomization produces two random sub-samples of size $n_1 = n/2$ and $n_2 = n/2$. (For the theory that follows, $n_1$ and $n_2$ need not be equal.) Let $\overline{y}_1$ and $\overline{y}_2$ be the means of the sub-samples; proportions are special cases. In this scenario, which conforms to the absence of a treatment effect, it can be shown (Cochran, 1977, problem 2.16, p. 48) that:
\begin{equation}
Var(\overline{y}_1 -\overline{y}_2) = S^2\left(\frac{1}{n_1} +\frac{1}{n_2}\right)
\end{equation}
where $S^2$ is the population variance and variation is over repetitions of the sampling and randomization. Notice: no fpc.
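Cochran's result is easy to check by simulation. The sketch below uses an illustrative finite population, sample sizes, and replication count (all my own choices, not from the question); it draws an SRS, randomizes it into two arms, and compares the empirical variance of $\overline{y}_1 - \overline{y}_2$ with $S^2(1/n_1 + 1/n_2)$:

```r
set.seed(1)
N   <- 1000; n1 <- 50; n2 <- 50; n <- n1 + n2
pop <- rnorm(N, mean = 50, sd = 10)   # an arbitrary finite population
S2  <- var(pop)                        # Cochran's S^2 (divisor N - 1)

diffs <- replicate(20000, {
  s <- sample(pop, n)                  # SRS without replacement
  g <- sample(rep(1:2, c(n1, n2)))     # randomize the sample into two arms
  mean(s[g == 1]) - mean(s[g == 2])
})

var(diffs)              # empirical variance over repetitions
S2 * (1/n1 + 1/n2)      # theoretical value -- note: no fpc
```

The empirical variance matches the formula without any $(1 - n/N)$ factor, even though the sample is a non-trivial fraction of the population.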
**Update:** one of the few established uses of hypothesis tests + fpcs in finite populations: lot quality assurance sampling (LQAS)
I agree with Tom's answer. Hypothesis testing rarely has a place in finite population questions, but confidence intervals certainly do. One good use of hypothesis tests per se in finite populations is lot quality assurance sampling (LQAS), which tests whether the rate of some event (e.g. vaccination) in a geographic area is too high or too low. Note that, unlike the question at hand, there is no hypothesis of zero difference. The null hypothesis is that the rate is $< K$, and the alternative is that it is $\geq K$. See, at Google Scholar:
Robertson, Susan E, Martha Anker, Alain J Roisin, Nejma Macklai, Kristina Engstrom, and F Marc LaForce. 1997. The lot quality technique: a global review of applications in the assessment of health services and disease surveillance. World Health Statistics Quarterly 50, no. 3/4: 199-209.
Lemeshow, Stanley, and Scott Taber. 1991. Lot quality assurance sampling: single- and double-sampling plans. World Health Statistics Quarterly 44, no. 3: 115-132.
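The flavor of an LQAS decision rule can be sketched in a few lines of R. The thresholds and error targets below are illustrative assumptions, not values from the papers above: we search for the smallest sample size $n$ and decision value $d$ such that counting $\geq d$ vaccinated people rarely passes an area with truly low coverage, and rarely fails an area with adequate coverage.

```r
# Illustrative LQAS plan search (binomial, single-sampling).
# p_low/p_high, alpha, and beta are made-up example values.
lqas_plan <- function(p_low, p_high, alpha, beta, n_max = 200) {
  for (n in 1:n_max) {
    for (d in 0:n) {
      accept_bad  <- 1 - pbinom(d - 1, n, p_low)   # P(X >= d | coverage too low)
      reject_good <- pbinom(d - 1, n, p_high)      # P(X <  d | coverage adequate)
      if (accept_bad <= alpha && reject_good <= beta)
        return(c(n = n, d = d))                    # smallest workable plan
    }
  }
  NULL
}

# e.g. coverage < 50% is unacceptable, >= 80% is adequate, 10% error rates
lqas_plan(0.50, 0.80, 0.10, 0.10)
```

Note the one-sided structure: the test asks whether the rate clears a threshold $K$, not whether a difference is exactly zero.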
Original Answer
Using the fpc to reduce sample size makes no sense unless you intend to use it in the hypothesis-testing statistic. But that would be an error: the fpc should not be used when testing hypotheses about "no difference".
The reasoning is interesting (Cochran, 1977, p. 39): it is seldom of scientific interest to ask whether a null hypothesis (e.g. that two proportions are equal) is exactly true in a finite population. Except by a very rare chance, the null hypothesis will never be true, as one would discover by enumerating the entire population. Therefore hypothesis tests on samples from finite populations are done from a "super-population" viewpoint. See also Deming (1966), pp. 247-261, "Distinction between enumerative and analytic studies"; Korn and Graubard (1999), p. 227.
References
Cochran, W. G. (1977). Sampling techniques (3rd ed.). New York: Wiley.
Deming, W. E. (1966). Some theory of sampling. New York: Dover Publications.
Korn, E. L., & Graubard, B. I. (1999). Analysis of health surveys (Wiley series in probability and statistics). New York: Wiley.
The two tests (logistic regression and chi-square) are equivalent and a power analysis should give the same answer.
You are assuming that a value of 0.15 for $f^2$ and $w$ represents the same effect size; it does not. A small value of $w$ is 0.1, while a small value of $f^2$ is 0.02:
cohen.ES(test=c("chisq"), size=c("small"))
cohen.ES(test=c("f2"), size=c("small"))
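To see concretely that the two conventions are not interchangeable, here is a sketch using only base R (the noncentral chi-square and F distributions) rather than the pwr package; the 80% power and $\alpha = 0.05$ targets are illustrative choices. Cohen's "small" $w = 0.1$ and "small" $f^2 = 0.02$ lead to quite different required sample sizes:

```r
# Power of the chi-square test: noncentrality N * w^2
power_chisq <- function(N, w, df = 1, alpha = 0.05)
  1 - pchisq(qchisq(1 - alpha, df), df, ncp = N * w^2)

# Power of the regression F test: noncentrality f2 * (u + v + 1)
power_f2 <- function(v, f2, u = 1, alpha = 0.05)
  1 - pf(qf(1 - alpha, u, v), u, v, ncp = f2 * (u + v + 1))

# Smallest N (chi-square) and error df v (regression) giving 80% power
min(which(sapply(1:2000, power_chisq, w  = 0.1)  >= 0.8))
min(which(sapply(1:2000, power_f2,    f2 = 0.02) >= 0.8))
```

Plugging 0.15 into one convention and reading it as "medium" in the other would therefore misstate the required sample size.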
Edit: Elaborated on the similarity of the two approaches.
If you give the same data to logistic regression and a chi-squared test (strictly: without Yates' correction), you get the same result. Here's an example:
> set.seed(1234)
> x <- rbinom(100, 1, 0.2)
> y <- rbinom(100, 1, 0.2)
> chisq.test(table(x, y), correct=FALSE)
Pearson's Chi-squared test
data: table(x, y)
X-squared = 0.155, df = 1, p-value = **0.694**
Warning message:
In chisq.test(table(x, y), correct = FALSE) :
Chi-squared approximation may be incorrect
> summary(glm(y ~ x, family="binomial"))
Call:
glm(formula = y ~ x, family = "binomial")
Deviance Residuals:
Min 1Q Median 3Q Max
-0.753 -0.753 -0.753 -0.668 1.794
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.114 0.251 -4.43 9.4e-06 ***
x -0.272 0.693 -0.39 **0.69**
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 110.22 on 99 degrees of freedom
Residual deviance: 110.06 on 98 degrees of freedom
AIC: 114.1
Number of Fisher Scoring iterations: 4
The p-values are the same, so the power should be the same. I can't remember the formulas for the two different versions of the effect size. Effect-size measures are a little weird because in the old days you wanted to minimize the number of tables that you put into books (so we have, for example, $f^2$ instead of $R^2$, even though there's a direct relationship between them, and $R^2$ is what everyone understands).
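For reference, the sample versions of the two effect sizes have simple forms: for a contingency table, Cohen's $w = \sqrt{X^2/N}$, and in regression, $f^2 = R^2/(1 - R^2)$. Applying the first to the simulated data above (same seed and calls as the session):

```r
# Cohen's w from a chi-square statistic: w = sqrt(X^2 / N).
# (In regression, f2 = R^2 / (1 - R^2).)
set.seed(1234)
x  <- rbinom(100, 1, 0.2)
y  <- rbinom(100, 1, 0.2)
X2 <- unname(chisq.test(table(x, y), correct = FALSE)$statistic)
w  <- sqrt(X2 / 100)
w   # tiny, consistent with the non-significant p-value above
```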
Best Answer
If I had to do this, I would use a simulation approach. This would involve making assumptions about the regression coefficients, predictor distributions, correlation between predictors, and error variance (with help from the researcher), generating data sets using the assumed model, and seeing what proportion of these give a significant p-value for the interaction. Then use trial and error to find the minimum sample size giving the required power.
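A minimal sketch of that workflow for a linear model with a two-way interaction; every numeric choice below (coefficients, standard-normal uncorrelated predictors, error SD) is an illustrative assumption to be replaced with values elicited from the researcher:

```r
# Simulation-based power for an interaction term.
sim_power <- function(n, b1 = 0.5, b2 = 0.5, b3 = 0.3,
                      sigma = 1, alpha = 0.05, nsim = 1000) {
  mean(replicate(nsim, {
    x1 <- rnorm(n)
    x2 <- rnorm(n)                       # assumed uncorrelated predictors
    y  <- b1 * x1 + b2 * x2 + b3 * x1 * x2 + rnorm(n, sd = sigma)
    # TRUE if the interaction's p-value is below alpha
    summary(lm(y ~ x1 * x2))$coefficients["x1:x2", "Pr(>|t|)"] < alpha
  }))
}

set.seed(1)
sim_power(50)     # estimated power at n = 50
sim_power(100)    # raise n by trial and error until the target power is reached
```

The same skeleton extends to correlated predictors (e.g. via `MASS::mvrnorm`) or other model families by changing only the data-generating lines.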