Solved – Sample size for multiple linear regression

regression, sampling

I have two questions:

Is it possible to determine the sample size (n) needed for a multiple linear regression so that the tests have adequate power?
In practice this seems very difficult to me, because a power calculation for the multiple linear model requires plausible values for each of the model's parameters.
Can this be done in R? If so, what would the R code look like?

In R, the pwr package has this function for multiple linear regression:

pwr.f2.test(u = NULL, v = NULL, f2 = NULL, sig.level = 0.05, power = NULL)

where:
u: degrees of freedom for the numerator
v: degrees of freedom for the denominator
f2: effect size
sig.level: significance level (Type I error probability)
power: power of the test (1 minus the Type II error probability)

  1. Can this R code be improved so that it takes the sample size (n) and the individual regression parameters into account?
  2. What about the sample size (n) for a logistic (logit) regression? Can the required sample size be determined from criteria such as sig.level = 0.05, power = 0.8, …?

Best Answer

Power analysis for multiple regression is quite complex as there are many moving parts and potentially several different tests of interest. The function pwr.f2.test is based on Cohen's book Statistical Power Analysis for the Behavioral Sciences and you can find detailed explanations and many examples there.

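The most important insight is that the sample size is already captured by the argument v (the degrees of freedom for the denominator). For the overall F test of a model with p predictors and an intercept, u = p and v = n - p - 1, so n = v + p + 1; for other tests, exactly how v relates to n depends on the details of the model. The function therefore already takes the sample size into account: leave v = NULL, supply the other arguments, and it solves for v.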
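To make the link between v and n concrete, here is a minimal sketch in base R. It assumes the test of interest is the overall F test of a model with p predictors plus an intercept (so u = p and v = n - p - 1), and it uses the noncentrality parameter f2 * (u + v + 1) from Cohen's framework. The helper name power_f2 and the values p = 3 and f2 = 0.15 are made up for illustration; they are not part of the question.

```r
# Power of the overall F test in a linear model with p predictors and an
# intercept, for a given sample size n and effect size f2.
# Assumption: u = p and v = n - p - 1 (overall test of all predictors).
power_f2 <- function(n, p, f2, sig.level = 0.05) {
  u <- p
  v <- n - p - 1
  lambda <- f2 * (u + v + 1)                       # noncentrality parameter
  f_crit <- qf(sig.level, u, v, lower.tail = FALSE)
  pf(f_crit, u, v, ncp = lambda, lower.tail = FALSE)
}

p  <- 3      # hypothetical number of predictors
f2 <- 0.15   # hypothetical effect size

# Smallest n whose power reaches 0.80
n <- p + 2   # smallest n that leaves v >= 1
while (power_f2(n, p, f2) < 0.80) n <- n + 1
n

# Optional cross-check with pwr.f2.test, if the pwr package is installed:
# solve for v, then convert back to n = v + p + 1.
if (requireNamespace("pwr", quietly = TRUE)) {
  res <- pwr::pwr.f2.test(u = p, f2 = f2, sig.level = 0.05, power = 0.80)
  print(ceiling(res$v) + p + 1)
}
```

If the test of interest is a subset of the predictors rather than the whole model, u and v change accordingly, so the conversion between v and n above would need to be adapted.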

Another way to conduct a power analysis is to use simulation. It is particularly attractive in this sort of setting because you can vary each individual aspect of the design. For more on this, see Calculating statistical power (see also G. Jay Kern's post).
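As a rough illustration of the simulation approach, the sketch below repeatedly generates data from an assumed linear model, fits it with lm, and records how often the t test on the first slope rejects at the chosen level. Everything specific here is an assumption for the sake of the example: the function name sim_power_lm, the coefficient values, independent standard normal predictors, and normal errors with sigma = 1. In a real study these would be replaced by values that reflect the design.

```r
set.seed(1)

# Estimated power of the t test on the first predictor's slope.
# beta: assumed true slopes (hypothetical values); sigma: error SD.
sim_power_lm <- function(n, beta = c(0.3, 0.2, 0.1), sigma = 1,
                         n_sims = 500, sig.level = 0.05) {
  rejections <- replicate(n_sims, {
    X <- matrix(rnorm(n * length(beta)), nrow = n)   # independent predictors
    y <- drop(X %*% beta) + rnorm(n, sd = sigma)     # true intercept is 0
    fit <- lm(y ~ X)
    # Row 1 is the intercept, row 2 is the first predictor
    p_value <- summary(fit)$coefficients[2, "Pr(>|t|)"]
    p_value < sig.level
  })
  mean(rejections)   # proportion of simulated data sets that reject
}

# Estimated power at a few candidate sample sizes
sapply(c(50, 100, 150), sim_power_lm)
```

Increasing n_sims reduces the Monte Carlo error of each estimate, at the cost of run time. Correlated predictors, non-normal errors, or a different test can be studied by changing the data-generating lines or the extracted p-value.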

Once you get the hang of it, it's also quite easy to extend the simulation approach to numerous other tests. For logistic regression, I am not sure if there is any specific function in the pwr package but there is one in G*Power. I don't remember ever using it so I can't comment further on this part of the question.
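For logistic regression, the same simulation idea can be sketched as follows. This is only an illustration under stated assumptions: a single standard normal predictor, hypothetical coefficients beta0 = -0.5 and beta1 = 0.5 on the log-odds scale, and the Wald test reported by summary() for a glm fit. The function name sim_power_logit is made up for this example.

```r
set.seed(1)

# Estimated power of the Wald test on the slope of a simple logistic model.
# beta0, beta1: assumed true coefficients on the log-odds scale (hypothetical).
sim_power_logit <- function(n, beta0 = -0.5, beta1 = 0.5,
                            n_sims = 500, sig.level = 0.05) {
  rejections <- replicate(n_sims, {
    x <- rnorm(n)
    prob <- plogis(beta0 + beta1 * x)        # inverse logit
    y <- rbinom(n, size = 1, prob = prob)
    fit <- glm(y ~ x, family = binomial)
    summary(fit)$coefficients["x", "Pr(>|z|)"] < sig.level
  })
  mean(rejections)
}

# Estimated power at a few candidate sample sizes
sapply(c(50, 100, 200), sim_power_logit)
```

The smallest candidate n whose estimated power reaches the target (for example 0.8) is then a reasonable sample size under these assumptions.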
