Within the context of a research proposal in the social sciences, I was asked the following question:
"I have always gone by 100 + m (where m is the number of predictors) when determining minimum sample size for multiple regression. Is this appropriate?"
I get similar questions a lot, often with different rules of thumb.
I've also read such rules of thumb quite a lot in various textbooks.
I sometimes wonder whether popularity of a rule in terms of citations is based on how low the standard is set.
However, I'm also aware of the value of good heuristics in simplifying decision making.
Questions:
- What is the utility of simple rules of thumb for minimum sample sizes within the context of applied researchers designing research studies?
- Would you suggest an alternative rule of thumb for minimum sample size for multiple regression?
- Alternatively, what other strategies would you suggest for determining minimum sample size for multiple regression? In particular, answers should give weight to how readily a strategy can be applied by a non-statistician.
Best Answer
I'm not a fan of simple formulas for generating minimum sample sizes. At the very least, any formula should consider effect size and the questions of interest. Moreover, the practical difference between a sample size just below a cut-off and one just above it is minimal.
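To illustrate why effect size matters, here is a minimal sketch of the sample size needed to detect a single correlation with a two-sided test. It assumes the standard Fisher z approximation; the function name `n_to_detect_correlation` and the defaults of alpha = 0.05 and power = 0.80 are my own choices for illustration, not part of any rule quoted above.

```python
import math
from statistics import NormalDist


def n_to_detect_correlation(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect a population correlation r.

    Uses the Fisher z approximation: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3). Two-sided test at level alpha.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


# Illustrative effect sizes (assumed values, roughly small / medium / large)
for r in (0.1, 0.3, 0.5):
    print(f"r = {r}: n is about {n_to_detect_correlation(r)}")
```

The required n changes by more than an order of magnitude across these effect sizes, which a fixed formula such as 100 + m cannot reflect.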
Sample size as optimisation problem
A Rough Rule of Thumb
In terms of very rough rules of thumb within the typical context of observational psychological studies involving things like ability tests, attitude scales, personality measures, and so forth, I sometimes think of:
These rules of thumb are grounded in the width of the 95% confidence intervals associated with correlations at these respective levels, and in the degree of precision with which I'd like to understand the relations of interest. However, they are only heuristics.
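As a rough sketch of the reasoning about confidence intervals for correlations, the following computes an approximate 95% interval using the Fisher z transformation. The observed correlation of 0.3 and the sample sizes in the loop are assumed values chosen only for illustration, and `correlation_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher z."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    z = math.atanh(r)                # Fisher z transform of r
    se = 1 / math.sqrt(n - 3)        # approximate standard error of z
    # Back-transform the interval limits to the correlation scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Illustrative sample sizes (assumed, not taken from the answer)
for n in (50, 100, 200, 400):
    lo, hi = correlation_ci(0.3, n)
    print(f"n = {n}: 95% CI [{lo:.2f}, {hi:.2f}], width {hi - lo:.2f}")
```

Because the standard error shrinks with the square root of n, the interval narrows only slowly: roughly four times the sample is needed to halve its width.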
G Power 3
Multiple Regression tests multiple hypotheses
Power analysis for multiple regression is made more complicated by the fact that there are multiple effects to test, including the overall R-squared and one for each individual coefficient. Furthermore, most studies include more than one multiple regression. For me, this is further reason to rely more on general heuristics and on thinking about the minimal effect size that you want to detect.
In relation to multiple regression, I'll often think more in terms of the degree of precision in estimating the underlying correlation matrix.
Accuracy in Parameter Estimation
I also like Ken Kelley and colleagues' discussion of Accuracy in Parameter Estimation.
The MBESS package in R can be used to perform analyses relating sample size to precision in parameter estimation.
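The general idea of planning a sample size for precision rather than power can be sketched as follows. This is not the MBESS implementation and it does not handle regression coefficients; it is a simplified illustration for a single correlation using the Fisher z approximation, with hypothetical function names and an assumed target interval width.

```python
import math
from statistics import NormalDist


def ci_width(r, n, level=0.95):
    """Width of the approximate Fisher z confidence interval for r."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z + z_crit * se) - math.tanh(z - z_crit * se)


def n_for_ci_width(r, target_width, level=0.95, n_max=1_000_000):
    """Smallest n whose expected interval width is at most target_width."""
    n = 4  # the Fisher z standard error requires n > 3
    while ci_width(r, n, level) > target_width:
        n += 1
        if n > n_max:
            raise ValueError("target width not reachable within n_max")
    return n


# Assumed planning values: expected r of 0.3, desired total width of 0.2
print(n_for_ci_width(0.3, 0.2))
```

The planning question here is "how wide an interval can I tolerate?" rather than "what effect do I want to detect?", which is often easier for applied researchers to answer.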