I have a set of models that are the result of a multiple linear regression, and I would like to calculate the power for each of them. I found this tutorial on calculating power using G*Power. Since the data have already been collected, I am using the Post Hoc type of power analysis, and I was hoping to use the "Determine" button to automatically calculate the effect size. However, I see that one needs to provide a partial R², but I would think that for a multiple linear regression (which I believe has multiple partial R² values), it would make more sense to provide the R² of the resulting model. Am I misunderstanding the difference between R² and partial R²? If so, how do I obtain the partial R² to use in G*Power?
Solved – Power of a Multiple Linear Regression
Tags: multiple regression, regression, statistical power
Related Solutions
The problem is that there isn't really an $R^2$ for logistic regression. Instead there are many different "pseudo-$R^2$s" that resemble the $R^2$ from a linear model in different ways. UCLA's statistics help website lists some of them.
In addition, the effect (e.g., odds ratio) of the added variable, $x_2$, isn't sufficient to determine your power to detect that effect. It matters how $x_2$ is distributed: The more widely spread the values are, the more powerful your test, even if the odds ratio is held constant. It further matters what the correlation between $x_2$ and $x_1$ is: The more correlated they are, the more data would be required to achieve the same power.
As a result of these facts, the way I try to calculate the power in these more complicated situations is to simulate. In that vein, it may help you to read my answer here: Simulation of logistic regression power analysis - designed experiments.
Looking at G*Power's documentation, they use a method based on Hsieh, Bloch, & Larsen (1998). The idea is that you first regress $x_2$ on $x_1$ (or whatever predictor variables went into the first model) using a linear regression, and take the regular $R^2$ from that fit. (That value should lie in the interval $[0,\ 1]$.) It goes in the "R² other X" field you are referring to. Then you specify the distribution of $x_2$ in the next few fields ("X distribution", "X parm μ", and "X parm σ").
- Hsieh, F.Y., Bloch, D.A., & Larsen, M.D. (1998). A simple method of sample size calculation for linear and logistic regression. Statistics in Medicine, 17, 1623-1634.
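For reference, my reading of the simple Hsieh et al. (1998) formula for a standard-normal covariate can be sketched as follows; the function name `hsieh_n` is mine, and the variance-inflation adjustment `1 / (1 - r2_other_x)` is how I understand the "R² other X" field to enter the calculation.

```python
from math import ceil, log
from scipy.stats import norm

def hsieh_n(p1, odds_ratio, r2_other_x, alpha=0.05, power=0.80):
    """Approximate sample size for testing one standard-normal covariate
    in a logistic model (after Hsieh, Bloch, & Larsen, 1998).

    p1          event probability at the mean of the covariate
    odds_ratio  odds ratio for a one-SD increase in the covariate
    r2_other_x  R^2 from regressing the covariate on the other predictors
                (the "R² other X" field in G*Power)
    """
    beta_star = log(odds_ratio)                  # log odds ratio per SD
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n = z**2 / (p1 * (1 - p1) * beta_star**2)    # single-covariate formula
    return ceil(n / (1 - r2_other_x))            # variance inflation factor
```

Note how a larger `r2_other_x` (stronger correlation with the other predictors) inflates the required sample size, matching the point about correlated predictors above.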
Seeing a statistically significant increase in R2 between two models is equivalent to seeing a statistically significant set of predictors in a model that cause that increase. Yet, at its core, the R2 is about prediction. It doesn't sound like you're doing prediction. It's important to be consistent about your goals in an analysis: if you were after prediction, I might criticize your model for lacking certain important predictors, or I might question whether power is something you should be calculating since power concerns inference.
It sounds like your model is about inference: you have some key variables, and a design to assess their effect on outcome(s). When we fit models for inference, we look at the effects (the coefficient terms) to see whether the $100(1-\alpha)\%$ CIs include 0, which is equivalent to checking whether the p-value is less than $\alpha$.
This type of test is essentially a one-sample t-test: does the 95% CI for $\beta$, the regression coefficient, exclude 0? For grants, I have used G*Power and other software to calculate power for multiple linear regression using the t-test power calculator. You need only specify the expected mean and standard deviation to calculate that power. Those values come from previous literature: for a symmetric CI, calculate the half-width of the CI and divide it by the critical value to recover the standard error. If no literature exists, you must make a guess and justify it; again, drawing on the literature strengthens your guess.
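The two calculations just described, recovering a standard error from a published CI and then computing power for the t-test of $\beta = 0$, can be sketched like this; the function names are hypothetical, and the power calculation assumes the usual noncentral-t formulation of a two-sided test.

```python
from scipy import stats

def se_from_ci(lower, upper, df, conf=0.95):
    # half-width of a symmetric CI divided by the critical value gives the SE
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df)
    return (upper - lower) / 2 / t_crit

def coef_power(beta, se, df, alpha=0.05):
    """Power of the two-sided t-test of H0: beta = 0, given the expected
    coefficient, its standard error, and df = n - predictors - 1."""
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    nc = beta / se                                # noncentrality parameter
    return (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)
```

With `beta = 0` the function returns exactly `alpha`, as it should: under the null, the rejection probability is the type I error rate.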
Lastly, you mention 1 binary regressor (which you call an independent variable) and 3 outcome variables. I'm assuming this is a multiple linear regression because you are also adjusting for, say, age, sex, household structure (parental income, education, household size), and other important traits identified in the literature that may confound a possible association. If that's the case, further discussion is needed about multiple testing: you are testing 3 possible hypotheses. Justify why you should or should not adjust the significance threshold to account for an inflated number of false positive findings. This is the last piece needed to complete a comprehensive power analysis.
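To see what such an adjustment costs, here is a small sketch of the power hit from a Bonferroni correction across the three tests, using a two-sided t-test and a purely hypothetical noncentrality and residual df.

```python
from scipy import stats

def t_power(nc, df, alpha):
    # two-sided t-test power via the noncentral t distribution
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)

m = 3                        # three outcomes, hence three tests
alpha = 0.05
bonf_alpha = alpha / m       # Bonferroni-adjusted per-test threshold
nc, df = 2.9, 80             # hypothetical noncentrality and residual df

unadjusted = t_power(nc, df, alpha)
adjusted = t_power(nc, df, bonf_alpha)   # stricter threshold costs power
```

Whatever values you plug in, `adjusted` will be below `unadjusted`, which is why the adjustment decision belongs inside the power analysis rather than after it.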
Best Answer
In G*Power, you compute power for the overall R² of a multiple regression by treating it as a partial R² with no predictors in the baseline model.
To do this, set the number of tested predictors equal to the total number of predictors in your model. You're then testing the full model against an intercept-only model, which has an R² of zero.
(You can always think of regression models in this way - you're testing against no predictors, and looking at the change in R2.)
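The same test can be computed directly from the noncentral F distribution; here is a minimal sketch assuming the $\lambda = f^2 N$ convention that, to my understanding, G*Power uses for its fixed-model "R² deviation from zero" procedure. The function name `r2_power` is mine.

```python
from scipy import stats

def r2_power(r2, n, p, alpha=0.05):
    """Power of the overall F-test that R^2 = 0 in a multiple regression
    with p predictors and n observations (lambda = f^2 * N convention)."""
    f2 = r2 / (1 - r2)                  # Cohen's effect size f^2
    df1, df2 = p, n - p - 1
    lam = f2 * n                        # noncentrality parameter
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return 1 - stats.ncf.cdf(f_crit, df1, df2, lam)
```

For example, a model $R^2$ of 0.13 with 3 predictors and $n = 77$ lands near Cohen's classic 0.80 power benchmark for a medium effect, which is a useful sanity check against G*Power's output.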