Logistic Regression Power Analysis – Performing Multiple Logistic Regression Power Analysis

logistic, multiple regression, regression, statistical-power

I have a logistic regression model and output an $R^2$ value. I then add another predictor variable and fit a second model, which gives me a new $R^2$ value. When I run an ANOVA test comparing the two models, I see no significant improvement in the second model, but I want to assess the power associated with including the additional variable in model 2.

I have found an example for linear regression that uses an $F$-Test. I want to do something similar for a logistic regression using G*Power.

But there appears to be very little documentation on multiple logistic regression for situations like mine. I don't know how to do a more detailed power analysis for multiple logistic regression.

From what I understand, in G*Power I set Test family = z tests and Statistical test = Logistic regression. But I am not sure what to set R² other X equal to. Is that the improvement in $R^2$?

The tutorial in section 27.4 of the software manual does not vary $R^2$, whereas this example does not discuss the improvement in $R^2$.

Best Answer

The problem is that there isn't really an $R^2$ for logistic regression. Instead there are many different "pseudo-$R^2$s", each of which resembles the $R^2$ from a linear model in a different way. You can get a list of some at UCLA's statistics help website here.

In addition, the effect (e.g., odds ratio) of the added variable, $x_2$, isn't sufficient to determine your power to detect that effect. It matters how $x_2$ is distributed: The more widely spread the values are, the more powerful your test, even if the odds ratio is held constant. It further matters what the correlation between $x_2$ and $x_1$ is: The more correlated they are, the more data would be required to achieve the same power.

As a result of these facts, the way I try to calculate the power in these more complicated situations is to simulate. In that vein, it may help you to read my answer here: Simulation of logistic regression power analysis - designed experiments.
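To make the simulation idea concrete, here is a minimal sketch in Python, not the code from the linked answer. It assumes both predictors are normal, that the added variable is tested with a likelihood ratio test (1 df) between the two nested models, and that you are willing to guess the true coefficients. The names `simulate_power`, `fit_loglik`, and `solve`, and all parameter values, are hypothetical. The small Newton-Raphson fitter only keeps the example free of dependencies; in practice you would use your statistics package's logistic regression.

```python
import math
import random


def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def linear_predictor(beta, row):
    eta = sum(b * v for b, v in zip(beta, row))
    return max(-30.0, min(30.0, eta))  # clip to avoid overflow in exp()


def fit_loglik(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-linear_predictor(beta, row)))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * row[a]
                for c in range(k):
                    hess[a][c] += w * row[a] * row[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    ll = 0.0
    for row, yi in zip(X, y):
        eta = linear_predictor(beta, row)
        ll += yi * eta - math.log1p(math.exp(eta))
    return ll


def simulate_power(n, b0, b1, b2, rho, sd2=1.0, alpha=0.05, n_sims=200, seed=1):
    """Estimate the power to detect x2 when it is added to a model containing x1.

    Assumed data-generating model (hypothetical): x1 ~ N(0, 1); x2 has standard
    deviation sd2 and correlation rho with x1; logit(p) = b0 + b1*x1 + b2*x2.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        X, y = [], []
        for _ in range(n):
            x1 = rng.gauss(0.0, 1.0)
            x2 = sd2 * (rho * x1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))
            X.append([1.0, x1, x2])
            y.append(1 if rng.random() < p else 0)
        ll_full = fit_loglik(X, y)                          # model 2: x1 and x2
        ll_reduced = fit_loglik([row[:2] for row in X], y)  # model 1: x1 only
        lr = max(2.0 * (ll_full - ll_reduced), 0.0)
        p_value = math.erfc(math.sqrt(lr / 2.0))  # upper tail of chi-square, 1 df
        rejections += p_value < alpha
    return rejections / n_sims
```

Calling `simulate_power(n=150, b0=0.0, b1=0.5, b2=0.5, rho=0.0)` and then repeating the call with `rho=0.9` or a different `sd2` lets you see how the correlation with $x_1$ and the spread of $x_2$ change the estimated power while the odds ratio stays fixed.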

Looking at G*Power's documentation, they use a method based on Hsieh, Bloch, & Larsen (1998). The idea is that you first regress $x_2$ on $x_1$ (or whatever predictor variables went into the first model) using a linear regression. You use the regular $R^2$ for that. (That value should lie in the interval $[0,\ 1]$.) It goes in the R² other X field you are referring to. Then you specify the distribution of $x_2$ in the next couple of fields (X distribution, X parm μ, and X parm σ).
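As a rough illustration of what that $R^2$ does, the sketch below computes it for the single-covariate case and applies the variance-inflation adjustment from Hsieh et al. (1998), in which the sample size needed for a single-predictor model is divided by $1 - R^2$. The function names and the example numbers are hypothetical, and I am not claiming this reproduces G*Power's internal calculation exactly.

```python
import math


def r_squared_simple(x1, x2):
    """R^2 from the least-squares regression of x2 on a single predictor x1.

    With one predictor this equals the squared Pearson correlation.
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    return s12 ** 2 / (s11 * s22)


def inflate_sample_size(n_single, r2_other):
    """Divide the single-predictor sample size by (1 - R^2), rounding up."""
    if not 0.0 <= r2_other < 1.0:
        raise ValueError("r2_other must lie in [0, 1)")
    return math.ceil(n_single / (1.0 - r2_other))


# Hypothetical example: if 100 observations would suffice with x2 alone and
# regressing x2 on x1 gives R^2 = 0.5, the adjusted requirement doubles.
required_n = inflate_sample_size(100, 0.5)
```

This shows why the field asks for the $R^2$ of $x_2$ on the other predictors rather than for any improvement in a pseudo-$R^2$: the more of $x_2$ that is explained by $x_1$, the less independent information remains for estimating its effect.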