Logistic Regression Power Analysis – Performing Multiple Logistic Regression Power Analysis

Tags: logistic, multiple regression, regression, statistical-power

I have a logistic regression model and output an $R^2$ value. I then go and add another predictor variable to fit a second model. I can output a new $R^2$ value associated with the second model. When I run an ANOVA test, I see no significant improvement in the second model, but I want to assess the power associated with including the additional variable in model 2.

I have found an example for linear regression that uses an $F$-Test. I want to do something similar for a logistic regression using G*Power.

But there appears to be very little documentation on multiple logistic regression models like my situation. I don't know how to do a more detailed power analysis for multiple logistic regression.

From what I understand, in G*Power I set Test family = z tests and Statistical test = Logistic regression. But I am not sure what to set R² other X equal to. Is that the improvement in $R^2$?

The tutorial in section 27.4 of the software manual makes no mention of $R^2$, whereas this example does not discuss the improvement in $R^2$.

Best Answer

The problem is that there isn't really an $R^2$ for logistic regression. Instead there are many different "pseudo-$R^2$s" that may be similar to the $R^2$ from a linear model in different ways. You can get a list of some at UCLA's statistics help website.
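To make the point concrete, here is a minimal sketch that computes two common pseudo-$R^2$s (McFadden's and Cox & Snell's) for the same fitted model. The data are simulated, and the sample size and coefficients are made-up assumptions, not values from the question.

```r
# Simulated data; n and the coefficients are hypothetical choices for illustration
set.seed(42)
n  <- 500
x1 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x1))

fit  <- glm(y ~ x1, family = binomial)   # model of interest
null <- glm(y ~ 1,  family = binomial)   # intercept-only model

ll_fit  <- as.numeric(logLik(fit))
ll_null <- as.numeric(logLik(null))

# McFadden: 1 - logLik(model) / logLik(null)
mcfadden <- 1 - ll_fit / ll_null

# Cox & Snell: 1 - (L_null / L_model)^(2/n), written on the log scale
cox_snell <- 1 - exp((2 / n) * (ll_null - ll_fit))

c(McFadden = mcfadden, CoxSnell = cox_snell)
```

The two numbers generally differ for the same model, which is why "the improvement in $R^2$" is not a well-defined input for a logistic regression power analysis.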

In addition, the effect (e.g., odds ratio) of the added variable, $x_2$, isn't sufficient to determine your power to detect that effect. It matters how $x_2$ is distributed: The more widely spread the values are, the more powerful your test, even if the odds ratio is held constant. It further matters what the correlation between $x_2$ and $x_1$ is: The more correlated they are, the more data would be required to achieve the same power.

As a result of these facts, the way I try to calculate the power in these more complicated situations is to simulate. In that vein, it may help you to read my answer here: Simulation of logistic regression power analysis - designed experiments.
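A minimal sketch of such a simulation for your situation (model 1 has $x_1$, model 2 adds $x_2$) might look like the following. Everything specific here is an assumption for illustration: the helper name sim_power, the standard-normal predictors, the coefficients, and the sample size. The test of the added variable is a likelihood ratio test comparing the two nested models.

```r
# Hypothetical design: x1 and x2 are standard normal with correlation rho.
# b0, b1, b2 are assumed log-odds coefficients; none come from the question.
sim_power <- function(n, b2, rho, b0 = 0, b1 = 0.5, reps = 200, alpha = 0.05) {
  hits <- logical(reps)
  for (i in seq_len(reps)) {
    x1 <- rnorm(n)
    x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)   # unit variance, cor(x1, x2) = rho
    y  <- rbinom(n, size = 1, prob = plogis(b0 + b1 * x1 + b2 * x2))
    m1 <- glm(y ~ x1,      family = binomial)     # model 1
    m2 <- glm(y ~ x1 + x2, family = binomial)     # model 2 adds x2
    lr <- m1$deviance - m2$deviance               # likelihood ratio statistic, 1 df
    hits[i] <- pchisq(lr, df = 1, lower.tail = FALSE) < alpha
  }
  mean(hits)                                      # proportion significant = estimated power
}

set.seed(1)
power_uncorrelated <- sim_power(n = 200, b2 = 0.4, rho = 0)
power_correlated   <- sim_power(n = 200, b2 = 0.4, rho = 0.8)
c(uncorrelated = power_uncorrelated, correlated = power_correlated)
```

Holding the odds ratio for $x_2$ fixed, the estimated power should come out lower when $x_2$ is strongly correlated with $x_1$, in line with the point made above.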

Looking at G*Power's documentation, they use a method based on Hsieh, Bloch, & Larsen (1998). The idea is that you first regress $x_2$ on $x_1$ (or whatever predictor variables went into the first model) using a linear regression. You use the regular $R^2$ for that. (That value should lie in the interval $[0,\ 1]$.) It goes in the R² other X field you are referring to. Then you specify the distribution of $x_2$ in the next couple of fields (X distribution, X parm μ, and X parm σ).
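As a rough sketch of how you might obtain those inputs in R, assuming you have pilot (or simulated) values of the predictors: the data below are simulated with a made-up relationship between $x_1$ and $x_2$, and n_single is a hypothetical sample size for the one-predictor case. The last step uses the variance-inflation adjustment from Hsieh et al., which divides the single-predictor sample size by $1 - R^2$.

```r
# Hypothetical pilot data; the relationship between x1 and x2 is made up
set.seed(7)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n, sd = 0.8)

# Ordinary R^2 from the linear regression of x2 on the other predictor(s);
# this is the value for the "R² other X" field
r2_other_x <- summary(lm(x2 ~ x1))$r.squared

# Mean and SD of x2, for the distribution fields if x2 is treated as normal
x2_mean <- mean(x2)
x2_sd   <- sd(x2)

# Hsieh et al.'s adjustment: inflate the sample size required with x2 alone
n_single   <- 200                                  # hypothetical value
n_adjusted <- ceiling(n_single / (1 - r2_other_x))

c(R2 = r2_other_x, mean = x2_mean, sd = x2_sd, n_adjusted = n_adjusted)
```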

Hsieh, F.Y., Bloch, D.A., & Larsen, M.D. (1998). A simple method of sample size calculation for linear and logistic regression. Statistics in Medicine, 17, 1623-1634.

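Related Solutions

Statistical Power – Simulation of Logistic Regression Power Analysis in Designed Experiments

Preliminaries: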

As discussed in the G*Power manual, there are several different types of power analyses, depending on what you want to solve for. (That is, $N$, the effect size $ES$, $\alpha$, and power exist in relation to each other; specifying any three of them will let you solve for the fourth.)
- in your description, you want to know the appropriate $N$ to capture the response rates you specified with $\alpha=.05$, and power = 80%. This is a-priori power.
- we can start with post-hoc power (determine power given $N$, response rates, & alpha) as this is conceptually simpler, and then move up
In addition to @GregSnow's excellent post, another really great guide to simulation-based power analyses on CV can be found here: Calculating statistical power. To summarize the basic ideas:
1. figure out the effect you want to be able to detect
2. generate N data from that possible world
3. run the analysis you intend to conduct over those faux data
4. store whether the results are 'significant' according to your chosen alpha
5. repeat many ($B$) times & use the % 'significant' as an estimate of (post-hoc) power at that $N$
6. to determine a-priori power, search over possible $N$'s to find the value that yields your desired power
Whether you will find significance on a particular iteration can be understood as the outcome of a Bernoulli trial with probability $p$ (where $p$ is the power). The proportion found over $B$ iterations allows us to approximate the true $p$. To get a better approximation, we can increase $B$, although this will also make the simulation take longer.
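Because each iteration is a Bernoulli trial, the Monte Carlo standard error of the estimated power is $\sqrt{p(1-p)/B}$. A small sketch of the trade-off (the helper name mc_se and the example values of $p$ and $B$ are just illustrative assumptions):

```r
# Standard error of a simulated proportion: sqrt(p * (1 - p) / B)
mc_se <- function(p, B) sqrt(p * (1 - p) / B)

se_1000  <- mc_se(p = 0.8, B = 1000)    # roughly 0.013
se_10000 <- mc_se(p = 0.8, B = 10000)   # ten times the work, about 3.2 times more precise

c(B_1000 = se_1000, B_10000 = se_10000)
```

So with $B = 1000$ and true power near 80%, the estimate is good to within a couple of percentage points, which is usually enough for planning.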
In R, the primary way to generate binary data with a given probability of 'success' is ?rbinom
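- E.g., to get the outcomes of 10 Bernoulli trials with probability p, the code would be rbinom(n=10, size=1, prob=p), which returns a vector of ten 0s and 1s; to get the number of successes instead, sum that vector or use rbinom(n=1, size=10, prob=p) (you will probably want to assign the result to a variable for storage)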
- you can also generate such data less elegantly by using ?runif, e.g., ifelse(runif(1)<=p, 1, 0)
- if you believe the results are mediated by a latent Gaussian variable, you could generate the latent variable as a function of your covariates with ?rnorm, and then convert them into probabilities with pnorm() and use those in your rbinom() code.
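The three approaches just listed can be sketched side by side. The sample size, the probability p, and the latent-variable coefficients below are arbitrary values chosen for illustration.

```r
set.seed(123)
n <- 10000
p <- 0.3

# 1. rbinom(): n independent Bernoulli outcomes (a vector of 0s and 1s)
y_rbinom <- rbinom(n = n, size = 1, prob = p)

# 2. runif(): compare uniform draws to p
y_runif <- ifelse(runif(n) <= p, 1, 0)

# 3. latent Gaussian (probit): build a latent mean from a covariate,
#    convert it to probabilities with pnorm(), then draw with rbinom()
x           <- rnorm(n)
latent_mean <- -0.5 + 0.7 * x            # hypothetical intercept and slope
p_probit    <- pnorm(latent_mean)
y_probit    <- rbinom(n = n, size = 1, prob = p_probit)

c(rbinom = mean(y_rbinom), runif = mean(y_runif), probit = mean(y_probit))
```

The first two columns should both be close to p; the third varies with the covariate because each observation has its own probability.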
You state that you will "include a polynomial term Var1*Var1) to account for any curvature". There is a confusion here; polynomial terms can help us account for curvature, but this is an interaction term--it will not help us in this way. Nonetheless, your response rates require us to include both squared terms and interaction terms in our model. Specifically, your model will need to include: $var1^2$, $var1*var2$, and $var1^2*var2$, beyond the basic terms.
Although written in the context of a different question, my answer here: Difference between logit and probit models has a lot of basic information about these types of models.
Just as there are different kinds of Type I error rates when there are multiple hypotheses (e.g., per-contrast error rate, familywise error rate, & per-family error rate), so are there different kinds of power* (e.g., for a single pre-specified effect, for any effect, & for all effects). You could also seek the power to detect a specific combination of effects, or the power of a simultaneous test of the model as a whole. My guess from your description of your SAS code is that it is looking for the latter. However, from your description of your situation, I am assuming you want to detect the interaction effects at a minimum.
- *reference: Maxwell, S.E. (2004). The persistence of underpowered studies in psychological research: causes, consequences, and remedies. Psychological Methods, 9, 2, pp. 147-163.
- your effects are quite small (not to be confused with the low response rates), so we will find it difficult to achieve good power.
- Note that, although these all sound fairly similar, they are very much not the same (e.g., it is very possible to get a significant model with no significant effects--discussed here: How can a regression be significant yet all predictors be non-significant?, or significant effects but where the model is not significant--discussed here: Significance of coefficients in linear regression: significant t-test vs non-significant F-statistic), which will be illustrated below.
For a different way to think about issues related to power, see my answer here: How to report general precision in estimating correlations within a context of justifying sample size.

Simple post-hoc power for logistic regression in R:

Let's say your posited response rates represent the true situation in the world, and that you had sent out 10,000 letters. What is the power to detect those effects? (Note that I am famous for writing "comically inefficient" code; the following is intended to be easy to follow rather than optimized for efficiency. In fact, it's quite slow.)

set.seed(1)

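repetitions = 1000   # number of simulated data sets (B)
N = 10000            # total number of letters
n = N/8              # letters per design cell (there are 8 cells)

# the 8 design cells: level of var1, level of var2, and the posited true response rate
var1  = c(   .03,    .03,    .03,    .03,    .06,    .06,    .09,   .09)
var2  = c(     0,      0,      0,      1,      0,      1,      0,     1)
rates = c(0.0025, 0.0025, 0.0025, 0.00395, 0.003, 0.0042, 0.0035, 0.002)

# expand the design to N rows (cells cycle 1..8, matching how rbinom() recycles 'rates')
var1    = rep(var1, times=n)
var2    = rep(var2, times=n)
var12   = var1^2        # squared term
var1x2  = var1 *var2    # interaction
var12x2 = var12*var2    # squared term by var2 interaction

# columns 1-5: is each coefficient significant?  6: how many are  7: model LR test
significant = matrix(nrow=repetitions, ncol=7)

startT = proc.time()[3]
for(i in 1:repetitions){
  # simulate responses from the posited rates, then fit the intended model
  responses          = rbinom(n=N, size=1, prob=rates)
  model              = glm(responses~var1+var2+var12+var1x2+var12x2,
                           family=binomial(link="logit"))
  # Wald p-values for the 5 slopes (row 1 is the intercept)
  significant[i,1:5] = (summary(model)$coefficients[2:6,4]<.05)
  significant[i,6]   = sum(significant[i,1:5])
  # likelihood ratio test of the whole model against the null model (5 df)
  modelDev           = model$null.deviance-model$deviance
  significant[i,7]   = pchisq(modelDev, df=5, lower.tail=FALSE)<.05
}
endT = proc.time()[3]
endT-startT   # elapsed time in seconds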

sum(significant[,1])/repetitions      # pre-specified effect power for var1
[1] 0.042
sum(significant[,2])/repetitions      # pre-specified effect power for var2
[1] 0.017
sum(significant[,3])/repetitions      # pre-specified effect power for var12
[1] 0.035
sum(significant[,4])/repetitions      # pre-specified effect power for var1X2
[1] 0.019
sum(significant[,5])/repetitions      # pre-specified effect power for var12X2
[1] 0.022
sum(significant[,7])/repetitions      # power for likelihood ratio test of model
[1] 0.168
sum(significant[,6]==5)/repetitions   # all effects power
[1] 0.001
sum(significant[,6]>0)/repetitions    # any effect power
[1] 0.065
sum(significant[,4]&significant[,5])/repetitions   # power for interaction terms
[1] 0.017

So we see that 10,000 letters doesn't really achieve 80% power (of any sort) to detect these response rates. (I am not sufficiently sure about what the SAS code is doing to be able to explain the stark discrepancy between these approaches, but this code is conceptually straightforward--if slow--and I have spent some time checking it, and I think these results are reasonable.)

Simulation-based a-priori power for logistic regression:

From here the idea is simply to search over possible $N$'s until we find a value that yields the desired level of the type of power you are interested in. Any search strategy that you can code up to work with this would be fine (in theory). Given the $N$'s that are going to be required to capture such small effects, it is worth thinking about how to do this more efficiently. My typical approach is simply brute force, i.e. to assess each $N$ that I might reasonably consider. (Note however, that I would typically only consider a small range, and I'm typically working with very small $N$'s--at least compared to this.)
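The brute-force search can be sketched as follows. To keep it runnable in seconds, this uses a much simpler, hypothetical design than yours (a single standard-normal predictor with an assumed log-odds slope of 0.5); the helper name power_at_n and the grid of candidate $N$'s are illustrative choices, not part of the analysis above.

```r
# Estimated power of the Wald test for the slope at a given n.
# Hypothetical model: logit(P(y = 1)) = beta * x, with x ~ N(0, 1)
power_at_n <- function(n, beta = 0.5, reps = 200, alpha = 0.05) {
  sig <- replicate(reps, {
    x   <- rnorm(n)
    y   <- rbinom(n, size = 1, prob = plogis(beta * x))
    fit <- glm(y ~ x, family = binomial)
    summary(fit)$coefficients["x", 4] < alpha   # column 4 holds the p-value
  })
  mean(sig)
}

set.seed(2)
candidates <- c(50, 100, 200, 400)              # each N we might reasonably consider
powers     <- sapply(candidates, power_at_n)

# smallest candidate whose estimated power reaches the 80% target
n_required <- candidates[which(powers >= 0.80)[1]]

data.frame(N = candidates, power = powers)
n_required
```

With a design as large as the one in this question, each call is expensive, so a coarse grid followed by a finer one (or bisection between two bracketing values) is a more practical search.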

Instead, my strategy here was to bracket possible $N$'s to get a sense of what the range of powers would be. Thus, I picked an $N$ of 500,000 and re-ran the code (initiating the same seed, n.b. this took an hour and a half to run). Here are the results:

sum(significant[,1])/repetitions      # pre-specified effect power for var1
[1] 0.115
sum(significant[,2])/repetitions      # pre-specified effect power for var2
[1] 0.091
sum(significant[,3])/repetitions      # pre-specified effect power for var12
[1] 0.059
sum(significant[,4])/repetitions      # pre-specified effect power for var1X2
[1] 0.606
sum(significant[,5])/repetitions      # pre-specified effect power for var12X2
[1] 0.913
sum(significant[,7])/repetitions      # power for likelihood ratio test of model
[1] 1
sum(significant[,6]==5)/repetitions   # all effects power
[1] 0.005
sum(significant[,6]>0)/repetitions    # any effect power
[1] 0.96
sum(significant[,4]&significant[,5])/repetitions   # power for interaction terms
[1] 0.606

We can see from this that the magnitude of your effects varies considerably, and thus your ability to detect them varies. For example, the effect of $var1^2$ is particularly difficult to detect, only being significant 6% of the time even with half a million letters. On the other hand, the model as a whole was always significantly better than the null model. The other possibilities are arrayed in between. Although most of the 'data' are thrown away on each iteration, a good bit of exploration is still possible. For example, we could use the significant matrix to assess the correlations between the probabilities of different variables being significant.

I should note in conclusion, that due to the complexity and large $N$ entailed in your situation, this was not as simple as I had suspected / claimed in my initial comment. However, you can certainly get the idea for how this can be done in general, and the issues involved in power analysis, from what I've put here. HTH.

Solved – Power of a Multiple Linear Regression

In G*Power, you do a power analysis for an $R^2$ in multiple regression by treating it as a partial $R^2$ with no predictors in the baseline model.

To do this, set the total number of predictors to 1, and the number of tested predictors to 1. You're then testing the model against an intercept-only model, with an $R^2$ of zero.

(You can always think of regression models in this way: you're testing against no predictors, and looking at the change in $R^2$.)
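The same calculation can be sketched in base R with the noncentral $F$ distribution. This assumes Cohen's effect size $f^2 = R^2/(1-R^2)$ and a noncentrality parameter of $\lambda = f^2 N$, which is my understanding of the convention G*Power follows; the function name r2_power and the example values are hypothetical.

```r
# Power of the overall F test of R^2 against an intercept-only model.
# r2 = population R^2, n = sample size, p = number of predictors (all tested)
r2_power <- function(r2, n, p, alpha = 0.05) {
  f2     <- r2 / (1 - r2)          # Cohen's f^2
  lambda <- f2 * n                 # assumed noncentrality parameter
  df1    <- p
  df2    <- n - p - 1
  crit   <- qf(1 - alpha, df1, df2)                     # critical F under the null
  pf(crit, df1, df2, ncp = lambda, lower.tail = FALSE)  # P(reject | alternative)
}

pw_small <- r2_power(r2 = 0.10, n = 50,  p = 1)
pw_large <- r2_power(r2 = 0.10, n = 150, p = 1)
c(n_50 = pw_small, n_150 = pw_large)
```

As a sanity check, the power approaches $\alpha$ as $R^2$ approaches zero, since the test then rejects only at the Type I error rate.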
