Logistic Regression – Logistic Regression and Inflection Point

Tags: binary data, generalized linear model, logistic, regression

We have data with a binary outcome and some covariates. I used logistic regression to model the data. Just a simple analysis, nothing extraordinary. The final output is supposed to be a dose-response curve where we show how the probability changes for a specific covariate. Something like this:

[Figure: S-shaped curve of fitted probability (y-axis) against dose (x-axis)]

We received some criticism from an internal reviewer (not a pure statistician) for choosing logistic regression. Logistic regression assumes (or defines) that the inflection point of the S-shaped curve on the probability scale is at probability 0.5. He argued that there would be no reason to assume that the inflection point was indeed at probability 0.5 and we should choose a different regression model that allows the inflection point to vary such that the actual position is data driven.
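
The reviewer's premise can be checked directly. This is a short derivation, assuming a single dose variable $x$ with a linear predictor $\beta_0 + \beta_1 x$ (these symbols are introduced here for illustration only):

```latex
p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}, \qquad
\frac{dp}{dx} = \beta_1 \, p \, (1 - p), \qquad
\frac{d^2 p}{dx^2} = \beta_1^2 \, p \, (1 - p) \, (1 - 2p)
```

For $\beta_1 \neq 0$ and $0 < p < 1$ the second derivative vanishes only at $p = 0.5$, that is at $x = -\beta_0 / \beta_1$. So the position of the inflection point on the dose scale is estimated from the data, but its position on the probability scale is fixed as long as the dose enters linearly on the logit scale.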

At first I was caught off guard by his argument, since I had never thought about this point. I did not have any arguments as to why it would be justified to assume that the inflection point is at 0.5. After doing some research, I still don't have an answer to this question.

I came across 5-parameter logistic regression, for which the inflection point is an additional parameter, but it seems that this regression model is usually used when producing dose-response curves with a continuous outcome. I'm not sure if and how it can be extended to binary response variables.

I guess my main question is why or when it is OK to assume that the inflection point for a logistic regression is at 0.5. Does it even matter? I have never seen anybody fit a logistic regression model and explicitly discuss the matter of the inflection point. Are there alternatives for creating a dose-response curve where the inflection point is not necessarily at 0.5?

Just for completeness, the R code for generating the above picture:

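# Example data with a binary outcome (admit)
dat <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
dat$rank <- factor(dat$rank)
# Logistic regression, linear in every predictor on the logit scale
logit <- glm(admit ~ gre + gpa + rank, family = binomial(link = "logit"), data = dat)
# Vary gre (the "dose") over a deliberately wide range; hold gpa and rank fixed
newdata <- data.frame(gre = seq(-2000, 8000, 1), gpa = 2.5, rank = factor(1, levels = c(1, 2, 3, 4)))
pp <- predict(logit, newdata, type = "response", se.fit = TRUE)
plot(newdata$gre, pp$fit, type = "l", col = "black", lwd = 2, ylab = "Probability", xlab = "Dose")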

Edit 1:

Just to add to what Scortchi said in one of the comments: the reviewer did indeed argue that, biologically, it might be more likely that the change in curvature occurs at a probability below 0.5. Hence his resistance to assuming that the inflection point is at 0.5.

Edit 2:

As a reaction to the comment by Frank Harrell:

As an example, I modified my model above to include a quadratic and a cubic term in gre (which is the "dose" in this example).

# Same model, now with quadratic and cubic terms in gre on the logit scale
logit <- glm(admit ~ gre + I(gre^2) + I(gre^3) + gpa + rank, family = binomial(link = "logit"), data = dat)
newdata <- data.frame(admit = 1, gre = seq(-2000, 8000, 1), gpa = 2.5, rank = factor(1, levels = c(1, 2, 3, 4)))
pp <- predict(logit, newdata, type = "response", se.fit = TRUE)
plot(newdata$gre, pp$fit, type = "l", col = "black", lwd = 2, xlim = c(-2000, 4000), ylab = "Probability", xlab = "Dose")

[Figure: fitted probability (y-axis) against dose (x-axis) for the model with quadratic and cubic gre terms]

Despite the fact that it is probably not meaningful to add a quadratic and a cubic gre term in this case, we see that the form of the dose-response curve has changed. Indeed we now have two inflection points at about 0.25 and near 0.7.
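
The values quoted above were read off the plot. A numerical check could be sketched as follows. Note that `inflection_points` is a hypothetical helper written for this illustration (it is not part of any package), it assumes an equally spaced grid, and the tolerance `tol` is an arbitrary choice that may need adjusting to the grid spacing. The two test curves are simulated shapes, not the admissions data.

```r
# Hypothetical helper: approximate inflection points of a curve p(x) that is
# evaluated on an equally spaced grid x, by locating sign changes in the
# second differences of p (which are proportional to the curvature).
inflection_points <- function(x, p, tol = 1e-10) {
  d2   <- diff(p, differences = 2)   # d2[i] is centred on grid point i + 1
  keep <- which(abs(d2) > tol)       # ignore numerically flat stretches
  s    <- sign(d2[keep])
  flip <- which(diff(s) != 0)        # curvature changes sign after this entry
  i    <- keep[flip] + 1             # grid index just before the sign change
  data.frame(x = x[i], p = p[i])
}

x <- seq(-5, 5, by = 0.01)

# Linear predictor with a logit link: inflection expected near p = 0.5
ip_logit <- inflection_points(x, plogis(x))
print(ip_logit)

# Complementary log-log shape: inflection expected near p = 1 - exp(-1)
ip_cloglog <- inflection_points(x, 1 - exp(-exp(x)))
print(ip_cloglog)
```

The same helper could in principle be applied to `newdata$gre` and `pp$fit` from the code above; the result is only accurate to within about one grid step.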

Best Answer

As touched upon by @scortchi, the reviewer was operating under the false impression that it is not possible to model nonlinear effects of predictors on the logit scale in the context of logistic regression. The original model was quick to assume linearity of all predictors. By relaxing the linearity assumption, using for example restricted cubic splines (natural splines), the entire shape of the curve becomes flexible and the inflection point is no longer an issue. Had there been a single predictor, and had it been expanded using a regression spline, one could say that the logistic model makes only the assumptions of smoothness and independence of observations.
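
A minimal sketch of this idea, under stated assumptions: the data are simulated (not the admissions data from the question), the true curve is assumed to have a complementary log-log shape so that its inflection point is not at 0.5, and the choice of 4 degrees of freedom for the spline is arbitrary. The names `dose`, `y`, `fit_lin` and `fit_rcs` are illustrative.

```r
# Sketch: logistic regression with a natural (restricted) cubic spline in dose.
library(splines)  # distributed with R; provides ns()

set.seed(1)
n    <- 2000
dose <- runif(n, 0, 10)

# Assumed true curve: complementary log-log shape, whose inflection point
# lies at probability 1 - exp(-1) (about 0.63) rather than at 0.5.
p_true <- 1 - exp(-exp(-3 + 0.6 * dose))
y      <- rbinom(n, size = 1, prob = p_true)

# Linear in dose on the logit scale: inflection point forced to p = 0.5.
fit_lin <- glm(y ~ dose, family = binomial(link = "logit"))

# Natural cubic spline in dose on the logit scale: shape is data driven.
fit_rcs <- glm(y ~ ns(dose, df = 4), family = binomial(link = "logit"))

# Likelihood-ratio test of the linear fit against the spline fit.
lrt <- anova(fit_lin, fit_rcs, test = "Chisq")
print(lrt)

# Fitted dose-response curves on the probability scale.
grid  <- data.frame(dose = seq(0, 10, length.out = 201))
p_lin <- predict(fit_lin, grid, type = "response")
p_rcs <- predict(fit_rcs, grid, type = "response")
plot(grid$dose, p_rcs, type = "l", lwd = 2, ylim = c(0, 1),
     xlab = "Dose", ylab = "Probability")
lines(grid$dose, p_lin, lty = 2)  # dashed: linear-in-dose fit
```

Both fits are ordinary logistic regressions with a logit link; only the functional form of the dose on the logit scale differs. The spline fit is free to place the steepest part of the curve wherever the data suggest.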
