Logistic Regression – How to Regress When Logit is Infinity for y=1


Logistic regression essentially means taking the logit of your response proportions, p, and then doing standard regression. Consider the case where one p is 1.

logit(1) = log(1/(1-1)) = infinity. How can you do regression with infinity?

That is, if one of your observed proportions p is 1, then you are trying to find the line that minimises the sum of squared differences from a set of points that include infinity. How does this not drive the regression line to always have infinite slope?
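For example, in R:

p <- c(0.2, 0.5, 1)                 # One observed proportion equals 1
log(p / (1 - p))                    # logit: -1.386  0.000  Inf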

Best Answer

The question characterizes logistic regression as

$$\text{logit}(y) = \beta_0 + \beta_1 x + \varepsilon$$

and proposes to fit this model using least squares. It points out that because $y$ is a binary ($0$-$1$) variable, $\text{logit}(y)$ is undefined (or should be considered infinite), which is--to say the least--problematic!

The resolution of this conundrum is to avoid taking the logit of $y$ but instead apply its inverse, the logistic function

$$f(x) = \frac{1}{1 + \exp(-x)},$$

to the right hand side. Because $y$ on the left hand side still is a random variable with possible outcomes $0$ and $1$, it must be a Bernoulli variable: that is, what we need to know about it is the chance that $y=1$, written $\Pr(y=1).$ Therefore we make another attempt in the form

$$\Pr(y=1) = f(\beta_0 + \beta_1 x).$$
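For instance, with $\beta_0 = 3$ and $\beta_1 = -1$ (the values used in the simulation below), the model gives $\Pr(y=1 \mid x=2) = f(3 - 1\cdot 2) = f(1) \approx 0.73$.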

This is an example of a generalized linear model. Its parameters $\beta_0$ and $\beta_1$ are typically (but not necessarily) found using Maximum Likelihood.
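Concretely, given observations $(x_i, y_i)$, Maximum Likelihood chooses $\beta_0$ and $\beta_1$ to maximize the Bernoulli log-likelihood

$$\ell(\beta_0, \beta_1) = \sum_i \Big[ y_i \log f(\beta_0 + \beta_1 x_i) + (1 - y_i) \log\big(1 - f(\beta_0 + \beta_1 x_i)\big) \Big].$$

Because $f$ is strictly between $0$ and $1$, every term is finite, so the infinities that plague the logit-of-$y$ formulation never arise.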

To understand this better, many people find it instructive to create synthetic datasets according to this model (instead of analyzing actual data, where the true model is unknown). We will look at how that might be coded in R, which is well suited to expressing and simulating statistical models. First, though, let's inspect its results.

[Figure: data with true and fitted models]

The data are shown as jittered points (they have been randomly shifted slightly in the horizontal direction to resolve overlaps). The true underlying probability function is plotted in solid red. The probability function fit using Maximum Likelihood is plotted in dashed gray.

You can see that where the red curve is high--which means the chance of $y=1$ is high--most of the data are $1$'s, whereas where the red curve drops to low levels, most of the data are $0$'s. The height of the curve stipulates the chance that the response will be a $1$. In logistic regression, the curve usually has the sigmoidal shape of the logistic function, while the data are always either at $y=1$ or $y=0$.

Reading over the code, which is written for expressive clarity, will help make these descriptions precise.

#
# Synthesize some data.
#
set.seed(17)                        # Allows results to be reproduced exactly
n <- 8                              # Number of distinct x values
k <- 4                              # Number of independent obs's for each x
x <- rep(1:n, k)                    # Independent values, each repeated k times
beta <- c(3, -1)                    # True parameters
logistic <- function(x) 1 / (1 + exp(-x))
probability <- function(x, b) logistic(b[1] + b[2]*x)
y <- rbinom(n*k, size=1, prob=probability(x, beta))   # Simulated data
#
# Fit the data using a logistic regression.
#
summary(fit <- glm(y ~ x, family=binomial(link="logit")))
#
# Plot the data, the true underlying probability function, and the fitted one.
#
jitter <- runif(n*k, -1/3, 1/3)     # Displaces points to resolve overlaps
plot(x+jitter, y, type="p", xlab="x", ylab="y", main="Data with true and fitted models")
curve(probability(x, beta), col="Red", lwd=2, add=TRUE)
curve(probability(x, coef(fit)), col="Gray", lwd=2, lty=2, add=TRUE)
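As a sketch of what Maximum Likelihood is doing here, the fit can be reproduced (up to optimizer tolerance) by minimizing the negative Bernoulli log-likelihood directly with optim, using the objects defined above:

#
# Optional check: minimize the negative log-likelihood directly and
# compare the result with the coefficients found by glm().
#
negLogLik <- function(b) {          # Negative Bernoulli log-likelihood
  p <- probability(x, b)
  -sum(y * log(p) + (1 - y) * log(1 - p))
}
optim(c(0, 0), negLogLik)$par       # Should closely match coef(fit)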