R – Visualizing Logistic Regression with Simulations

Tags: logistic, r, regression, simulation

I'm trying to simulate data from a logistic regression and visualize the different parameters that I've put into the model.

Here is a reproducible example and the output.

set.seed(98765)
n = 20
x1 = rnorm(n = n, mean = 6, sd = 1)
# Rescale the data
x1z = scale(x1)
z = 0 + 2*x1z  # linear predictor: the LOG odds, with true intercept 0 and true slope 2
pr = 1/(1+exp(-z)) # inverse logit: transforms the log odds into a probability; note that 1/(1+exp(-z)) == exp(z)/(1+exp(z)), same as pr2 = boot::inv.logit(z)
# pr2 = exp(z)/(1+exp(z)) # equivalent: exp(z) turns the log odds into odds, then $p = odds/(1+odds)$ gives the probability
y = rbinom(n = n, size = 1, prob = pr) # Bernoulli response variable (a special case of the binomial with size = 1)

# Combine the data in a dataframe 
df = data.frame(y = y, x1 = x1)

#now feed it to glm:
glm.logist = glm( y~x1, data=df, family="binomial")
glm.sum = summary(glm.logist)

par(mfrow=c(1,2))
b.5 = scales::alpha("black",.5)
plot(z~x1, ylab = "Log Odds", pch = 19, col = b.5, xlim = c(0,10), ylim = c(-12,11))
abline(a = glm.sum$coefficients[1,1],
       b = glm.sum$coefficients[2,1])
abline(h=0, v=0,lty = 3)
points(x = 0, y=glm.sum$coefficients[1,1], pch = 19, col = "red")
text(x = 0, y=glm.sum$coefficients[1,1], labels = c("Intercept"), pos =4)

glm.sum$coefficients

plot(y~x1, data = df, col = scales::alpha("black",.5), pch = 19)
abline(h=0.5, v=mean(x1),lty = 3)
newdata <- data.frame(x1=seq(min(x1), max(x1),len=n))
newdata$y = predict(object = glm.logist, newdata = newdata, type = "response") 
lines(x = newdata$x1,
      y = newdata$y, col = "red",lwd = 2)

1/(1+exp(-glm.sum$coefficients[1,1])) # inverse logit of the estimated intercept
1/(1+exp(-glm.sum$coefficients[2,1])) # inverse logit of the estimated slope
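
Both lines back-transform an estimated coefficient through the inverse logit. As a quick sanity check of the identities used in the comments above (a minimal sketch; plogis() and qlogis() are base R's inverse-logit and logit functions):

all.equal(1/(1+exp(-z)), exp(z)/(1+exp(z))) # the two inverse-logit forms agree
all.equal(pr, plogis(z))                    # plogis() is the inverse logit
all.equal(qlogis(pr), z)                    # qlogis() recovers the log odds

Note that the inverse logit of the slope is not a probability with a direct interpretation; exp() of the slope is the multiplicative change in odds per unit of x1.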

[Two-panel plot: left, the log odds z against x1 with the fitted line and the estimated intercept marked in red; right, the 0/1 responses against x1 with the fitted logistic curve in red.]

  • The "intercept" I put into z = 0 + 2*x1z does not appear to have the same meaning as the intercept of a linear model, and it is not shown in the first graph. What is the role of the intercept in a logistic regression model? The way it is coded, changing the intercept in z = intercept + 2*x1z changes the height of the line: if this "intercept" is big enough, all the log odds are above 0 and so nearly all the response values are 1 (see the sketch after this list). So what is the meaning of that "intercept"?
  • Also, I know there are only 20 points in the simulation, but why does the line sit so far below the points in the log odds graph?
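
Here is a quick check of the "big intercept" claim (a minimal sketch reusing x1z from above; plogis() is base R's built-in inverse logit):

# Shift the intercept in z = intercept + 2*x1z and watch the probabilities saturate
for (b0 in c(0, 2, 5, 10)) {
  p = plogis(b0 + 2*x1z) # probabilities implied by the shifted log odds
  cat("intercept =", b0, " mean prob =", round(mean(p), 3), " min prob =", round(min(p), 3), "\n")
}

With an intercept of 10, even the smallest probability is essentially 1, so almost every simulated response would be 1 (though never with certainty).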

Best Answer

  • The intercept in a logistic regression model is the log odds of response when all other covariates in the model are equal to 0. A log odds of 0 corresponds to a probability of 0.5. The log odds would theoretically have to go to $+\infty$ for all responses to be 1 with certainty, so a large intercept makes responses of 1 very likely, never guaranteed. Your description of the relation between log odds and probability of response is wrong here and needs to be checked. (See the sketch after the code below.)

  • The first plot's line sits below the points because the points are the true log odds you generated (z = 0 + 2*x1z), while the line is the log odds estimated from only 20 simulated responses. Change the seed and you'll readily see the effect of random variability in the sample:

    par(mfrow=c(1,2))
    b.5 = scales::alpha("black",.5)
    # Left: true log odds on the standardized scale, with the true line (intercept 0, slope 2)
    plot(z~x1z, ylab = "Log Odds", pch = 19, col = b.5, xlim = c(-5,5), ylim = c(-12,11))
    abline(a = 0, b = 2, col = "red")
    abline(h=0, v=0, lty = 3)
    
    # Right: the same true log odds against the raw x1, with the line estimated by glm()
    plot(z~x1, ylab = "Log Odds", pch = 19, col = b.5, xlim = c(0,10), ylim = c(-12,11))
    abline(a = glm.sum$coefficients[1,1],
           b = glm.sum$coefficients[2,1])
    abline(h=0, v=0, lty = 3)
    points(x = 0, y = glm.sum$coefficients[1,1], pch = 19, col = "red")
    text(x = 0, y = glm.sum$coefficients[1,1], labels = "Intercept", pos = 4)
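
To make both points concrete, here is a minimal sketch (assuming x1z from the question is still in scope) that first checks the probability implied by a log odds of 0, then re-simulates the responses under a few seeds and refits; on the standardized scale the true intercept and slope are 0 and 2, and the estimates bounce around them:

    plogis(0) # a log odds of 0 corresponds to a probability of exactly 0.5
    
    # Re-simulate the Bernoulli responses under different seeds and refit
    for (s in c(98765, 1, 2, 3)) {
      set.seed(s)
      y.s = rbinom(n = length(x1z), size = 1, prob = plogis(0 + 2*x1z))
      fit = glm(y.s ~ x1z, family = "binomial")
      cat("seed =", s, " estimates =", round(coef(fit), 2), "\n")
    }

With only n = 20, the estimated line can sit well above or below the true one, which is exactly what the second panel shows.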
    