Ok, I have a logistic regression and have used the predict()
function to develop a probability curve based on my estimates.
## LOGIT MODEL:
library(car)
mod1 = glm(factor(won) ~ as.numeric(bid), data=mydat, family=binomial(link="logit"))
## PROBABILITY CURVE:
all.x <- expand.grid(won=unique(won), bid=unique(bid))
y.hat.new <- predict(mod1, newdata=all.x, type="response")
plot(bid<-000:1000,predict(mod1,newdata=data.frame(bid<-c(000:1000)),type="response"), lwd=5, col="blue", type="l")
This is great but I'm curious about plotting the confidence intervals for the probabilities. I've tried plot.ci()
but had no luck. Can anyone point me to some ways to get this done, preferably with the car
package or base R.
Best Answer
The code you used estimates a logistic regression model using the
glm
function. You didn't include data, so I'll just make some up.A logistic regression model models the relationship between a binary response variable and, in this case, one continuous predictor. The result is a logit-transformed probability as a linear relation to the predictor. In your case, the outcome is a binary response corresponding to winning or not winning at gambling and it is being predicted by the value of the wager. The coefficients from
mod1
are given in logged odds (which are difficult to interpret), according to:$$\text{logit}(p)=\log\left(\frac{p}{(1-p)}\right)=\beta_{0}+\beta_{1}x_{1}$$
To convert logged odds to probabilities, we can translate the above to
$$p=\frac{\exp(\beta_{0}+\beta_{1}x_{1})}{(1+\exp(\beta_{0}+\beta_{1}x_{1}))}$$
You can use this information to set up the plot. First, you need a range of the predictor variable:
Then using
predict
, you can obtain predictions based on your modelNote that the fitted values can also be obtained via
By specifying
se.fit=TRUE
, you also get the standard error associated with each fitted value. The resultingdata.frame
is a matrix with the following components: the fitted predictions (fit
), the estimated standard errors (se.fit
), and a scalar giving the square root of the dispersion used to compute the standard errors (residual.scale
). In the case of a binomial logit, the value will be 1 (which you can see by enteringpreddat$residual.scale
inR
). If you want to see an example of what you've calculated so far, you can typehead(data.frame(preddat))
.The next step is to set up the plot. I like to set up a blank plotting area with the parameters first:
Now you can see where it is important to know how to calculate the fitted probabilities. You can draw the line corresponding to the fitted probabilities following the second formula above. Using the
preddat data.frame
you can convert the fitted values to probabilities and use that to plot a line against the values of your predictor variable.Finally, answer your question, the confidence intervals can be added to the plot by calculating the probability for the fitted values
+/- 1.96
times the standard error:The resulting plot (from the randomly generated data) should look something like this:
For expediency's sake, here's all the code in one chunk:
(Note: This is a heavily edited answer in an attempt to make it more relevant to stats.stackexchange.)