The code you used estimates a logistic regression model using the glm function. You didn't include data, so I'll just make some up.
set.seed(1234)
mydat <- data.frame(
won=as.factor(sample(c(0, 1), 250, replace=TRUE)),
bid=runif(250, min=0, max=1000)
)
mod1 <- glm(won~bid, data=mydat, family=binomial(link="logit"))
Logistic regression models the relationship between a binary response variable and, in this case, one continuous predictor: the logit-transformed probability is expressed as a linear function of the predictor. In your case, the outcome is a binary response corresponding to winning or not winning at gambling, and it is predicted by the value of the wager. The coefficients from mod1 are given in log odds (which are difficult to interpret), according to:
$$\text{logit}(p)=\log\left(\frac{p}{(1-p)}\right)=\beta_{0}+\beta_{1}x_{1}$$
To convert log odds to probabilities, we can rearrange the above to
$$p=\frac{\exp(\beta_{0}+\beta_{1}x_{1})}{(1+\exp(\beta_{0}+\beta_{1}x_{1}))}$$
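For instance, with the simulated data above you could pull the coefficients out of mod1 and apply the second formula by hand (the bid value of 500 here is just an arbitrary illustration):
coef(mod1)                            # intercept and slope on the log-odds scale
eta <- sum(coef(mod1) * c(1, 500))    # linear predictor (log odds) at bid = 500
exp(eta) / (1 + exp(eta))             # predicted probability of winning at bid = 500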
You can use this information to set up the plot. First, you need a range of the predictor variable:
plotdat <- data.frame(bid=(0:1000))
Then, using predict, you can obtain predictions based on your model:
preddat <- predict(mod1, newdata=plotdat, se.fit=TRUE)
Note that the fitted values for the original data can also be obtained via mod1$fitted.
By specifying se.fit=TRUE, you also get the standard error associated with each fitted value. The result is a list with the following components: the fitted predictions (fit), the estimated standard errors (se.fit), and a scalar giving the square root of the dispersion used to compute the standard errors (residual.scale). For a binomial logit model, that value is 1 (which you can check by entering preddat$residual.scale in R). If you want to see what you've calculated so far, you can type head(data.frame(preddat)).
The next step is to set up the plot. I like to set up a blank plotting area with the parameters first:
with(mydat, plot(bid, won, type="n",
ylim=c(0, 1), ylab="Probability of winning", xlab="Bid"))
Now you can see why it is important to know how to calculate the fitted probabilities. You can draw the line corresponding to the fitted probabilities using the second formula above: take the fitted values in preddat, convert them to probabilities, and plot them against the values of your predictor variable.
with(preddat, lines(0:1000, exp(fit)/(1+exp(fit)), col="blue"))
Finally, to answer your question, the confidence intervals can be added to the plot by calculating the probability for the fitted values +/- 1.96 times the standard error:
with(preddat, lines(0:1000, exp(fit+1.96*se.fit)/(1+exp(fit+1.96*se.fit)), lty=2))
with(preddat, lines(0:1000, exp(fit-1.96*se.fit)/(1+exp(fit-1.96*se.fit)), lty=2))
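Incidentally, the inverse-logit transformation exp(x)/(1 + exp(x)) is also available in base R as plogis(), so the three lines above could equivalently be written as:
with(preddat, lines(0:1000, plogis(fit), col="blue"))
with(preddat, lines(0:1000, plogis(fit + 1.96*se.fit), lty=2))
with(preddat, lines(0:1000, plogis(fit - 1.96*se.fit), lty=2))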
The resulting plot (from the randomly generated data) shows the fitted probability curve in blue with the dashed confidence limits around it.
For expediency's sake, here's all the code in one chunk:
set.seed(1234)
mydat <- data.frame(
won=as.factor(sample(c(0, 1), 250, replace=TRUE)),
bid=runif(250, min=0, max=1000)
)
mod1 <- glm(won~bid, data=mydat, family=binomial(link="logit"))
plotdat <- data.frame(bid=(0:1000))
preddat <- predict(mod1, newdata=plotdat, se.fit=TRUE)
with(mydat, plot(bid, won, type="n",
ylim=c(0, 1), ylab="Probability of winning", xlab="Bid"))
with(preddat, lines(0:1000, exp(fit)/(1+exp(fit)), col="blue"))
with(preddat, lines(0:1000, exp(fit+1.96*se.fit)/(1+exp(fit+1.96*se.fit)), lty=2))
with(preddat, lines(0:1000, exp(fit-1.96*se.fit)/(1+exp(fit-1.96*se.fit)), lty=2))
(Note: This is a heavily edited answer in an attempt to make it more relevant to stats.stackexchange.)
Here's a parallel example using the emmeans package:
> library(emmeans)
> warp = transform(warpbreaks, A=wool, B=tension)
> warp.lm = lm(log(breaks) ~ A*B, data = warp)
> emms = emmeans(warp.lm, ~ A | B)
Look at the estimates:
> emms
B = L:
A emmean SE df lower.CL upper.CL
A 3.717945 0.1246647 48 3.467290 3.968601
B 3.282378 0.1246647 48 3.031723 3.533034
B = M:
A emmean SE df lower.CL upper.CL
A 3.116750 0.1246647 48 2.866094 3.367405
B 3.309327 0.1246647 48 3.058671 3.559982
B = H:
A emmean SE df lower.CL upper.CL
A 3.117623 0.1246647 48 2.866967 3.368278
B 2.904152 0.1246647 48 2.653496 3.154807
Results are given on the log (not the response) scale.
Confidence level used: 0.95
Look at these results, back-transformed:
> summary(emms, type = "response")
B = L:
A response SE df lower.CL upper.CL
A 41.17969 5.133656 48 32.04977 52.91043
B 26.63906 3.320951 48 20.73293 34.22765
B = M:
A response SE df lower.CL upper.CL
A 22.57289 2.814043 48 17.56827 29.00316
B 27.36669 3.411661 48 21.29924 35.16256
B = H:
A response SE df lower.CL upper.CL
A 22.59260 2.816501 48 17.58361 29.02849
B 18.24975 2.275101 48 14.20361 23.44851
Confidence level used: 0.95
Intervals are back-transformed from the log scale
Obtain confidence intervals for the comparisons within each by group (that is, within each level of B), back-transformed:
> confint(pairs(emms), type = "response")
B = L:
contrast ratio SE df lower.CL upper.CL
A / B 1.5458390 0.2725354 48 1.0844650 2.203500
B = M:
contrast ratio SE df lower.CL upper.CL
A / B 0.8248307 0.1454198 48 0.5786502 1.175746
B = H:
contrast ratio SE df lower.CL upper.CL
A / B 1.2379675 0.2182568 48 0.8684814 1.764648
Confidence level used: 0.95
Intervals are back-transformed from the log scale
Note that with a log transformation or link, back-transformed differences become ratios. Similarly, with a logit link, the comparisons will back-transform to odds ratios. For other transformations or links, there is not a sensible way to back-transform a comparison or contrast.
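To see why, note that for a log link a difference of two estimates back-transforms as
$$\exp\left(\log\mu_{1}-\log\mu_{2}\right)=\frac{\mu_{1}}{\mu_{2}}$$
and for a logit link,
$$\exp\left(\operatorname{logit}(p_{1})-\operatorname{logit}(p_{2})\right)=\frac{p_{1}/(1-p_{1})}{p_{2}/(1-p_{2})}$$
which is an odds ratio.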
Finally, compare the A differences among levels of B, again back-transformed to ratios:
> confint(contrast(emms, interaction = "pairwise", by = NULL),
+ type = "response")
A_pairwise B_pairwise ratio SE df lower.CL upper.CL
A / B L / M 1.8741288 0.4672755 48 1.1352279 3.093968
A / B L / H 1.2486911 0.3113355 48 0.7563776 2.061443
A / B M / H 0.6662782 0.1661228 48 0.4035889 1.099948
Confidence level used: 0.95
Intervals are back-transformed from the log scale
The first entry, for example, estimates the A/B ratio with B = L, divided by the A/B ratio with B = M. You can check this against the earlier pairwise output: 1.5458390 / 0.8248307 is approximately 1.874, matching the first row of the table. (It is unfortunate that the factor A happens to have levels that are themselves named A and B.)
Best Answer
Yes, it's possible. Wald confidence intervals in a logistic regression are calculated on the log-odds scale and then exponentiated to get the confidence interval for the corresponding odds ratio. The $p$-value is based on the test statistic $z = \frac{\hat{\beta}}{\operatorname{SE}_{\hat{\beta}}}$, which is the estimate on the log-odds scale divided by its standard error. The (two-sided) $p$-value is then calculated from the test statistic $z$ as $p=2\Phi(-|z|)$, where $\Phi$ denotes the cdf of the standard normal distribution.
In order to calculate the confidence interval, we need the standard error. Solving for it, we have:
$$ \operatorname{SE}_{\hat{\beta}} = -\frac{|\hat{\beta}|}{\Phi^{-1}(p/2)} $$
where $\Phi^{-1}$ is the quantile function of the standard normal distribution. To calculate the confidence interval on the log-odds scale, we then use:
$$ \hat{\beta} \pm z_{1-\alpha/2}\times \operatorname{SE}_{\hat{\beta}} $$
where $z_{1-\alpha/2}$ is a quantile of the standard normal distribution, e.g. $1.96$ for $\alpha = 0.05$, i.e. a 95% confidence interval on the log-odds scale. Exponentiate the limits to get the confidence interval for the odds ratio.
Let's apply it to the estimate for variable D. Applying the formula, we recover the approximate standard error as $$ \operatorname{SE} = -\frac{|-0.040|}{-1.669593} = 0.024 $$ Hence, an approximate 95% confidence interval for the odds ratio is: $$ \operatorname{exp}(-0.040 \pm 1.96\times 0.024) $$ which evaluates to $(0.917, 1.010)$.
In R, you could do:
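Here is a minimal sketch of that calculation (the coefficient $-0.040$ for D is taken from the worked example above, and the two-sided $p$-value is assumed to be about $0.095$, consistent with the quantile $-1.669593$ used there):
beta <- -0.040                              # estimated coefficient for D (log-odds scale)
p    <- 0.095                               # assumed two-sided p-value for D
se   <- -abs(beta) / qnorm(p / 2)           # recovered standard error, roughly 0.024
exp(beta + c(-1, 1) * qnorm(0.975) * se)    # approximate 95% CI for the odds ratio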