Calculating Confidence Intervals for Logistic Regression

Tags: logistic · logit · regression · standard-error

I'm using a binomial logistic regression to identify whether exposure to has_x or has_y affects the likelihood that a user will click on something. My model is the following:

fit = glm(formula = has_clicked ~ has_x + has_y, 
          data=df, 
          family = binomial())

This is the output from my model:

Call:
glm(formula = has_clicked ~ has_x + has_y, 
    family = binomial(), data = df)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.9869  -0.9719  -0.9500   1.3979   1.4233  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.504737   0.008847 -57.050  < 2e-16 ***
has_xTRUE   -0.056986   0.010201  -5.586 2.32e-08 ***
has_yTRUE    0.038579   0.010202   3.781 0.000156 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 217119  on 164182  degrees of freedom
Residual deviance: 217074  on 164180  degrees of freedom
AIC: 217080

Number of Fisher Scoring iterations: 4

As each coefficient is significant, I can use the model to obtain the predicted probability for any combination of these variables:

predict(fit, data.frame(has_x = TRUE, has_y = TRUE), type = "response")

I don't understand how I can report on the Std. Error of the prediction.

  1. Do I just need to use $1.96*SE$? Or do I need to convert the
    $SE$ using an approach described here?

  2. If I want to understand the standard-error for both variables
    how would I consider that?

Unlike this question, I am interested in understanding the upper and lower bounds of the error as a percentage. For example, if my prediction shows a value of 37% for TRUE, TRUE, can I calculate that this is $\pm 0.3\%$ for a $95\%$ CI? (0.3% chosen to illustrate my point.)

Best Answer

Your question may come from the fact that you are dealing with odds ratios and probabilities, which is confusing at first. Since the logistic model is a non-linear transformation of $\beta^Tx$, computing the confidence intervals is not as straightforward.

Background

Recall that for the Logistic regression model

  • Probability of $(Y = 1)$: $p = \frac{e^{\alpha + \beta_1x_1 + \beta_2 x_2}}{1 + e^{ \alpha + \beta_1x_1 + \beta_2 x_2}}$

  • Odds of $(Y = 1)$: $ \left( \frac{p}{1-p}\right) = e^{\alpha + \beta_1x_1 + \beta_2 x_2}$

  • Log Odds of $(Y = 1)$: $ \log \left( \frac{p}{1-p}\right) = \alpha + \beta_1x_1 + \beta_2 x_2$

Consider the case where you have a one unit increase in variable $x_1$, i.e. $x_1 + 1$, then the new odds are

$$ \text{Odds}(Y = 1) = e^{\alpha + \beta_1(x_1 + 1) + \beta_2x_2} = e^{\alpha + \beta_1 x_1 + \beta_1 + \beta_2x_2 } $$

  • Odds Ratio (OR) are therefore

$$ \frac{\text{Odds}(x_1 + 1)}{\text{Odds}(x_1)} = \frac{e^{\alpha + \beta_1(x_1 + 1) + \beta_2x_2} }{e^{\alpha + \beta_1 x_1 + \beta_2x_2}} = e^{\beta_1} $$

  • Log Odds Ratio = $\beta_1$

  • Relative risk or (probability ratio) = $\frac{ \frac{e^{\alpha + \beta_1x_1 + \beta_1 + \beta_2 x_2}}{1 + e^{ \alpha + \beta_1x_1 + \beta_1 + \beta_2 x_2}}}{ \frac{e^{\alpha + \beta_1x_1 + \beta_2 x_2}}{1 + e^{ \alpha + \beta_1x_1 + \beta_2 x_2}}}$

Interpreting coefficients

How would you interpret the coefficient value $\beta_j$? Assuming that everything else remains fixed:

  • For every unit increase in $x_j$, the log-odds increase by $\beta_j$ (the log-odds ratio is $\beta_j$).
  • For every unit increase in $x_j$, the odds are multiplied by $e^{\beta_j}$ (the odds ratio is $e^{\beta_j}$).
  • For an increase of $x_j$ from $k$ to $k + \Delta$, the odds are multiplied by $e^{\beta_j \Delta}$.
  • If the coefficient is negative, then an increase in $x_j$ leads to a decrease in the odds.
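
As a quick check with the fitted model above (a minimal sketch, assuming the fit object from the question):

# exp() of each coefficient gives the odds ratio for a one-unit increase
# in that variable, everything else held fixed.
exp(coef(fit))
# e.g. exp(-0.056986) ~ 0.945: has_x multiplies the odds of a click by
# about 0.945, i.e. roughly a 5.5% decrease in the odds.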

Confidence intervals for a single parameter $\beta_j$

Do I just need to use $1.96 \times SE$? Or do I need to convert the $SE$ using an approach described here?

Since the parameter $\beta_j$ is estimated using Maximum Likelihood Estimation, MLE theory tells us that it is asymptotically normal, and hence we can use the large-sample Wald confidence interval to get the usual

$$ \beta_j \pm z^* SE(\beta_j)$$

which gives a confidence interval on the log-odds ratio. The invariance property of the MLE allows us to exponentiate to get $$ e^{\beta_j \pm z^* SE(\beta_j)}$$

which is a confidence interval on the odds ratio. Note that these intervals are for a single parameter only.
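
In R, the Wald intervals for the individual coefficients can be obtained directly (a sketch, assuming the fit object from the question; note that confint() on a glm profiles the likelihood instead, which is why confint.default() is used here):

ci_log_odds <- confint.default(fit, level = 0.95)  # beta_j +/- z* SE(beta_j)
ci_log_odds        # intervals on the log-odds-ratio scale
exp(ci_log_odds)   # intervals on the odds-ratio scale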

If I want to understand the standard-error for both variables how would I consider that?

If you want joint intervals for several parameters, you can use the Bonferroni procedure; if you want an interval for the predicted probability itself, use the confidence interval for probability estimates described below.

Bonferroni procedure for several parameters

If $g$ parameters are to be estimated with a family confidence coefficient of approximately $1 - \alpha$, the joint Bonferroni confidence limits are

$$ \beta_j \pm z_{(1 - \frac{\alpha}{2g})}SE(\beta_j), \quad j = 1, \dots, g$$
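
A minimal sketch of this in R, assuming the fit object from the question and taking the $g = 2$ exposure coefficients as the family of interest:

alpha <- 0.05
g     <- 2                                # has_xTRUE and has_yTRUE
z     <- qnorm(1 - alpha / (2 * g))       # Bonferroni-adjusted quantile
est   <- coef(fit)[c("has_xTRUE", "has_yTRUE")]
se    <- sqrt(diag(vcov(fit)))[c("has_xTRUE", "has_yTRUE")]
cbind(lower = est - z * se, upper = est + z * se)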

Confidence intervals for probability estimates

The logistic model outputs an estimate of the probability of observing a one, and we aim to construct a frequentist interval around the true probability $p$ such that $Pr(p_{L} \leq p \leq p_{U}) = 0.95$.

One approach called endpoint transformation does the following:

  • Compute the upper and lower bounds of the confidence interval for the linear combination $x^T\beta$ (using the Wald CI)
  • Apply a monotonic transformation to the endpoints $F(x^T\beta)$ to obtain the probabilities.

Since $Pr(x^T\beta) = F(x^T\beta)$ is a monotonic transformation of $x^T\beta$

$$ [Pr(x^T\beta)_L \leq Pr(x^T\beta) \leq Pr(x^T\beta)_U] = [F(x^T\beta)_L \leq F(x^T\beta) \leq F(x^T\beta)_U] $$

Concretely this means computing $x^T\beta \pm z^* SE(x^T\beta)$ and then applying the inverse logit transform to the endpoints to get the lower and upper bounds:

$$\left[\frac{e^{x^T\beta - z^* SE(x^T\beta)}}{1 + e^{x^T\beta - z^* SE(x^T\beta)}},\ \frac{e^{x^T\beta + z^* SE(x^T\beta)}}{1 + e^{x^T\beta + z^* SE(x^T\beta)}}\right] $$

The estimated approximate variance of $x^T\beta$ can be calculated from the covariance matrix $\Sigma$ of the regression coefficients:

$$ Var(x^T\beta) = x^T \Sigma x$$

The advantage of this method is that the bounds cannot fall outside the range $(0,1)$.
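
In R, the endpoint transformation can be carried out with predict() on the link scale (a sketch, using the model and prediction from the question):

# Wald interval on the log-odds scale, then inverse-logit both endpoints.
newdat <- data.frame(has_x = TRUE, has_y = TRUE)
pred   <- predict(fit, newdat, type = "link", se.fit = TRUE)
z      <- qnorm(0.975)                             # z* for a 95% interval
plogis(pred$fit)                                   # point estimate
plogis(pred$fit + c(-1, 1) * z * pred$se.fit)      # lower and upper bounds

Here plogis() is the inverse logit, $e^x / (1 + e^x)$, so both endpoints are guaranteed to land in $(0,1)$.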

There are several other approaches as well, such as the delta method and bootstrapping, each with its own assumptions, advantages, and limits.
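
For comparison, predict.glm can also report a delta-method standard error directly on the probability scale via type = "response"; a symmetric interval built from it is simpler but, unlike the endpoint transformation above, is not guaranteed to stay inside $(0,1)$ (again a sketch with the question's model):

newdat <- data.frame(has_x = TRUE, has_y = TRUE)
pred_p <- predict(fit, newdat, type = "response", se.fit = TRUE)
pred_p$fit + c(-1, 1) * qnorm(0.975) * pred_p$se.fit  # delta-method 95% CI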


Sources and info

My favorite book on this topic is "Applied Linear Statistical Models" by Kutner, Neter, and Li, Chapter 14.
