Logistic Regression Analysis – Can Confidence Intervals Be Calculated With Estimate, OR, and p Value?

confidence-interval, logistic, r

I have the following data, which are the results of a logistic regression.

df <- structure(list(`Predictors of failure` = c("Variable A", "Variable B", 
"Variable C", "Variable D", "Variable E", "Variable F"), Estimate = c(1.73, 
1.18, 1.59, -0.04, 0.16, -0.003), OR = c(3, 3.26, 4.88, 0.98, 
1.01, 1), `p-value` = c(0.049, 0.043, 0.025, 0.095, 0.763, 0.172
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))


Unfortunately, I don't believe we still have access to the original "raw" data that this regression was based on. I've been asked to provide the confidence intervals. Normally I would export these at the time of the regression, but since that is no longer possible, I'm looking for another route.

It seems you may be able to calculate confidence intervals from a p-value, but I'm not clear on how to apply this to the results of this regression.

Is it possible? And if so, could someone help me do it?

Best Answer

Yes, it's possible. Wald confidence intervals in a logistic regression are calculated on the log-odds scale and then exponentiated to get the confidence interval for the corresponding odds ratio. The $p$-value is based on the test statistic $z = \frac{\hat{\beta}}{\operatorname{SE}_{\hat{\beta}}}$, which is the estimate on the log-odds scale divided by its standard error. The (two-sided) $p$-value is then calculated from $z$ as $p=2\Phi(-|z|)$, where $\Phi$ denotes the cdf of the standard normal distribution.

To calculate the confidence interval, we need the standard error. Solving for it, we have

$$ \operatorname{SE}_{\hat{\beta}} = -\frac{|\hat{\beta}|}{\Phi^{-1}(p/2)} $$

where $\Phi^{-1}$ is the quantile function of the standard normal distribution. The confidence interval on the log-odds scale is then

$$ \hat{\beta} \pm z_{1-\alpha/2}\times \operatorname{SE}_{\hat{\beta}} $$

where $z_{1-\alpha/2}$ is a quantile of the standard normal distribution, e.g. $1.96$ for $\alpha = 0.05$, which gives a 95% confidence interval. Exponentiate the limits to get the confidence interval for the odds ratio.
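The step from the $p$-value to the standard error can be written out explicitly. This is only a sketch of the algebra, and it assumes the reported $p$-value comes from a two-sided Wald test:

$$
\begin{aligned}
p &= 2\Phi(-|z|) \\
\Phi^{-1}(p/2) &= -|z| = -\frac{|\hat{\beta}|}{\operatorname{SE}_{\hat{\beta}}} \\
\operatorname{SE}_{\hat{\beta}} &= -\frac{|\hat{\beta}|}{\Phi^{-1}(p/2)}
\end{aligned}
$$

For any $p < 1$ we have $p/2 < 0.5$, so $\Phi^{-1}(p/2)$ is negative and the leading minus sign makes the recovered standard error positive.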

Let's apply it to the estimate for variable D, which has $p = 0.095$ and therefore $\Phi^{-1}(0.095/2) = -1.669593$. Applying the formula, we recover the approximate standard error as

$$ \operatorname{SE}_{\hat{\beta}} = -\frac{|-0.040|}{-1.669593} = 0.024 $$

Hence, an approximate 95% confidence interval for the odds ratio is

$$ \exp(-0.040 \pm 1.96\times 0.024) $$

which evaluates to $(0.917, 1.007)$.
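The arithmetic for variable D can be checked step by step in base R. This is a minimal sketch; the variable names (`estimate`, `p_value`, `z_p`, `z_crit`, `ci_or`) are illustrative and not part of the original data.

```r
# Worked check for Variable D, using only base R (stats is attached by default).
estimate <- -0.04   # log-odds estimate from the table
p_value  <- 0.095   # two-sided p-value from the table

z_p <- qnorm(p_value / 2)      # standard normal quantile of p/2, negative here
se  <- -abs(estimate) / z_p    # recovered (approximate) standard error

z_crit <- qnorm(0.975)         # critical value for a 95% interval, about 1.96
ci_or  <- exp(estimate + c(-1, 1) * z_crit * se)  # limits on the odds-ratio scale

round(se, 3)
round(ci_or, 3)
```

Because the table reports rounded estimates and $p$-values, the recovered standard error and limits are approximate as well.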

In R, you could do:

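```r
library(tidyverse)

df <- df %>%
  mutate(
    # Standard normal quantile implied by the two-sided p-value (negative for p < 1)
    z = qnorm(`p-value` / 2),
    # Recovered standard error on the log-odds scale
    se = -abs(Estimate) / z,
    # 95% Wald limits, exponentiated to the odds-ratio scale
    ci_lwr = exp(Estimate - qnorm(0.975) * se),
    ci_upr = exp(Estimate + qnorm(0.975) * se)
  )
```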
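As a sanity check, the same calculation can be wrapped in a small base R function and tested for internal consistency. The sketch below assumes the $p$-values come from two-sided Wald tests. `ci_from_p()` and its `level` argument are hypothetical names introduced here for illustration, not part of any package.

```r
# Hypothetical helper (not from any package): recover the Wald standard error
# and the odds-ratio confidence limits from a log-odds estimate and its
# two-sided p-value. Assumes 0 < p < 1 and a non-zero estimate.
ci_from_p <- function(estimate, p, level = 0.95) {
  se     <- -abs(estimate) / qnorm(p / 2)
  z_crit <- qnorm(1 - (1 - level) / 2)
  data.frame(
    se     = se,
    ci_lwr = exp(estimate - z_crit * se),
    ci_upr = exp(estimate + z_crit * se)
  )
}

# Values copied from the table in the question.
estimate <- c(1.73, 1.18, 1.59, -0.04, 0.16, -0.003)
p_value  <- c(0.049, 0.043, 0.025, 0.095, 0.763, 0.172)

res <- ci_from_p(estimate, p_value)

# Round trip: the recovered SE should reproduce the reported p-values.
p_check <- 2 * pnorm(-abs(estimate / res$se))
all.equal(p_check, p_value)

# Consistency: a 95% Wald interval excludes OR = 1 exactly when p < 0.05.
excludes_one <- res$ci_lwr > 1 | res$ci_upr < 1
identical(excludes_one, p_value < 0.05)
```

Both checks only confirm that the algebra is self-consistent. They cannot recover precision lost to rounding in the reported estimates and $p$-values.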