Solved – Calculating Odds Ratio within Regression (in R)

Tags: logistic, multiple regression, odds-ratio, r, regression

I am fitting a logistic regression model for passing a test, where the independent variables are Age, Pencils and Animals. I am looking for odds ratios, and I'm confused about why it isn't working…

Let's say I have a data frame:

The outcome variable (Y) is binary. For this example – Pass/Fail (1/0)

Outcome: Pass (1/0)
Independent: Age (1,2,3,4); Pencils (1,2,3,4,5); Animals (0,1,2,3)
logitModel <- glm(Pass ~ Age + Pencils + Animals,
                  data = DataLogitModel, family = binomial(link = "logit"), weights = wt)

I want to calculate odds ratios so that I have them within the categories:
For example:

                             Odds ratio          P            95% CI                        
Age          1                  1                                                 
             2                  1.12            0.005        1.09-1.15            
             3                  1.53            0.013        1.34-1.67            
             4                  1.73            0.004        1.65-1.88
Animals      1                   1
             2                  1.34            0.023        1.28-1.46
            etc and for Pencils too

And a similar table for animals and numbers of pencils, all relative to a baseline 1.

When I fit my model, all I currently get is a single odds ratio per variable, not one for each category within the variable.

                 exp(coef)             2.5%                 97.5%
(Intercept)      0.36                  0.27                 0.45
Age              1.46                  1.42                 1.53
Animals          0.78                  0.55                 1.02
Pencils          1.33                  1.23                 1.39

Using:

exp(cbind(coef(logitModel), confint(logitModel)))

I also tried:

or_glm(data = DataLogitModel, model = logitModel, incr = list(Age = 1, Animals = 1, Pencils = 1))

However, this gave a similar result. I think the problem has something to do with how the variables are set up.

If you could help I would be so thankful!!

Best Answer

R uses dummy coding for encoding the effects of each of the categorical variables included as predictors in your model, provided these variables are declared as factors prior to fitting the model:

YourData$Age <- factor(YourData$Age)
YourData$Pencils <- factor(YourData$Pencils)
YourData$Animals <- factor(YourData$Animals)

After fitting the model with these factors, when you produce the summary of the model, you should see that R includes k-1 lines of output for a factor with k categories. For example, if Age has k=4 categories, you might see 3 lines of output in your model summary - these may be labelled Age2, Age3 and Age4 (if the categories for Age are 1,2,3 and 4).

Currently, you are only seeing one line of output for Age because R treats it as a numerical variable, not as a factor. (Same for your other two predictor variables.)

With Age declared as a factor, R sets aside the first category as a reference category (not shown in the model output) and then compares each of the remaining categories against that reference category: 2 vs 1, 3 vs 1 and 4 vs 1.
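To see the dummy coding directly, here is a small sketch on a made-up vector (the names age, dummies and age_ref4 are hypothetical, not from the question). It shows which columns R builds for a 4-level factor and how relevel() changes the reference category if the first level is not the baseline you want.

```r
# Toy factor with the same categories as Age in the question (made-up values)
age <- factor(c(1, 2, 3, 4, 2, 3))

# The first level is used as the reference category by default
levels(age)

# Design matrix: an intercept plus k - 1 = 3 dummy columns
dummies <- model.matrix(~ age)
colnames(dummies)

# Make category 4 the reference instead
age_ref4 <- relevel(age, ref = "4")
levels(age_ref4)
```

With the default treatment contrasts, each dummy column corresponds to one "category vs reference" comparison in the model summary.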

If you exponentiate the model coefficients reported by R for the rows of output labelled something like Age2, Age3 and Age4, you'll get the odds ratios for the comparisons of the age categories 2 vs 1, 3 vs 1 and 4 vs 1 with respect to the odds of "success" (after adjusting for the effects of the other predictors in your model). "Success" means passing the test.
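Putting the steps above together, a minimal sketch might look like the following. It uses simulated data, since the original data frame is not available; the names sim, fit and or_table are made up for illustration, the simulated effect sizes are arbitrary, and the weights argument from the question is left out.

```r
# Simulated stand-in for the question's data (all values are made up)
set.seed(1)
n <- 500
sim <- data.frame(
  Age     = sample(1:4, n, replace = TRUE),
  Pencils = sample(1:5, n, replace = TRUE),
  Animals = sample(0:3, n, replace = TRUE)
)
sim$Pass <- rbinom(n, 1, plogis(-1 + 0.3 * sim$Age + 0.2 * sim$Pencils - 0.1 * sim$Animals))

# Declare the predictors as factors before fitting
sim$Age     <- factor(sim$Age)
sim$Pencils <- factor(sim$Pencils)
sim$Animals <- factor(sim$Animals)

fit <- glm(Pass ~ Age + Pencils + Animals, data = sim,
           family = binomial(link = "logit"))

# One row per non-reference category: odds ratio, 95% CI and p-value
or_table <- cbind(
  OR = exp(coef(fit)),
  exp(confint(fit)),                       # profile-likelihood intervals
  p  = coef(summary(fit))[, "Pr(>|z|)"]
)
round(or_table, 3)
```

The reference categories (Age 1, Pencils 1, Animals 0) do not appear as rows; their odds ratio is 1 by definition. The exponentiated intercept row is the odds of passing when all predictors are at their reference categories, not an odds ratio.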

Related Solutions

Logistic Regression in R – Understanding Odds Ratio

If you want to interpret the estimated effects as odds ratios, just do exp(coef(x)). This gives you $e^\beta$, the multiplicative change in the odds of $y=1$ when the covariate associated with $\beta$ increases by 1. For profile likelihood intervals for this quantity, you can do

require(MASS)
exp(cbind(coef(x), confint(x)))
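As a quick check of that interpretation, the sketch below (simulated data; the names fit, odds, or_from_coef and or_from_predictions are hypothetical) compares exp(coef) for a numeric covariate with the ratio of predicted odds at x = 1 and x = 0.

```r
# Simulated data with a single numeric covariate (values are made up)
set.seed(42)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + 0.8 * x))
fit <- glm(y ~ x, family = binomial)

# Helper: convert a probability to odds
odds <- function(p) p / (1 - p)

# Predicted probabilities at x = 0 and x = 1
p0 <- predict(fit, newdata = data.frame(x = 0), type = "response")
p1 <- predict(fit, newdata = data.frame(x = 1), type = "response")

or_from_predictions <- unname(odds(p1) / odds(p0))
or_from_coef        <- unname(exp(coef(fit)["x"]))

# The two agree up to floating-point error
all.equal(or_from_coef, or_from_predictions)
```

The same ratio is obtained for any pair of x values one unit apart, which is why a numeric predictor yields only one odds ratio.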

EDIT: @caracal was quicker...

Solved – Calculating risk ratio using odds ratio from logistic regression coefficient

Zhang and Yu (1998) originally presented a method for calculating CIs for risk ratios, suggesting you could convert the lower and upper bounds of the CI for the odds ratio.

This method does not work: it is biased and generally produces anticonservative (too narrow) 95% CIs for the risk ratio. This is because of the correlation between the intercept term and the slope term, as you correctly allude to. If the odds ratio tends towards the lower bound of its CI, the intercept term increases to account for a higher overall prevalence in those with a 0 exposure level, and conversely for the upper bound. These shifts lead, respectively, to lower and higher bounds for the CI.

To answer your question outright, you need a knowledge of the baseline prevalence of the outcome to obtain correct confidence intervals. Data from case-control studies would rely on other data to inform this.

Alternatively, you can use the delta method if you have the full covariance structure for the parameter estimates. An equivalent parametrization for the OR to RR transformation (with a binary exposure as the single predictor) is:

$$RR = \frac{1 + \exp(-\beta_0)}{1+\exp(-\beta_0-\beta_1)}$$

Using the multivariate delta method and the central limit theorem, which states that $\sqrt{n} \left( [\hat{\beta}_0, \hat{\beta}_1] - [\beta_0, \beta_1]\right) \rightarrow_D \mathcal{N} \left(0, \mathcal{I}^{-1}(\beta)\right)$, you can obtain the variance of the approximate normal distribution of the $RR$.

Note, notationally this only works for binary exposure and univariate logistic regression. There are some simple R tricks that make use of the delta method and marginal standardization for continuous covariates and other adjustment variables. But for brevity I'll not discuss that here.
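For the binary-exposure, single-predictor case described above, the delta method can be sketched as follows. This is an illustration on simulated data, not the answer author's code: the names fit, rr, grad, se_log_rr and ci are hypothetical, and applying the delta method on the log scale (to log RR rather than RR itself) is an assumption made here so that the interval stays positive.

```r
# Simulated binary exposure and outcome (values are made up)
set.seed(1)
n <- 400
x <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, 0.2 + 0.2 * x)

fit <- glm(y ~ x, family = binomial)
b <- unname(coef(fit))   # b[1] = beta_0, b[2] = beta_1
V <- vcov(fit)           # estimated covariance matrix of (beta_0, beta_1)

# Risk ratio from the formula above
rr <- (1 + exp(-b[1])) / (1 + exp(-b[1] - b[2]))

# Fitted risks in the unexposed and exposed groups
p0 <- plogis(b[1])
p1 <- plogis(b[1] + b[2])

# Gradient of log(RR) with respect to (beta_0, beta_1)
grad <- c(p0 - p1, 1 - p1)

# Delta-method standard error of log(RR) and a 95% CI for RR
se_log_rr <- sqrt(drop(t(grad) %*% V %*% grad))
ci <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log_rr)

c(RR = rr, lower = ci[1], upper = ci[2])
```

Because the gradient is combined with the full covariance matrix, the correlation between the intercept and slope estimates is taken into account, which is exactly what converting the odds ratio bounds ignores.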

However, there are several ways to compute relative risks and their standard errors directly from models in R. Two examples are below:

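set.seed(1)                              # make the simulated data reproducible
x <- sample(0:1, 100, replace = TRUE)    # binary exposure
y <- rbinom(100, 1, x * 0.2 + 0.2)       # risk 0.2 if unexposed, 0.4 if exposed
# Log-binomial model: exp(coefficient of x) estimates the risk ratio
glm(y ~ x, family = binomial(link = "log"))
library(survival)
# Cox model with the same follow-up time for every subject
coxph(Surv(time = rep(1, 100), event = y) ~ x)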

http://research.labiomed.org/Biostat/Education/Case%20Studies%202005/Session4/ZhangYu.pdf
