Solved – Calculating Odds Ratio within Regression (in R)

logistic, multiple regression, odds-ratio, r, regression

I am fitting a logistic regression model for passing a test, where the independent variables are Age, Pencils and Animals. I am looking for odds ratios, and I'm very confused about why it isn't working…

Let's say I have a data frame:

The outcome variable (Y) is binary. For this example – Pass/Fail (1/0)

Outcome: Pass (1/0)
Independent: Age (1,2,3,4); Pencils (1,2,3,4,5); Animals (0,1,2,3)
logitModel <- glm(Pass ~ Age + Pencils + Animals,
                  data = DataLogitModel,
                  family = binomial(link = "logit"),
                  weights = wt)

I want to calculate odds ratios so that I have them within the categories:
For example:

Variable     Level    Odds ratio    P        95% CI
Age          1        1
             2        1.12          0.005    1.09-1.15
             3        1.53          0.013    1.34-1.67
             4        1.73          0.004    1.65-1.88
Animals      1        1
             2        1.34          0.023    1.28-1.46
             etc., and for Pencils too

And a similar table for animals and numbers of pencils, all relative to a baseline 1.

When I fit my model currently, all I get is one odds ratio per variable, not one for each individual category within the variable.

                                       2.5%                 97.5%
(Intercept)      0.36                  0.27                 0.45
Age              1.46                  1.42                 1.53
Animals          0.78                  0.55                 1.02
Pencils          1.33                  1.23                 1.39

I got this using:

exp(cbind(coef(logitModel), confint(logitModel)))

I also tried:

or_glm(data = DataLogitModel, model = logitModel, incr = list(Age = 1, Animals = 1, Pencils = 1))

However, this just gave a similar result. I think the problem has something to do with how the variables are set up.

If you could help I would be so thankful!!

Best Answer

R uses dummy coding for encoding the effects of each of the categorical variables included as predictors in your model, provided these variables are declared as factors prior to fitting the model:

YourData$Age <- factor(YourData$Age)
YourData$Pencils <- factor(YourData$Pencils)
YourData$Animals <- factor(YourData$Animals)

After fitting the model with these factors, when you produce the summary of the model, you should see that R includes k-1 lines of output for a factor with k categories. For example, if Age has k=4 categories, you might see 3 lines of output in your model summary - these may be labelled Age2, Age3 and Age4 (if the categories for Age are 1,2,3 and 4).

Currently, you are only seeing one line of output for Age because R treats it as a numerical variable, not as a factor. (Same for your other two predictor variables.)
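To make the difference concrete, here is a minimal sketch on simulated data (the data frame `d` and its values are made up for illustration and are not your data). It fits the same model twice, once with Age as a number and once with Age as a factor, and compares the coefficient names:

```r
# Simulated, purely illustrative data (hypothetical, not the asker's data)
set.seed(1)
d <- data.frame(Age = rep(1:4, times = 125))
d$Pass <- rbinom(nrow(d), 1, plogis(-1 + 0.3 * d$Age))

# Age treated as numeric: a single slope, i.e. one odds ratio per unit of Age
fit_num <- glm(Pass ~ Age, data = d, family = binomial)
names(coef(fit_num))
# "(Intercept)" "Age"

# Age treated as a factor: k - 1 = 3 dummy coefficients
d$AgeF <- factor(d$Age)
fit_fac <- glm(Pass ~ AgeF, data = d, family = binomial)
names(coef(fit_fac))
# "(Intercept)" "AgeF2" "AgeF3" "AgeF4"
```

The single `Age` coefficient in the first fit is what produces the one-row-per-variable output in the question.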

With Age declared as a factor, R sets aside the first category as the reference category (not shown in the model output) and then compares each of the remaining categories against that reference category: 2 vs 1, 3 vs 1 and 4 vs 1.
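If you would rather compare against a different baseline, base R's `relevel()` moves a chosen level to the front. A small sketch, using a made-up vector `age` rather than your data:

```r
# Hypothetical toy vector, only to show how the reference level is chosen
age <- factor(c(1, 2, 3, 4, 2, 3))

levels(age)
# "1" "2" "3" "4"   (the first level, "1", is the reference)

# Dummy (treatment) coding: one column per non-reference level
contrasts(age)

# Make "3" the reference instead; the other levels keep their order
age3 <- relevel(age, ref = "3")
levels(age3)
# "3" "1" "2" "4"
```

After releveling and refitting, the coefficients would be comparisons against category 3 rather than category 1.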

If you exponentiate the model coefficients reported by R for the rows of output labelled something like Age2, Age3 and Age4, you'll get the odds ratios for the comparisons of the age categories 2 vs 1, 3 vs 1 and 4 vs 1 with respect to the odds of "success" (after adjusting for the effects of the other predictors in your model). "Success" means passing the test.
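Putting this together, the table you are after could be built roughly as follows. This is only a sketch under assumptions: the data are simulated, the names `d`, `fit` and `or_table` are made up, and the `weights = wt` argument from your call is left out because I don't know what your weights represent.

```r
# Simulated, purely illustrative data (hypothetical, not the asker's data)
set.seed(42)
n <- 600
d <- data.frame(
  Age     = factor(rep(1:4, length.out = n)),
  Pencils = factor(sample(1:5, n, replace = TRUE)),
  Animals = factor(sample(0:3, n, replace = TRUE))
)
d$Pass <- rbinom(n, 1, plogis(-0.5 + 0.2 * as.integer(d$Age)))

# All three predictors are factors, so each gets k - 1 coefficients
fit <- glm(Pass ~ Age + Pencils + Animals,
           data = d, family = binomial(link = "logit"))

est <- coef(summary(fit))   # estimates, standard errors, z values, p-values
ci  <- confint(fit)         # profile-likelihood intervals, as in the question
                            # (confint.default(fit) would give Wald intervals)

or_table <- data.frame(
  OR    = exp(est[, "Estimate"]),
  lower = exp(ci[, 1]),
  upper = exp(ci[, 2]),
  p     = est[, "Pr(>|z|)"],
  row.names = rownames(est)
)

# Drop the intercept: its exponent is the baseline odds, not an odds ratio
or_table <- or_table[rownames(or_table) != "(Intercept)", ]
round(or_table, 3)
```

Each row (Age2, Age3, Age4, Pencils2, and so on) is the odds ratio for that category versus the reference category of the same variable. The reference categories themselves do not appear; their odds ratio is 1 by definition.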