Solved – Interpreting odds ratios

odds-ratior

I have the following set of results for one of the factors (birth weight) with different levels and their corresponding Odds ratios for survival. I am using the first level (<1.25) as the reference level:

Birth weight (kg):

Levels      Number/level    Odds Ratio
<1.25       1615            1.00
1.25-1.46   1617            1.37
1.46-1.65   1462            1.25
1.65-1.87   1632            1.68
>1.87       1466            2.35

From this result, I am trying to estimate, by using the OR, the approximate number of newborns that would survive in each level.

Are there ways that this can be achieved using R?

In a separate calculation, the actual number of survival per level are shown below:

Levels      Number/level    Odds Ratio    Actual number survived
<1.25       1615            1.00         1088
1.25-1.46   1617            1.37         1346
1.46-1.65   1462            1.25         1238
1.65-1.87   1632            1.68         1447
>1.87       1466            2.35         1351

EDIT:
The above is just one of the independent factors from my model.

mod <- glm(formula = surv ~ as.factor(var1) + as.factor(var2)+...as.factor(varn)+family = binomial(link = "logit"), data=mydf)


summary(mod)

glm(formula = surv ~ as.factor(season) + as.factor(bwt5) + as.factor(prectem5) + 
    as.factor(pcscore) + as.factor(pindx5) + as.factor(presp2) + 
    as.factor(ppscore) + as.factor(mtone2) + as.factor(fos) + 
    as.factor(psex) + as.factor(pscolor) + as.factor(pshiv) + 
    as.factor(backfat5) + as.factor(srect2) + as.factor(gest3) + 
    as.factor(int3) + as.factor(agit) + as.factor(tacc), family = binomial(link = "logit"), 
    data = lesna)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.0562   0.3120   0.4478   0.5929   1.9412  

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)           1.441842   0.290536   4.963 6.95e-07 ***
as.factor(season)2   -1.064053   0.107666  -9.883  < 2e-16 ***
as.factor(bwt5)2      0.314332   0.099776   3.150 0.001631 ** 
as.factor(bwt5)3      0.223824   0.110566   2.024 0.042935 *  
as.factor(bwt5)4      0.524586   0.120182   4.365 1.27e-05 ***
as.factor(bwt5)5      0.854196   0.138993   6.146 7.97e-10 ***
as.factor(prectem5)2  0.745025   0.094238   7.906 2.66e-15 ***
as.factor(prectem5)3  0.856777   0.098326   8.714  < 2e-16 ***
as.factor(prectem5)4  0.997219   0.111529   8.941  < 2e-16 ***
as.factor(prectem5)5  0.930925   0.120052   7.754 8.88e-15 ***
as.factor(pcscore)2   0.384534   0.137564   2.795 0.005185 ** 
as.factor(pcscore)3   0.668390   0.154608   4.323 1.54e-05 ***
as.factor(pindx5)2    0.243485   0.101755   2.393 0.016718 *  
as.factor(pindx5)3    0.262779   0.108809   2.415 0.015733 *  
as.factor(pindx5)4    0.595672   0.118124   5.043 4.59e-07 ***
as.factor(pindx5)5    0.467277   0.120401   3.881 0.000104 ***
as.factor(presp2)2   -0.286214   0.126012  -2.271 0.023127 *  
as.factor(ppscore)2  -0.246369   0.093568  -2.633 0.008462 ** 
as.factor(mtone2)2   -0.482397   0.118218  -4.081 4.49e-05 ***
as.factor(fos)2      -0.255652   0.075749  -3.375 0.000738 ***
as.factor(psex)2      0.182437   0.066964   2.724 0.006442 ** 
as.factor(pscolor)2  -0.694197   0.282069  -2.461 0.013852 *  
as.factor(pshiv)2    -0.241515   0.080792  -2.989 0.002796 ** 
as.factor(backfat5)2 -0.309427   0.104176  -2.970 0.002976 ** 
as.factor(backfat5)3 -0.004152   0.108669  -0.038 0.969523    
as.factor(backfat5)4  0.013233   0.103491   0.128 0.898257    
as.factor(backfat5)5 -0.221935   0.104639  -2.121 0.033926 *  
as.factor(srect2)2   -0.236981   0.104962  -2.258 0.023960 *  
as.factor(gest3)2     0.207375   0.106065   1.955 0.050562 .  
as.factor(gest3)3     0.904959   0.191307   4.730 2.24e-06 ***
as.factor(int3)2     -0.204870   0.127674  -1.605 0.108573    
as.factor(int3)3     -1.271092   0.388924  -3.268 0.001082 ** 
as.factor(agit)2     -0.496856   0.157553  -3.154 0.001613 ** 
as.factor(agit)3     -0.360247   0.148520  -2.426 0.015284 *  
as.factor(tacc)2     -0.282180   0.090556  -3.116 0.001833 ** 
as.factor(tacc)3     -0.429249   0.082520  -5.202 1.97e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 7096.2  on 7791  degrees of freedom
Residual deviance: 6143.0  on 7756  degrees of freedom
AIC: 6215

Number of Fisher Scoring iterations: 5

exp(mod$coefficients) # odds ratios
         (Intercept)   as.factor(season)2     as.factor(bwt5)2 
           4.2284770            0.3450544            1.3693441 
    as.factor(bwt5)3     as.factor(bwt5)4     as.factor(bwt5)5 
           1.2508506            1.6897596            2.3494840 
as.factor(prectem5)2 as.factor(prectem5)3 as.factor(prectem5)4 
           2.1064944            2.3555559            2.7107340 
as.factor(prectem5)5  as.factor(pcscore)2  as.factor(pcscore)3 
           2.5368547            1.4689297            1.9510940 
  as.factor(pindx5)2   as.factor(pindx5)3   as.factor(pindx5)4 
           1.2756866            1.3005395            1.8142499 
  as.factor(pindx5)5   as.factor(presp2)2  as.factor(ppscore)2 
           1.5956440            0.7511016            0.7816335 
  as.factor(mtone2)2      as.factor(fos)2     as.factor(psex)2 
           0.6173022            0.7744113            1.2001382 
 as.factor(pscolor)2    as.factor(pshiv)2 as.factor(backfat5)2 
           0.4994756            0.7854368            0.7338673 
as.factor(backfat5)3 as.factor(backfat5)4 as.factor(backfat5)5 
           0.9958568            1.0133207            0.8009678 
  as.factor(srect2)2    as.factor(gest3)2    as.factor(gest3)3 
           0.7890065            1.2304445            2.4718305 
    as.factor(int3)2     as.factor(int3)3     as.factor(agit)2 
           0.8147530            0.2805250            0.6084405 
    as.factor(agit)3     as.factor(tacc)2     as.factor(tacc)3 
           0.6975039            0.7541382            0.6509975 

exp(coef(mod)) #exponentiated coefficients
exp(confint(mod)) # 95% CI

RUNNING THE SCRIPT WITHOUT type="response"

> (pred1s <- predict(mod,newdata=as.data.frame(with(lesna,list(season=1,bwt5=1,
+            prectem5=1,pcscore=1,pindx5=1,presp2=1,ppscore=1,mtone2=1 .... [TRUNCATED] 
       1 
1.441842

RUNNING THE SCRIPT WITH type="response":

> (pred1s <- predict(mod,newdata=as.data.frame(with(lesna,list(season=1,bwt5=1,
+            prectem5=1,pcscore=1,pindx5=1,presp2=1,ppscore=1,mtone2=1 .... [TRUNCATED] 
        1 
0.8087397

Best Answer

The fact that these are coefficients are represented entirely by factors in R means that the Intercept is the log-odds for the event, i.e log(the proportion with event / proportion without) for subjects who all have their factor values at the lowest level. We know that of the 1615 in level 1 of the factor under scrutiny, 1088 survived, although 1088/(1615-1088) (= odds of survival given factor =1) is not necessarily going to match up with exp(Intercept) since not all of those people also had the other factors at the lowest level. In fact the odds of survival (or it could be death, since it's not really been made clear what the coding for the events was) was quite different. At the best case of all factors being = 1 the odds of survival were:

exp( 4.2284770 )
[1] 68.61266

That's actually a pretty low odds for newborn survival, so this must have been a NICU study or something happening in a third world county. But its way higher than the odds for all children who had that value and any other values for the other predictors. (When looking at the single line of data it was neither ... it was on a different species.) In R the way to get a quick estimate of survival probability would be:

 (pred1s <- predict(mod, newdata=as.data.frame( with( lesna, 
   list( seas = 1 , 
        btw=1 , prectem5 =1 , 
        pcscore =1 , pindx5 =1 ,
        presp2= 1 ,   ppscore=  1,
        mtone2 = , fos =  , 
        psex =  1 , pscolor = 1, 
        pshiv =1 ,    backfat5= 1 , 
        srect2 =levels(srect2)[1] , gest3 =1 , 
        int3= 1 , agit= 1 , tacc =1 ) )
                               ), 
      type="response"  )
)        # the outer parens are to get a value to print when the assignment is made

You can the compare exp(Intercept) to that ( atypical ) prediction divided by (1-prediction) ... since that what odds are. So I hope it's becoming clear. You need to specify all of the factor level values at once to get a prediction. Or you need to create a synthetic cohort with a specific composition of all factor levels. You cannot take a single factor distribution and create a prediction from such a complex model unless you specify some sort of average value for all the other factors.

Edit after looking at the single line of data:

Some (most in fact) of those variable were not factors, (and I guess I should have recognized that), which means they do not have any levels in your dataframe, but do have levels in the model matrix. I had assumed that the levels would be attributes of the factor variable, but I was wrong. It's going to be easier to work on this model if the structure of your 'newdata' arguments given to predict have the same structure as the original dataframe and I would seriously consider making a copy and turning all those items into factors. But with the exception of "srect2" we can change all of those items to 1. at least under the assumption that that is the minimum value for each of those variables. If it's not, ... then you need to use the minimum value. Code edited.

response value calculated:

1.4418 is the log-odds ( the Intercept in the linear predictor fpr the baseline category)

odds = Pr(X=1)/(1-Pr(X=1)) :: definition

log(odds) = log(Pr(X=1)/(1-Pr(X=1)) ) = 1.4418 :: starting point

Solve for Pr(X=1) ... should be = the calculated "response" value.

exp(1.4418)*(1- Pr(X=1) ) = Pr(X=1)

exp(1.4418) = (1+exp(1.4418))* Pr(X=1)

Pr(X=1) = exp(1.4418) / (1+exp(1.4418))

> exp(1.4418) / (1+exp(1.4418))
[1] 0.8087332

All of the other levels need to have the Intercept added to get their correct linear predictors. The difference in log-odds, i.e. the coefficients, is directly equivalent to the ratio on the odds scale, hence the exp(coef) is a bunch of odds ratios.

{ log(x-y) = c  }    <=>      { x/y = exp(c) }

--