I have the following set of results for one of the factors (birth weight) with different levels and their corresponding Odds ratios for survival. I am using the first level (<1.25) as the reference level:
Birth weight (kg):
Levels Number/level Odds Ratio
<1.25 1615 1.00
1.25-1.46 1617 1.37
1.46-1.65 1462 1.25
1.65-1.87 1632 1.68
>1.87 1466 2.35
From this result, I am trying to estimate, by using the OR, the approximate number of newborns that would survive in each level.
Are there ways that this can be achieved using R?
In a separate calculation, the actual number of survival per level are shown below:
Levels Number/level Odds Ratio Actual number survived
<1.25 1615 1.00 1088
1.25-1.46 1617 1.37 1346
1.46-1.65 1462 1.25 1238
1.65-1.87 1632 1.68 1447
>1.87 1466 2.35 1351
EDIT:
The above is just one of the independent factors from my model.
mod <- glm(formula = surv ~ as.factor(var1) + as.factor(var2)+...as.factor(varn)+family = binomial(link = "logit"), data=mydf)
summary(mod)
glm(formula = surv ~ as.factor(season) + as.factor(bwt5) + as.factor(prectem5) +
as.factor(pcscore) + as.factor(pindx5) + as.factor(presp2) +
as.factor(ppscore) + as.factor(mtone2) + as.factor(fos) +
as.factor(psex) + as.factor(pscolor) + as.factor(pshiv) +
as.factor(backfat5) + as.factor(srect2) + as.factor(gest3) +
as.factor(int3) + as.factor(agit) + as.factor(tacc), family = binomial(link = "logit"),
data = lesna)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.0562 0.3120 0.4478 0.5929 1.9412
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.441842 0.290536 4.963 6.95e-07 ***
as.factor(season)2 -1.064053 0.107666 -9.883 < 2e-16 ***
as.factor(bwt5)2 0.314332 0.099776 3.150 0.001631 **
as.factor(bwt5)3 0.223824 0.110566 2.024 0.042935 *
as.factor(bwt5)4 0.524586 0.120182 4.365 1.27e-05 ***
as.factor(bwt5)5 0.854196 0.138993 6.146 7.97e-10 ***
as.factor(prectem5)2 0.745025 0.094238 7.906 2.66e-15 ***
as.factor(prectem5)3 0.856777 0.098326 8.714 < 2e-16 ***
as.factor(prectem5)4 0.997219 0.111529 8.941 < 2e-16 ***
as.factor(prectem5)5 0.930925 0.120052 7.754 8.88e-15 ***
as.factor(pcscore)2 0.384534 0.137564 2.795 0.005185 **
as.factor(pcscore)3 0.668390 0.154608 4.323 1.54e-05 ***
as.factor(pindx5)2 0.243485 0.101755 2.393 0.016718 *
as.factor(pindx5)3 0.262779 0.108809 2.415 0.015733 *
as.factor(pindx5)4 0.595672 0.118124 5.043 4.59e-07 ***
as.factor(pindx5)5 0.467277 0.120401 3.881 0.000104 ***
as.factor(presp2)2 -0.286214 0.126012 -2.271 0.023127 *
as.factor(ppscore)2 -0.246369 0.093568 -2.633 0.008462 **
as.factor(mtone2)2 -0.482397 0.118218 -4.081 4.49e-05 ***
as.factor(fos)2 -0.255652 0.075749 -3.375 0.000738 ***
as.factor(psex)2 0.182437 0.066964 2.724 0.006442 **
as.factor(pscolor)2 -0.694197 0.282069 -2.461 0.013852 *
as.factor(pshiv)2 -0.241515 0.080792 -2.989 0.002796 **
as.factor(backfat5)2 -0.309427 0.104176 -2.970 0.002976 **
as.factor(backfat5)3 -0.004152 0.108669 -0.038 0.969523
as.factor(backfat5)4 0.013233 0.103491 0.128 0.898257
as.factor(backfat5)5 -0.221935 0.104639 -2.121 0.033926 *
as.factor(srect2)2 -0.236981 0.104962 -2.258 0.023960 *
as.factor(gest3)2 0.207375 0.106065 1.955 0.050562 .
as.factor(gest3)3 0.904959 0.191307 4.730 2.24e-06 ***
as.factor(int3)2 -0.204870 0.127674 -1.605 0.108573
as.factor(int3)3 -1.271092 0.388924 -3.268 0.001082 **
as.factor(agit)2 -0.496856 0.157553 -3.154 0.001613 **
as.factor(agit)3 -0.360247 0.148520 -2.426 0.015284 *
as.factor(tacc)2 -0.282180 0.090556 -3.116 0.001833 **
as.factor(tacc)3 -0.429249 0.082520 -5.202 1.97e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 7096.2 on 7791 degrees of freedom
Residual deviance: 6143.0 on 7756 degrees of freedom
AIC: 6215
Number of Fisher Scoring iterations: 5
exp(mod$coefficients) # odds ratios
(Intercept) as.factor(season)2 as.factor(bwt5)2
4.2284770 0.3450544 1.3693441
as.factor(bwt5)3 as.factor(bwt5)4 as.factor(bwt5)5
1.2508506 1.6897596 2.3494840
as.factor(prectem5)2 as.factor(prectem5)3 as.factor(prectem5)4
2.1064944 2.3555559 2.7107340
as.factor(prectem5)5 as.factor(pcscore)2 as.factor(pcscore)3
2.5368547 1.4689297 1.9510940
as.factor(pindx5)2 as.factor(pindx5)3 as.factor(pindx5)4
1.2756866 1.3005395 1.8142499
as.factor(pindx5)5 as.factor(presp2)2 as.factor(ppscore)2
1.5956440 0.7511016 0.7816335
as.factor(mtone2)2 as.factor(fos)2 as.factor(psex)2
0.6173022 0.7744113 1.2001382
as.factor(pscolor)2 as.factor(pshiv)2 as.factor(backfat5)2
0.4994756 0.7854368 0.7338673
as.factor(backfat5)3 as.factor(backfat5)4 as.factor(backfat5)5
0.9958568 1.0133207 0.8009678
as.factor(srect2)2 as.factor(gest3)2 as.factor(gest3)3
0.7890065 1.2304445 2.4718305
as.factor(int3)2 as.factor(int3)3 as.factor(agit)2
0.8147530 0.2805250 0.6084405
as.factor(agit)3 as.factor(tacc)2 as.factor(tacc)3
0.6975039 0.7541382 0.6509975
exp(coef(mod)) #exponentiated coefficients
exp(confint(mod)) # 95% CI
RUNNING THE SCRIPT WITHOUT type="response"
> (pred1s <- predict(mod,newdata=as.data.frame(with(lesna,list(season=1,bwt5=1,
+ prectem5=1,pcscore=1,pindx5=1,presp2=1,ppscore=1,mtone2=1 .... [TRUNCATED]
1
1.441842
RUNNING THE SCRIPT WITH type="response":
> (pred1s <- predict(mod,newdata=as.data.frame(with(lesna,list(season=1,bwt5=1,
+ prectem5=1,pcscore=1,pindx5=1,presp2=1,ppscore=1,mtone2=1 .... [TRUNCATED]
1
0.8087397
Best Answer
The fact that these are coefficients are represented entirely by factors in R means that the Intercept is the log-odds for the event, i.e log(the proportion with event / proportion without) for subjects who all have their factor values at the lowest level. We know that of the 1615 in level 1 of the factor under scrutiny, 1088 survived, although 1088/(1615-1088) (= odds of survival given factor =1) is not necessarily going to match up with exp(Intercept) since not all of those people also had the other factors at the lowest level. In fact the odds of survival (or it could be death, since it's not really been made clear what the coding for the events was) was quite different. At the best case of all factors being = 1 the odds of survival were:
That's actually a pretty low odds for newborn survival, so this must have been a NICU study or something happening in a third world county. But its way higher than the odds for all children who had that value and any other values for the other predictors. (When looking at the single line of data it was neither ... it was on a different species.) In R the way to get a quick estimate of survival probability would be:
You can the compare exp(Intercept) to that ( atypical ) prediction divided by (1-prediction) ... since that what odds are. So I hope it's becoming clear. You need to specify all of the factor level values at once to get a prediction. Or you need to create a synthetic cohort with a specific composition of all factor levels. You cannot take a single factor distribution and create a prediction from such a complex model unless you specify some sort of average value for all the other factors.
Edit after looking at the single line of data:
Some (most in fact) of those variable were not factors, (and I guess I should have recognized that), which means they do not have any levels in your dataframe, but do have levels in the model matrix. I had assumed that the levels would be attributes of the factor variable, but I was wrong. It's going to be easier to work on this model if the structure of your 'newdata' arguments given to
predict
have the same structure as the original dataframe and I would seriously consider making a copy and turning all those items into factors. But with the exception of "srect2" we can change all of those items to 1. at least under the assumption that that is the minimum value for each of those variables. If it's not, ... then you need to use the minimum value. Code edited.response value calculated:
1.4418 is the log-odds ( the Intercept in the linear predictor fpr the baseline category)
odds = Pr(X=1)/(1-Pr(X=1)) :: definition
log(odds) = log(Pr(X=1)/(1-Pr(X=1)) ) = 1.4418 :: starting point
Solve for Pr(X=1) ... should be = the calculated "response" value.
exp(1.4418)*(1- Pr(X=1) ) = Pr(X=1)
exp(1.4418) = (1+exp(1.4418))* Pr(X=1)
Pr(X=1) = exp(1.4418) / (1+exp(1.4418))
All of the other levels need to have the Intercept added to get their correct linear predictors. The difference in log-odds, i.e. the coefficients, is directly equivalent to the ratio on the odds scale, hence the exp(coef) is a bunch of odds ratios.
--