I need to do a logistic regression using R on my data. My response variable (y) is survival at weaning (survived: surv=0; did not survive: surv=1), and I have several independent variables that are binary or categorical in nature.
I am following some examples on this website http://www.ats.ucla.edu/stat/r/dae/logit.htm and trying to run some models.
Running the model:
> mysurv2 <- glm(surv~as.factor(PTEM) + as.factor(pshiv) + as.factor(presp) +
as.factor(pmtone), family=binomial(link="logit"), data=ap)
> summary(mysurv2)
Call:
glm(formula = surv ~ as.factor(PTEM) + as.factor(pshiv) + as.factor(presp) +
as.factor(pmtone), family = binomial(link = "logit"), data = ap)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2837 -0.5121 -0.5121 -0.5058 2.0590
Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)         -0.01135    0.23613  -0.048  0.96166
as.factor(PTEM)2    -0.74642    0.24482  -3.049  0.00230 **
as.factor(PTEM)3    -1.95401    0.23259  -8.401  < 2e-16 ***
as.factor(pshiv)2   -0.02638    0.06784  -0.389  0.69738
as.factor(presp)2    0.74549    0.10532   7.079 1.46e-12 ***
as.factor(presp)3    0.66793    0.66540   1.004  0.31547
as.factor(pmtone)2   0.54699    0.09678   5.652 1.58e-08 ***
as.factor(pmtone)3   1.82337    0.75409   2.418  0.01561 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 7892.6 on 8791 degrees of freedom
Residual deviance: 7252.8 on 8784 degrees of freedom
(341 observations deleted due to missingness)
AIC: 7268.8
Number of Fisher Scoring iterations: 4
Adding na.action=na.pass at the end of the model call gave me an error message. I thought that this would take care of the NAs in my independent variables.
> mysurv <- glm(surv~as.factor(PTEM) + as.factor(pshiv) + as.factor(presp) +
as.factor(pmtone), family=binomial(link="logit"), data=ap,
na.action=na.pass)
Error: NA/NaN/Inf in foreign function call (arg 1)
Since this is my first venture into logistic regression, I am wondering whether there is a package in R that would be more suitable.
I am also trying to understand the regression coefficients. The independent variables used in the model are:
- rectal temperature:
  (PTEM)1 = newborns with rectal temp. below 35.4 °C
  (PTEM)2 = newborns with rectal temp. between 35.4 and 36.9 °C
  (PTEM)3 = newborns with rectal temp. above 37.0 °C
- shivering:
  (pshiv)1 = newborns that were not shivering
  (pshiv)2 = newborns that were shivering
- respiration:
  (presp)1 = newborns with normal respiration
  (presp)2 = newborns with slight respiration problem
  (presp)3 = newborns with poor respiration
- muscle tone:
  (pmtone)1 = newborns with normal muscle tone
  (pmtone)2 = newborns with moderate muscle tone
  (pmtone)3 = newborns with poor muscle tone
Looking at the coefficients, I got the following:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)         -0.01135    0.23613  -0.048  0.96166
as.factor(PTEM)2    -0.74642    0.24482  -3.049  0.00230 **
as.factor(PTEM)3    -1.95401    0.23259  -8.401  < 2e-16 ***
as.factor(pshiv)2   -0.02638    0.06784  -0.389  0.69738
as.factor(presp)2    0.74549    0.10532   7.079 1.46e-12 ***
as.factor(presp)3    0.66793    0.66540   1.004  0.31547
as.factor(pmtone)2   0.54699    0.09678   5.652 1.58e-08 ***
as.factor(pmtone)3   1.82337    0.75409   2.418  0.01561 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
In my other analysis, I found that newborns with the following characteristics at birth were more likely to survive:
a) higher rectal temperature
b) no shivering
c) good respiration
d) good muscle tone
I am a bit confused by the coefficients I am getting above. Am I interpreting the results incorrectly, or is it something else?
Best Answer
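I think you're confused because you defined survival at weaning as surv=0 rather than surv=1. glm models the log-odds of surv=1, which in your coding is the newborn not surviving, so negative coefficients indicate lower odds of surv=1 and therefore higher odds of survival. For example, the PTEM 3 coefficient of -1.95401 means that newborns in the warmest class have exp(-1.95401) ≈ 0.14 times the odds of dying of newborns in class 1, or roughly 7 times the odds of surviving. Read this way, the positive coefficients for poorer respiration and poorer muscle tone also agree with your other analysis.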
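If it helps to see the effect of the coding, here is a minimal sketch on simulated data. The data frame only borrows the names `ap`, `surv` and `PTEM` from the question; the values, the new `survived` column and the assumed effect size are made up for illustration. It refits the same model after recoding the response so that 1 means survived, and it relies on R's usual default `na.action` (`na.omit`), which drops rows with missing values before fitting. `na.pass` instead hands the NAs on to the fitting routine, which is the likely source of the error in the question.

```r
# Simulated stand-in for the question's data (values are invented).
set.seed(1)
n <- 500
ap <- data.frame(PTEM = sample(1:3, n, replace = TRUE))

# Assumption for the simulation: a warmer class lowers the probability of dying.
p_death <- plogis(0 - 0.8 * (ap$PTEM - 1))
ap$surv <- rbinom(n, 1, p_death)   # question's coding: 0 = survived, 1 = died
ap$PTEM[sample(n, 20)] <- NA       # introduce some missing predictor values

# Default na.action (na.omit) silently drops the 20 incomplete rows.
fit_died <- glm(surv ~ factor(PTEM), family = binomial(link = "logit"), data = ap)

# Recode so that 1 = survived, then refit the same model.
ap$survived <- 1 - ap$surv
fit_surv <- glm(survived ~ factor(PTEM), family = binomial(link = "logit"), data = ap)

# Same magnitudes, opposite signs.
print(cbind(died = coef(fit_died), survived = coef(fit_surv)))

# Odds ratios for survival relative to the reference class PTEM = 1.
print(exp(coef(fit_surv)))

# Number of rows actually used in the fit.
print(nobs(fit_died))
```

Flipping the response only changes the sign of every coefficient, so either coding gives the same fit; recoding just makes the output read in terms of survival. Base R's `glm` is sufficient for this model, so no extra package is required.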