Solved – Doing logistic regression using R

Tags: interpretation, logistic, r

I need to run a logistic regression in R on my data. My response variable (y) is survival at weaning (surv: survived = 0; did not survive = 1), and I have several independent variables that are binary or categorical.

I am following some examples on this website http://www.ats.ucla.edu/stat/r/dae/logit.htm and trying to run some models.

Running the model:

> mysurv2 <- glm(surv~as.factor(PTEM) + as.factor(pshiv) + as.factor(presp) + 
                 as.factor(pmtone), family=binomial(link="logit"), data=ap)
> summary(mysurv2)

Call:
glm(formula = surv ~ as.factor(PTEM) + as.factor(pshiv) + as.factor(presp) + 
    as.factor(pmtone), family = binomial(link = "logit"), data = ap)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2837  -0.5121  -0.5121  -0.5058   2.0590  

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -0.01135    0.23613  -0.048  0.96166    
as.factor(PTEM)2   -0.74642    0.24482  -3.049  0.00230 ** 
as.factor(PTEM)3   -1.95401    0.23259  -8.401  < 2e-16 ***
as.factor(pshiv)2  -0.02638    0.06784  -0.389  0.69738    
as.factor(presp)2   0.74549    0.10532   7.079 1.46e-12 ***
as.factor(presp)3   0.66793    0.66540   1.004  0.31547    
as.factor(pmtone)2  0.54699    0.09678   5.652 1.58e-08 ***
as.factor(pmtone)3  1.82337    0.75409   2.418  0.01561 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 7892.6  on 8791  degrees of freedom
Residual deviance: 7252.8  on 8784  degrees of freedom
  (341 observations deleted due to missingness)
AIC: 7268.8

Number of Fisher Scoring iterations: 4

Adding na.action=na.pass at the end of the model call gave me an error message. I thought that this would take care of the NAs in my independent variables.

> mysurv <- glm(surv~as.factor(PTEM) + as.factor(pshiv) + as.factor(presp) + 
                as.factor(pmtone), family=binomial(link="logit"), data=ap, 
                na.action=na.pass)
Error: NA/NaN/Inf in foreign function call (arg 1)
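
The error above is consistent with how na.pass works: it hands rows containing NA straight to the fitting routine, which cannot work with missing values. By default glm() drops incomplete rows instead, which is what the "341 observations deleted due to missingness" line in the first summary reports. glm() from base R's stats package is a standard tool for this kind of model. If the aim is to keep residuals and fitted values aligned with the original rows, na.exclude is one option. Below is a minimal sketch on simulated data; the data frame `d` and its columns `y` and `x` are hypothetical stand-ins, not the `ap` data.

```r
# Simulated data; `d`, `y`, and `x` are hypothetical stand-ins for `ap` and its columns.
set.seed(1)
d <- data.frame(
  y = rbinom(200, 1, 0.3),
  x = factor(sample(1:3, 200, replace = TRUE))
)
d$x[c(5, 17, 42)] <- NA   # introduce a few missing predictor values

# Default behaviour (na.omit): incomplete rows are dropped before fitting.
fit_omit <- glm(y ~ x, family = binomial(link = "logit"), data = d)

# na.exclude fits the same model, but pads fitted values and residuals
# with NA so they line up with the rows of the original data frame.
fit_excl <- glm(y ~ x, family = binomial(link = "logit"), data = d,
                na.action = na.exclude)

length(fitted(fit_omit))   # number of complete rows
length(fitted(fit_excl))   # number of rows in d, with NA for incomplete rows
```

Both fits give the same coefficients; only the bookkeeping for the incomplete rows differs.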

Since this is my first venture into logistic regression, I am wondering whether there is a package in R that would be more suitable.

I am also trying to understand the regression coefficients. The independent variables used in the model are:

  1. rectal temperature:

    • (PTEM)1 = newborns with rectal temp. below 35.4 °C
    • (PTEM)2 = newborns with rectal temp. between 35.4 and 36.9 °C
    • (PTEM)3 = newborns with rectal temp. above 37.0 °C
  2. shivering:

    • (pshiv)1 = newborns that were not shivering
    • (pshiv)2 = newborns that were shivering
  3. respiration:

    • (presp)1 = newborns with normal respiration
    • (presp)2 = newborns with slight respiration problem
    • (presp)3 = newborns with poor respiration
  4. muscle tone:

    • (pmtone)1 = newborns with normal muscle tone
    • (pmtone)2 = newborns with moderate muscle tone
    • (pmtone)3 = newborns with poor muscle tone
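
One way to make the coefficient names easier to read is to convert the numeric codes to labelled factors once, in the data frame, rather than calling as.factor() inside the formula. The sketch below assumes the 1/2/3 coding listed above (with 3 taken to be the poor muscle tone code); the data frame `toy` and the label strings are made up for illustration.

```r
# Hypothetical toy data using the 1/2/3 coding described above.
toy <- data.frame(
  PTEM   = c(1, 2, 3, 3, 2, 1),
  pmtone = c(1, 1, 2, 3, 2, 1)
)

# Labelled factors; the first level listed becomes the reference category.
toy$PTEM <- factor(toy$PTEM, levels = 1:3,
                   labels = c("below35.4", "35.4to36.9", "above37.0"))
toy$pmtone <- factor(toy$pmtone, levels = 1:3,
                     labels = c("normal", "moderate", "poor"))

levels(toy$PTEM)[1]   # reference level for PTEM

# Column names of the design matrix, i.e. the names the coefficients would get.
colnames(model.matrix(~ PTEM + pmtone, data = toy))
```

Each coefficient in the model output is then a comparison of the named level against the reference level of the same variable.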

Looking at the coefficients, I got the following:

                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -0.01135    0.23613  -0.048  0.96166    
as.factor(PTEM)2   -0.74642    0.24482  -3.049  0.00230 ** 
as.factor(PTEM)3   -1.95401    0.23259  -8.401  < 2e-16 ***
as.factor(pshiv)2  -0.02638    0.06784  -0.389  0.69738    
as.factor(presp)2   0.74549    0.10532   7.079 1.46e-12 ***
as.factor(presp)3   0.66793    0.66540   1.004  0.31547    
as.factor(pmtone)2  0.54699    0.09678   5.652 1.58e-08 ***
as.factor(pmtone)3  1.82337    0.75409   2.418  0.01561 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

In my other analysis, I found that newborns with:

a) higher rectal temperature,
b) no shivering,
c) good respiration, and
d) good muscle tone at birth

were more likely to survive.

I am a bit confused by the coefficients I am getting above. Am I interpreting the results incorrectly, or is something else going on?

Best Answer

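I think you're confused because you coded survival at weaning as surv=0 rather than surv=1. The model therefore estimates the odds of surv=1, that is, of not surviving, so a negative coefficient indicates lower odds of dying (higher odds of survival) relative to the reference level of that variable.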
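
To illustrate the point, here is a minimal sketch on simulated data (not the original `ap` data). Recoding the response so that 1 means "survived" flips the sign of every coefficient, and exponentiating a coefficient gives an odds ratio against the reference level. The names `sim`, `temp`, `died`, and `survived` are hypothetical, and the effect sizes used to simulate are arbitrary.

```r
set.seed(42)
n <- 500
sim <- data.frame(temp = factor(sample(1:3, n, replace = TRUE)))

# Simulate death (1 = died) so that higher temperature classes lower the chance of dying.
p_died <- plogis(0 - 0.7 * (sim$temp == 2) - 1.9 * (sim$temp == 3))
sim$died <- rbinom(n, 1, p_died)
sim$survived <- 1 - sim$died

fit_died <- glm(died ~ temp, family = binomial(link = "logit"), data = sim)
fit_surv <- glm(survived ~ temp, family = binomial(link = "logit"), data = sim)

# Same model, mirrored: each coefficient changes sign.
cbind(died = coef(fit_died), survived = coef(fit_surv))

# Odds ratios relative to the reference level (temp = 1).
exp(coef(fit_died))   # odds of dying
exp(coef(fit_surv))   # odds of surviving
```

Applied to the table in the question, exp(-1.95401) is roughly 0.14, so the odds of surv=1 (not surviving) for PTEM 3 are about 0.14 times those for PTEM 1, holding the other variables fixed. That agrees with the earlier finding that warmer newborns were more likely to survive.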
