Solved – R – How are the significance codes determined when summarizing a logistic regression model

Tags: logistic, p-value, r, statistical-significance

I'm trying to understand the influential factors in a logistic regression model I've built in R using the glm() function.

From my research, it seems that using the summary() function to summarize the model is a popular way to identify which variables are significant. What I can't seem to find, however, is a description of what the summary function does to determine the significance codes (e.g. the *) for each variable. This answer states that the significance codes are simply categorizations of the p-value, but I don't really understand that.

Could anyone help me understand how R computes this?

Best Answer

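First, the z or t value (which one you get depends on the family you run) is the coefficient estimate divided by its standard error. For the binomial family, where the dispersion is fixed at 1, summary() reports a z value; families with an estimated dispersion, such as gaussian, get a t value. The two-sided p value is then derived from the standard normal or t distribution using this z or t value, which is why the column is headed Pr(>|z|).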
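To make that concrete, here is a minimal sketch that recomputes the z values and p values by hand and compares them with what summary() reports. It uses the built-in mtcars data (with am as a 0/1 outcome) purely so that it runs without a download; that dataset and the names fit, z_manual and p_manual are my own choices for illustration, not part of the example further down.

```r
# Illustrative model on built-in data (an assumption, chosen so this runs offline).
fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# Coefficient table exactly as summary() reports it.
coefs <- coef(summary(fit))

# z value = estimate / standard error.
z_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Two-sided p value from the standard normal distribution.
p_manual <- 2 * pnorm(abs(z_manual), lower.tail = FALSE)

# Both comparisons should report TRUE.
print(all.equal(z_manual, coefs[, "z value"]))
print(all.equal(p_manual, coefs[, "Pr(>|z|)"]))
```

For a family with an estimated dispersion you would swap pnorm() for pt() with the residual degrees of freedom.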

The stars don't really add much in my view. Underneath the table of coefficients there is a line which starts 'Signif. codes'. This gives the key: each symbol sits between the two cutoffs that bound it. So a coefficient marked *** has a p value below 0.001, ** means a p value between 0.001 and 0.01, * means between 0.01 and 0.05, . means between 0.05 and 0.1, and no mark means the p value is above 0.1.
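The binning itself can be reproduced with symnum(), which, as far as I know, is what R's coefficient printing uses internally with these same cutpoints. The sketch below is only an illustration: the p values are copied from the coefficient table of the UCLA example, and the names p_values and stars are my own.

```r
# p values copied from the Pr(>|z|) column of the UCLA example output.
p_values <- c(0.000465, 0.038465, 0.015388, 0.032829, 0.000104, 0.000205)
terms <- c("(Intercept)", "gre", "gpa", "rank2", "rank3", "rank4")

# Bin each p value using the cutpoints shown in the 'Signif. codes' legend.
stars <- symnum(p_values, corr = FALSE, na = FALSE,
                cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                symbols = c("***", "**", "*", ".", " "))

# Show each term next to its p value and its code.
print(data.frame(term = terms, p = p_values, code = as.character(stars)))
```

The codes this produces match the stars printed in that table, so nothing beyond the p value goes into them.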

For example (taken from https://stats.idre.ucla.edu/r/dae/logit-regression/):

mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
mylogit <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
summary(mylogit)

Gives the following output:

Call:
glm(formula = admit ~ gre + gpa + rank, family = "binomial", 
    data = mydata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6268  -0.8662  -0.6388   1.1490   2.0790  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.989979   1.139951  -3.500 0.000465 ***
gre          0.002264   0.001094   2.070 0.038465 *  
gpa          0.804038   0.331819   2.423 0.015388 *  
rank2       -0.675443   0.316490  -2.134 0.032829 *  
rank3       -1.340204   0.345306  -3.881 0.000104 ***
rank4       -1.551464   0.417832  -3.713 0.000205 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 499.98  on 399  degrees of freedom
Residual deviance: 458.52  on 394  degrees of freedom
AIC: 470.52

Number of Fisher Scoring iterations: 4

You can see that gre has a p value of 0.038. It has one asterisk by it because that is below 0.05 but not below 0.01. rank4 has a p value of 0.0002 and so has three asterisks, because this is below 0.001.

I just use the asterisks to quickly scan the table but I never look at them beyond that.
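If you would rather not see the stars at all, they can be switched off when printing. A small sketch, again on the built-in mtcars data as a stand-in model (my assumption, not the UCLA example); signif.stars is an argument of the print method for glm summaries, and options(show.signif.stars = FALSE) sets the same thing globally.

```r
# Stand-in model on built-in data (an assumption, so this runs offline).
fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# Print the summary without stars or the 'Signif. codes' legend,
# capturing the text so it can be inspected.
out_no_stars <- capture.output(print(summary(fit), signif.stars = FALSE))
cat(out_no_stars, sep = "\n")
```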
