I'm doing research to understand the influential factors within a logistic regression model I've built in R using the glm() function.
From my research, it seems that using the summary() function to summarize the model is a popular method to identify which variables are significant. What I can't seem to find, however, is a description of what the summary function is doing to determine the significance codes (e.g. the *) for each variable. This answer states that the significance codes are simply categorizations of the p-value, but I don't really understand that.
Is there anyone out there that could maybe help me understand how R computes this?
Best Answer
Firstly, the z or t value (depending on which family you fit) is the coefficient divided by its standard error. The p value is then the two-sided tail probability of that z or t value under the standard normal or t distribution.
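To make that concrete, here is a minimal sketch that reproduces the z values and p values by hand for a binomial glm() fit. The data are simulated and the names x, y and fit are made up for illustration; they are not from the question.

```r
# Simulated data: x, y and fit are hypothetical names used only for this sketch.
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + 0.8 * x))
fit <- glm(y ~ x, family = binomial)

# Coefficient table: Estimate, Std. Error, z value, Pr(>|z|)
coefs <- coef(summary(fit))

# z value = coefficient divided by its standard error
z <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Two-sided p value from the standard normal distribution
p <- 2 * pnorm(-abs(z))

# Compare the hand-computed values with the columns summary() reports
z_matches <- isTRUE(all.equal(z, coefs[, "z value"]))
p_matches <- isTRUE(all.equal(p, coefs[, "Pr(>|z|)"]))
print(c(z_matches = z_matches, p_matches = p_matches))
```

For a family where the dispersion is estimated (gaussian, for instance) the table has a t value column instead, and the p value would come from 2 * pt(-abs(t), df.residual(fit)).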
The stars don't really add much in my view. You will see underneath the table of coefficients that there is a line which starts 'Signif. codes'. This gives the key. So a coefficient marked *** is one whose p value is < 0.001. One marked ** has p < 0.01. And so on.
For example (taken from https://stats.idre.ucla.edu/r/dae/logit-regression/):
Gives the following output:
You can see that gre has a p value = 0.038. This has one asterisk by it because that is < 0.05. rank4 has a p value = 0.0002 and so has three asterisks because this is < 0.001.
I just use the asterisks to quickly scan the table but I never look at them beyond that.
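If it helps to see the "categorization" spelled out, the sketch below bins the two p values quoted above using the cutpoints from the 'Signif. codes' line. It uses cut() for clarity; as far as I know, printCoefmat() (which prints the summary table) does the equivalent with symnum() and the same cutpoints. The variable names p and stars are just for illustration.

```r
# p values quoted above for gre and rank4
p <- c(gre = 0.038, rank4 = 0.0002)

# Bin each p value using the cutpoints shown in the 'Signif. codes' legend:
# 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
stars <- as.character(cut(p,
                          breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                          labels = c("***", "**", "*", ".", " "),
                          include.lowest = TRUE))
names(stars) <- names(p)

print(stars)
# gre gets "*" and rank4 gets "***"
```

So the stars carry no information beyond the p value itself; they are just a label for the interval it falls in.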