Which intercept does R select? (binomial glm)

Tags: binomial distribution, generalized linear model, intercept, r

I have a problem with an analysis.

I'm fitting a binomial glm with two categorical factors, loc and trat (plus a numeric covariate, comp).

I do not understand how R handles the intercept: what statistical reasoning does R use to select it? It seems to use the first level of a factor as the intercept, and it then also compares the levels of the second factor with that intercept, which has nothing to do with them.

y <- cbind(data1$fr,data1$fl-data1$fr)
loc1 <- as.factor(data1$loc)
trat1 <- as.factor(data1$trat)

m2 <- glm(y~loc1 + data1$comp + trat1, family=binomial,na.action=na.omit,data=data1)

summary(m2)

Call:
glm(formula = y ~ loc1 + data1$comp + trat1, family = binomial, 
    data = data1, na.action = na.omit)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.4015  -0.9895  -0.4015  -0.1713   6.1668  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -3.20524    0.20315 -15.778  < 2e-16 ***
loc12         1.06667    0.18642   5.722 1.05e-08 ***
loc13         0.52656    0.19319   2.726 0.006419 ** 
loc14         0.69228    0.21151   3.273 0.001064 ** 
data1$comp    0.21967    0.06314   3.479 0.000503 ***
trat1anemo   -4.78819    1.00885  -4.746 2.07e-06 ***
trat1autogam -3.75418    0.59252  -6.336 2.36e-10 ***
trat1autopol -1.28546    0.23312  -5.514 3.51e-08 ***
trat1control  0.49978    0.14277   3.501 0.000464 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 832.26  on 171  degrees of freedom
Residual deviance: 327.39  on 163  degrees of freedom
AIC: 565.92

Number of Fisher Scoring iterations: 7

Could someone here help me?

Best Answer

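By default, R sorts the levels of a factor in ascending order (alphabetically for character data, numerically for numbers), and with R's default treatment contrasts the first level of each factor becomes its reference (baseline) group. If you want a specific group to be the reference group, you have to tell R explicitly. Let us see this with an example, using a simulated variable that mimics your variable $loc1$: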

set.seed(105) # Just setting the seed to obtain reproducible results
loc1 <- as.factor(sample(c("loc11", "loc12", "loc13", "loc14"), size = 10, replace = TRUE))
loc1
 [1] loc11 loc14 loc12 loc12 loc13 loc14 loc14 loc14 loc13 loc11
Levels: loc11 loc12 loc13 loc14
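To check which level R will use as the reference, you can inspect the levels and the contrast matrix directly. This is a minimal sketch, assuming R's default contrasts option (contr.treatment for unordered factors); the ten values are typed in from the printed loc1 above so that the example does not depend on the random number generator:

```r
# The same ten values as the printed loc1 above, written out explicitly
loc1 <- factor(c("loc11", "loc14", "loc12", "loc12", "loc13",
                 "loc14", "loc14", "loc14", "loc13", "loc11"))
levels(loc1)   # sorted: "loc11" "loc12" "loc13" "loc14"

# Default treatment contrasts: the first level (loc11) has a row of zeros,
# so it is the baseline that the intercept represents
contrasts(loc1)

# The design matrix has an intercept column plus one dummy column
# for each non-reference level
colnames(model.matrix(~ loc1))   # "(Intercept)" "loc1loc12" "loc1loc13" "loc1loc14"
```

The same reasoning applies to every factor in the model at once, which is why a single intercept stands for the reference level of loc1 and the reference level of trat1 together.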

# Now let us change the reference level
new_loc1 <- relevel(loc1, ref = "loc14") # Here, I declare "loc14" to be the reference level

new_loc1
 [1] loc11 loc14 loc12 loc12 loc13 loc14 loc14 loc14 loc13 loc11
Levels: loc14 loc11 loc12 loc13

Note that loc14 is now the first level of $new\_loc1$, i.e. it is the new reference level.
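Changing the reference level does not change the fitted model, only how its coefficients are expressed. A minimal sketch on simulated data to illustrate this; the data, the names succ, n, loc_b, m_a and m_b, and the true log-odds are all hypothetical and are not the asker's data1:

```r
# Hypothetical simulated data: 4 locations, 25 rows each, 20 trials per row
set.seed(1)
loc <- factor(rep(c("loc11", "loc12", "loc13", "loc14"), each = 25))
n <- rep(20, 100)
true_logit <- c(-1, 0, 0.5, 1)[as.integer(loc)]  # assumed true log-odds per location
succ <- rbinom(100, size = n, prob = plogis(true_logit))
y <- cbind(succ, n - succ)  # successes and failures, as in the question

loc_b <- relevel(loc, ref = "loc14")

m_a <- glm(y ~ loc, family = binomial)    # reference level: loc11
m_b <- glm(y ~ loc_b, family = binomial)  # reference level: loc14

coef(m_a)
coef(m_b)

# Same fit, different parameterisation
all.equal(deviance(m_a), deviance(m_b))
all.equal(fitted(m_a), fitted(m_b))

# New intercept = old intercept + old loc14 coefficient
all.equal(unname(coef(m_b)["(Intercept)"]),
          unname(coef(m_a)["(Intercept)"] + coef(m_a)["locloc14"]))
```

Both models have the same deviance and fitted probabilities; only the baseline that the intercept describes, and therefore the meaning of each coefficient, has moved.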

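The intercept is the log-odds of the outcome for an observation at the reference level of loc1 and the reference level of trat1 when $data1\$comp=0$. If you exponentiate the intercept, i.e. $e^{-3.20524}\approx 0.041$, you get the odds of the outcome for that same group when $data1\$comp=0$. If the variable $data1\$comp$ never takes the value zero, the intercept may not have a meaningful interpretation on its own.

The remaining coefficients are differences in log-odds from these references, with the other predictors held fixed: each loc1 coefficient compares that location with the reference location (loc 1), and each trat1 coefficient compares that treatment with the reference treatment, which is the trat1 level that has no row of its own in the coefficient table. A trat1 coefficient is therefore not a comparison with a location; it only looks that way because both reference levels are absorbed into the single intercept.

For further lessons on working with factor variables please refer here, and for further lessons on interpreting categorical predictors, please refer here.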
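The arithmetic on the intercept can be reproduced in R directly from the coefficients printed in the question's summary(m2) output. A small sketch; the names b0, b_loc12 and b_comp are made up for illustration, and plogis() converts log-odds to a probability:

```r
# Coefficients copied from the summary(m2) output in the question
b0      <- -3.20524   # (Intercept)
b_loc12 <-  1.06667   # loc12
b_comp  <-  0.21967   # data1$comp

# Reference level of loc1 and of trat1, with data1$comp = 0
exp(b0)      # odds, about 0.041
plogis(b0)   # probability, about 0.039

# Log-odds for loc1 = 2 (same reference trat1 level, comp = 0)
b0 + b_loc12     # -2.13857
exp(b_loc12)     # odds ratio, loc 2 vs loc 1 (trat1 and comp fixed): about 2.91

# Each one-unit increase in comp multiplies the odds by about 1.25
exp(b_comp)
```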