Solved – Level of factor taken as intercept

binomial distribution, dataset, generalized linear model, r

I am using a GLM to analyze binomial data from one factor (Group) with three levels: Control, Control Treatment and Treatment.

m3 <- glm(Survive ~ Group, family=binomial, data=dat2)
summary(m3)

However, the model has taken Control as the intercept, and I'm not sure why. In previous analyses with GLMs I have also never seen the levels of a factor presented separately in the summary:

Call:
glm(formula = Survive ~ Group, family = binomial, data = dat2)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.354  -1.177   0.000   1.177   1.354  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)      0.4055     0.6455   0.628    0.530
GroupCtrl Trt   -0.4055     0.9037  -0.449    0.654
GroupTreatment  -0.8109     0.9129  -0.888    0.374

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 41.589  on 29  degrees of freedom
Residual deviance: 40.783  on 27  degrees of freedom
AIC: 46.783

Number of Fisher Scoring iterations: 4

Edit 1:
Normally in the summary I would see an intercept and then a single factor term; I would only see the separate levels in a post-hoc multiple comparison. My data consist of two columns: one is the treatment group (Control, Ctrl Trt, Treatment) and the other is binary, 1 for survival and 0 for loss.

NEST = nest id, not used in this analysis.
> str(dat2)
'data.frame': 30 obs. of 3 variables:
$ NEST : num 3 6 9 12 15 18 21 24 27 30 ...
$ Group : Factor w/ 3 levels "Control","Ctrl Trt",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Survive: num 1 1 0 0 0 1 0 1 1 1 ...

I do not want to omit the intercept, and I'm also confused about how this could happen.

Edit 2: adding + 0 to the model

Call:
glm(formula = Survive ~ Group + 0, family = binomial, data = dat2)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.354  -1.177   0.000   1.177   1.354  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
GroupControl     0.4055     0.6455   0.628     0.53
GroupCtrl Trt    0.0000     0.6325   0.000     1.00
GroupTreatment  -0.4055     0.6455  -0.628     0.53

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 41.589  on 30  degrees of freedom
Residual deviance: 40.783  on 27  degrees of freedom
AIC: 46.783

Number of Fisher Scoring iterations: 4

Edit 3: The 30 nests were observed in two series, and I'd like to add this as a factor:

'data.frame':   30 obs. of  4 variables:
 $ NEST   : Factor w/ 30 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Group  : Factor w/ 3 levels "Control","Ctrl Trt",..: 3 2 1 3 2 1 3 2 1 3 ...
 $ Survive: num  1 1 1 0 1 1 0 1 0 0 ...
 $ Series : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...

However, when I add this factor to the model, including the + 0, I get confusing results again. For instance, the output does not include Series1:

Call:
glm(formula = Survive ~ Group * Series + 0, family = binomial, 
    data = dat1)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7941  -0.6681   0.0000   0.6681   1.7941  

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)  
GroupControl           -4.055e-01  9.129e-01  -0.444    0.657  
GroupCtrl Trt           1.386e+00  1.118e+00   1.240    0.215  
GroupTreatment         -1.386e+00  1.118e+00  -1.240    0.215  
Series2                 1.792e+00  1.443e+00   1.241    0.214  
GroupCtrl Trt:Series2  -4.564e+00  2.141e+00  -2.132    0.033 *
GroupTreatment:Series2  1.133e-15  2.041e+00   0.000    1.000  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 41.589  on 30  degrees of freedom
Residual deviance: 33.476  on 24  degrees of freedom
AIC: 45.476

Number of Fisher Scoring iterations: 4

Best Answer

That's how "treatment contrasts" work. In that simple model, the intercept column of the model matrix represents the first level of the factor (here "Control"), and the remaining coefficients are differences from that baseline on the log-odds scale. Each statistical system chooses a default contrast strategy, so R's is different from that of SAS or SPSS. If the model were more complex, with multiple factor predictors, then the "Intercept" would apply to the cases that had the base level of every factor. If there were continuous covariates, then the intercept would be the predicted "effect" for a hypothetical case with all factors at the base level and all continuous predictors at zero. (Obviously this might not be a physically interpretable scenario.) You could in this instance use a different formula to get the labeling you expected:

glm(formula = Survive ~ Group + 0, family = binomial, data = dat2)

This is my attempt to reconstruct the results of that call:

Call:
glm(formula = Survive ~ Group + 0, family = binomial, data = dat2)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.354  -1.177   0.000   1.177   1.354  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
GroupControl     0.4055     0.6455   0.628     0.53
GroupCtrl Trt    0.0000     0.6325   0.000     1.00
GroupTreatment  -0.4055     0.6455  -0.628     0.53

(Dispersion parameter for binomial family taken to be 1)

This shows that the coefficient for the "Ctrl Trt" group is zero, which implies exactly 50% survival in that group (odds of exp(0) = 1). When you omit the intercept in a single-factor model, each coefficient is simply the log-odds for that individual factor level. The "Treatment" coefficient suggests that 4 out of a group of 10 subjects survived, since exp(-0.4055) = 0.6666434 is very close to 4/(10-4). In your "Control" group, 6 out of 10 survived, since exp(0.4055) = 1.500052 is very close to 6/(10-6). (Remember that we are modeling odds, not probabilities.)
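The back-transformation described above can be checked with a small sketch. The data frame `dat` below is synthetic and only an assumption: 10 nests per group with 6, 5 and 4 survivors, which is what the coefficients imply. It is not the poster's actual `dat2`.

```r
# Synthetic data (an assumption): 10 nests per group with 6, 5 and 4 survivors.
dat <- data.frame(
  Group = factor(rep(c("Control", "Ctrl Trt", "Treatment"), each = 10)),
  Survive = c(rep(c(1, 0), times = c(6, 4)),
              rep(c(1, 0), times = c(5, 5)),
              rep(c(1, 0), times = c(4, 6)))
)

# Without an intercept: one coefficient per level, each a log-odds of survival.
m0 <- glm(Survive ~ Group + 0, family = binomial, data = dat)
log_odds <- coef(m0)

# Back-transform to odds and to probabilities.
odds <- exp(log_odds)
prob <- plogis(log_odds)

# With the intercept: Control is the baseline, the other two coefficients
# are differences from it on the log-odds scale.
m1 <- glm(Survive ~ Group, family = binomial, data = dat)
diffs <- coef(m1)

print(round(cbind(log_odds, odds, prob), 4))
print(round(diffs, 4))
```

Under these assumed counts the fitted probabilities should come out at roughly 0.6, 0.5 and 0.4, and the two non-intercept coefficients of `m1` should equal the differences between the corresponding `m0` coefficients and the Control coefficient.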

In general, it's better (as in less confusing to the "uninitiated") to not omit the intercept, but for a single factor model it can be helpful.
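If you keep the intercept, it may help to look at the model matrix directly and, if wanted, pick a different baseline level. This is a minimal sketch using a hypothetical three-value factor with the same level names as in the question; `relevel()` and `model.matrix()` are base R functions, while the object names are made up for illustration.

```r
# Hypothetical factor with the same three levels as in the question.
Group <- factor(c("Control", "Ctrl Trt", "Treatment"))

# Default treatment contrasts: the first level (Control) gets no column of
# its own and is absorbed into the intercept.
X_default <- model.matrix(~ Group)
print(X_default)

# relevel() changes which level is the baseline while keeping the intercept.
Group2 <- relevel(Group, ref = "Treatment")
X_releveled <- model.matrix(~ Group2)
print(colnames(X_releveled))
```

After releveling, the intercept would describe the "Treatment" group and the other coefficients would be differences from it.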

I'm actually having difficulty figuring out how you could have produced that particular result (two levels that have values whose absolute values are exactly equal to one-half of the value of the third level). I'm wondering if you have somehow duplicated cases? You should a) describe the data collection and b) post the output of str(dat2).
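One way to run that kind of check is to look for repeated nest ids and to cross-tabulate survival by group. The data frame below is a hypothetical stand-in with assumed values (30 unique nests, 6, 5 and 4 survivors per group); substitute the real `dat2`.

```r
# Hypothetical stand-in for dat2 (assumed values); substitute the real data frame.
dat <- data.frame(
  NEST = 1:30,
  Group = factor(rep(c("Control", "Ctrl Trt", "Treatment"), each = 10)),
  Survive = rep(c(1, 0, 1, 0, 1, 0), times = c(6, 4, 5, 5, 4, 6))
)

# Any nest id appearing more than once would indicate duplicated cases.
n_dup <- sum(duplicated(dat$NEST))

# Survivors (1) and losses (0) per group.
counts <- table(dat$Group, dat$Survive)

print(n_dup)
print(counts)
```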
