Logistic Regression in R – How Does GLM Function with Binomial Family Work?

generalized linear model, logistic, r, regression

I would like to know how the glm function in R actually works. Would it be possible to fit a logistic regression not on the raw columns of a dataset but on the four counts in the contingency table (when both the outcome and the predictor are binary)? For example, take a dataset that gives this contingency table:

      Right-handed   Left-handed    Total
Male            43             9       52 
Female          44             4       48 
Total           87            13      100

Is the glm function in R calculating them in the end or is it only working on the raw columns? Or is it important for the glm function to know which male is right-handed, which one left-handed and the same for the females?

Best Answer

The glm function fits the model by maximizing the binomial log likelihood. If you want to learn more about how these models are fit, most textbooks on generalized linear models cover it.
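
To make "maximizing the log likelihood" concrete, here is a minimal sketch that writes the binomial log likelihood by hand, using only the four cell counts from the question, and maximizes it with optim. The names left, right, female and negloglik are illustrative choices, not anything glm exposes. glm itself uses Fisher scoring rather than optim, so this is a check on the result and not a description of glm's internals.

```r
# Cell counts from the question's table
left   <- c(male = 9,  female = 4)   # "successes": left-handed
right  <- c(male = 43, female = 44)  # "failures": right-handed
female <- c(0, 1)                    # predictor: 0 = male, 1 = female

# Negative binomial log likelihood for logit(p) = b0 + b1 * female.
# Note that it depends on the data only through the four counts.
negloglik <- function(beta) {
  eta <- beta[1] + beta[2] * female
  p <- plogis(eta)
  -sum(left * log(p) + right * log(1 - p))
}

# Numerical maximization (hypothetical stand-in for glm's Fisher scoring)
fit <- optim(c(0, 0), negloglik, method = "BFGS")
fit$par

# Closed-form values for comparison: log odds for males, log odds ratio
closed_form <- c(log(9 / 43), log((4 / 44) / (9 / 43)))
closed_form
```

The two vectors should agree to within the optimizer's tolerance, and they correspond to the intercept and slope reported by glm in the output further down.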

That being said, it is possible to re-arrange the 2x2 table in such a way that glm can be used.

nums = c(43,44,9,4)

x = matrix(nums, nrow = 2)
colnames(x) = c('Right-handed', 'Left-handed')
rownames(x) = c(1,0) # Code sex (the predictor) as binary: 1 = male, 0 = female.

# Turn the table into a data frame with columns Var1, Var2 and Freq.
# The cell counts are stored in the column Freq.
# We will weight each row by the number of observations in that cell.
d = as.data.frame.table(x)

# Var2 (handedness) is the outcome; Var1 (sex) is the predictor.
model = glm(Var2~Var1, weights = Freq, family = binomial(), data=d)
summary(model)
#> 
#> Call:
#> glm(formula = Var2 ~ Var1, family = binomial(), data = d, weights = Freq)
#> 
#> Deviance Residuals: 
#>      1       2       3       4  
#> -4.043  -2.767   5.619   4.459  
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -1.5640     0.3666  -4.267 1.98e-05 ***
#> Var10        -0.8339     0.6380  -1.307    0.191    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 77.277  on 3  degrees of freedom
#> Residual deviance: 75.452  on 2  degrees of freedom
#> AIC: 79.452
#> 
#> Number of Fisher Scoring iterations: 5

Created on 2021-09-09 by the reprex package (v2.0.1)
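
A second way to hand glm the table, sketched here as an alternative to the weights approach, is a two-column response of the form cbind(successes, failures) with one row per group. The data frame tab and the name model2 are made up for this illustration. The slope is the log odds ratio of the table, so exponentiating it should reproduce the odds ratio computed directly from the four counts.

```r
# One row per sex, with counts of each outcome
tab <- data.frame(
  sex   = factor(c("male", "female"), levels = c("male", "female")),
  left  = c(9, 4),
  right = c(43, 44)
)

# Two-column response: first column successes (left-handed),
# second column failures (right-handed)
model2 <- glm(cbind(left, right) ~ sex, family = binomial(), data = tab)
coef(model2)

# The slope is the log odds ratio from the table
exp(coef(model2)[["sexfemale"]])
(4 / 44) / (9 / 43)
```

The coefficients match the weighted fit above. The deviances and degrees of freedom differ, because here glm sees two binomial observations instead of four weighted rows.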

As to some of your questions:

Is the glm function in R calculating them in the end or is it only working on the raw columns?

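glm works on the raw columns; it does not build a 2x2 table internally. The four counts nevertheless carry all the information the binomial likelihood needs, which is why a weighted fit on the table gives the same estimates as a fit on the individual rows.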

Or is it important for the glm function to know which male is right-handed, which one left-handed and the same for the females?

No, glm does not need to know which individual is which, only how many observations fall into each combination. If you have replicates (e.g. 10 right-handed males), you can pass the counts through the weights argument, or you can repeat the row 10 times in your data.
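
The equivalence of weighting and repeating rows can be checked directly. The sketch below rebuilds 100 individual rows from the table and compares the two fits; the names cells, raw, fit_weighted and fit_raw are illustrative only.

```r
# Aggregated data: one row per cell of the 2x2 table
cells <- data.frame(
  sex  = factor(c("male", "female", "male", "female"),
                levels = c("male", "female")),
  left = c(0, 0, 1, 1),   # 1 = left-handed, 0 = right-handed
  n    = c(43, 44, 9, 4)  # cell counts
)

# Raw data: repeat each cell row n times, giving 100 individual rows
raw <- cells[rep(seq_len(nrow(cells)), times = cells$n), c("sex", "left")]
nrow(raw)

fit_weighted <- glm(left ~ sex, weights = n, family = binomial(), data = cells)
fit_raw      <- glm(left ~ sex, family = binomial(), data = raw)

# Compare estimates and standard errors from the two fits
all.equal(coef(fit_weighted), coef(fit_raw))
all.equal(sqrt(diag(vcov(fit_weighted))), sqrt(diag(vcov(fit_raw))))
```

The residual degrees of freedom will differ (2 versus 98), because glm counts rows of the data frame, but the coefficients, standard errors and deviances should agree.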
