Logistic – Differences in Output Between SAS’s PROC GENMOD and R’s GLM

generalized linear modellogisticrsas

I'm trying to replicate a model fitted in SAS in R but the fit I'm getting gives me slightly different coefficients and standard errors.

Data:

testdata <- data.frame(matrix(c("f","Test", 1.75,   16, 0,  16, 0,  1,  1,
"m",    "Test", 1.75,   15, 1,  16, 6.25,   1,  0,
"f",    "Test", 2.75,   4,  12, 16, 75, 1,  1,
"m",    "Test", 2.75,   9,  6,  15, 40, 1,  0,
"f",    "WHO",  1.75,   15, 1,  16, 6.25,   0,  1,
"m",    "WHO",  1.75,   14, 2,  16, 12.5,   0,  0,
"f",    "WHO",  2.75,   2,  13, 15, 86.6667,    0,  1,
"m",    "WHO",  2.75,   3,  13, 16, 81.25,  0,  0
), ncol=9, byrow=TRUE))
names(testdata) <- c("sex", "vaccine", "dose", "not_p", "para", "n", "pct", 
                     "vacnum", "sexno")

SAS:

proc genmod data=model_data;
class sex;
model para/n  = dose sex vacnum

  /dist=bin 
  link=logit
  type3;
run;

Analysis Of Maximum Likelihood Parameter Estimates 
Parameter   DF Estimate Std Error Wald 95% Conf Lim Wald Chi-Square Pr > ChiSq 
Intercept   1 -9.4020   1.6220    -12.5810  -6.2230  33.60           <.0001 
dose        1  3.9208   0.6460      2.6546   5.1870  36.83           <.0001 
sex f       1  0.5574   0.5184     -0.4587   1.5735   1.16           0.2823 
sex m       0  0.0000   0.0000      0.0000   0.0000    .              . 
vacnum      1 -1.3221   0.5483     -2.3967  -0.2475   5.81           0.0159 
Scale       0  1.0000   0.0000      1.0000   1.0000     

R:

testdata$sexno <- as.factor(testdata$sexno)    
a <- contr.treatment(2, base = 1, contrasts = TRUE)

contrasts(testdata$sexno) <- a

fitreduced <- glm(para/n ~ dose + as.factor(sex) + vacnum, 
                  family=quasibinomial(link="logit"), data=testdata)

coef(summary(fitreduced))

                  Estimate Std. Error   t value    Pr(>|t|)
(Intercept)     -9.4013750  1.7613982 -5.337450 0.005935450
dose             3.9173794  0.7001133  5.595351 0.005007179
as.factor(sex)1  0.5704671  0.5568436  1.024466 0.363525300
vacnum          -1.3336100  0.5887552 -2.265135 0.086189704

I believe I have the right contrasts to give me a type III SS but there is a small discrepency in values, have a missed something here?

Best Answer

I notice several things here.

First, when you enter your data via matrix, all the data have to be the same type. Thus, they are coerced to be the most inclusive type, strings, which in turn are coerced to be factors by default. Note:

testdata <- data.frame(matrix(c("f","Test", 1.75,   16, 0,  16, 0,  1,  1,
...
sapply(testdata, class)
#      sex  vaccine     dose    not_p     para        n      pct   vacnum    sexno
# "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor"

Try using read.table(text='...', sep=",") instead:

testdata <- read.table(text='"f", "Test", 1.75,   16,   0,  16,  0,      1,  1
"m", "Test", 1.75,   15,   1,  16,  6.25,   1,  0
"f", "Test", 2.75,    4,  12,  16, 75,      1,  1
"m", "Test", 2.75,    9,   6,  15, 40,      1,  0
"f", "WHO",  1.75,   15,   1,  16,  6.25,   0,  1
"m", "WHO",  1.75,   14,   2,  16, 12.5,    0,  0
"f", "WHO",  2.75,    2,  13,  15, 86.6667, 0,  1
"m", "WHO",  2.75,    3,  13,  16, 81.25,   0,  0', sep=",")
names(testdata) <- c("sex", "vaccine", "dose", "not_p", "para", "n", "pct", 
                     "vacnum", "sexno")
sapply(testdata, class)
#      sex   vaccine      dose     not_p      para         n       pct    vacnum 
# "factor"  "factor" "numeric" "integer" "integer" "integer" "numeric" "integer" 
#     sexno 
# "integer" 

(That was small potatoes.) The next trap to worry about is that SAS and R code logistic regression for binomial data differently. SAS uses "events over trials", but R uses the odds, successes/failures. Thus, your model formula should be:

form <- as.formula("cbind(para, n-para) ~ dose + sex + vacnum")

Finally, you specified family=quasibinomial (i.e., the quasibinomial) in your R code, but \DIST=BIN (i.e., the binomial) in your SAS code. To match the SAS output, use the binomial instead. Thus, your final model is:

fitreduced <- glm(form, family=binomial(link="logit"), data=testdata)
coef(summary(fitreduced))
#               Estimate Std. Error   z value     Pr(>|z|)
# (Intercept) -9.4020028  1.6219570 -5.796703 6.763131e-09
# dose         3.9207805  0.6460193  6.069138 1.285986e-09
# sexf         0.5574087  0.5184112  1.075225 2.822741e-01
# vacnum      -1.3221011  0.5482645 -2.411430 1.589012e-02

This seems to match the SAS estimates and standard errors.

Related Question