Solved – Reproducing SPSS GLM in R, changed coefficients

generalized linear modelrregressionspss

I'm trying to reproduce a generalized linear model in R. The original SPSS model has distribution Gamma, link Log, with a continuous dependent variable and all independent variables categorical.

In R this should be:

glm(dependent_variable~.,data=data_for_SPSS_input,family=Gamma(link='log'))

But the coefficient estimates here don't match the ones from the SPSS output (B values) and the significant variables aren't all the same either.

The SPSS model also gives tests of model effects for the original categorical variables (rather than just the dummy coded levels) using the Wald Chi-squared test. How do I obtain these in R?

How are 'deviance', 'scaled deviance' and so on in SPSS comparable to R's null deviance and residual deviance?

Appreciate any help at all.

Best Answer

Let's try below code in R and think contrasts methods. In default, R uses contr.treatment in treating unordered categorical variables. (It contrasts each level with the baseline level (alphabetical first); ?contr.treatment gives you details.)

In this example, baseline level is species setosa, so coefficients: (Intercept) and Sepal.Length are setosa's intercept and slope and their t value comes from the difference between setosa's intercept (or slope) and zero. (Intercept) + Speciesversicolor is versicolor's intercept, Sepal.Length + Sepal.Length:Speciesversicolor is its slope. Their t value comes from the difference between versicolor and setosa.

options()$contrasts  # return now options (I think your options are default.)

res1 <- glm(Petal.Length ~ Sepal.Length * Species, data=iris, family=gaussian)
# this is  equivalent to glm(Petal.Length ~ Sepal.Length * Species, data=iris, contrasts = list(Species="contr.treatment"), family=gaussian)

res2 <- glm(Petal.Length ~ Sepal.Length * Species, data=iris, contrasts = list(Species="contr.sum"), family=gaussian)

summary(res1)  # contr.treatment (Base level is Species setosa)
# Coefficients:
#                                 Estimate Std. Error t value Pr(>|t|)    
# (Intercept)                      0.8031     0.5310   1.512    0.133      Setosa's intercept
# Sepal.Length                     0.1316     0.1058   1.244    0.216      Setosa's slope
# Speciesversicolor               -0.6179     0.6837  -0.904    0.368      Difference between Setosa's intercept and versicolor's
# Speciesvirginica                -0.1926     0.6578  -0.293    0.770    
# Sepal.Length:Speciesversicolor   0.5548     0.1281   4.330 2.78e-05 ***  Difference between Setosa's slope and versicolor's
# Sepal.Length:Speciesvirginica    0.6184     0.1210   5.111 1.00e-06 ***

# If you want versicolor's Coefficients,
res1$coefficients[1] + res1$coefficients[3]   # versicolor's intercept
res1$coefficients[2] + res1$coefficients[5]   # versicolor's slope


summary(res2)  # contr.sum
# Coefficients:
#                       Estimate Std. Error t value Pr(>|t|)    
# (Intercept)            0.53288    0.26206   2.033   0.0439 *    mean of three species' intercepts
# Sepal.Length           0.52273    0.04698  11.127  < 2e-16 ***  mean of three species' slopes
# Species1               0.27017    0.40333   0.670   0.5040    
# Species2              -0.34776    0.36121  -0.963   0.3373    
# Sepal.Length:Species1 -0.39110    0.07707  -5.075 1.18e-06 ***
# Sepal.Length:Species2  0.16374    0.06283   2.606   0.0101 *  

res2$coef[1] + res2$coef[3]  # setosa's intercept
res2$coef[1] + res2$coef[4]  # versicolor's intercept
res2$coef[1] - res2$coef[3] - res2$coef[4]  # virginica's intercept


# tests of model effects for the original categorical variables
summary(aov(res1))       # Type I Anova (This is inappropriate because this data isn't orthogonal)

# When your model doesn't have an interaction term, type II Anova is equivalent to type III (this example model has an interaction term).
library(car)
Anova(res1, type="II")
Anova(res2, type="II")   # These are equivalent. Type II Anova doesn't depend on methods of contrasts.

Anova(res1, type="III")  # These aren't equivalent !!
Anova(res2, type="III")  # When you use type III Anova, you must use contr.sum.


# tests of the models against one another
res1.2 <- glm(Petal.Length ~ Sepal.Length + Species, data=iris, family=gaussian)
# You needn't mind methods of contrasts and types of anova because of the same residual deviance.
anova(res1, res1.2)    # not "A"nova