Solved – Calculating total estimated degrees of freedom for a GAM

degrees of freedomgeneralized-additive-model

I am trying to figure out how to calculate the total edf for a GAM. The model output is:

Family: poisson 
Link function: log 

Formula:
TotAbund ~ s(MeanDepth, by = fPeriod) + fPeriod + offset(LogSA)

Parametric coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -5.82135    0.01003 -580.39   <2e-16 ***
fPeriod2    -0.49759    0.02043  -24.36   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                         edf Ref.df Chi.sq p-value    
s(MeanDepth):fPeriod1 8.887  8.995  13849  <2e-16 ***
s(MeanDepth):fPeriod2 8.891  8.995   4697  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =   0.35   Deviance explained = 73.7%
UBRE =   81.8  Scale est. = 1         n = 147

I understand that the edf for each smoothing spline is listed in the "edf" column. However, to calculate the total model estimated degrees of freedom, would I just sum the edf for each smoother? (8.887 + 8.891 = 17.778)

Best Answer

The correct terminology for the degrees of freedom that you need to compute is model degrees of freedom. You could also compute residual degrees of freedom.

The model degrees of freedom are indeed calculated by adding up the degrees of freedom used by the parametric and non-parametric (or smooth) terms in your model.

Here is an example of gam model for which you can check the computation of model degrees of freedom:

library(mgcv) 

set.seed(6)
dat <- gamSim(1,n=400,dist="poisson",scale=.1)

m <-gam(y~s(x0)+s(x1)+s(x2)+ s(x3),family=poisson,data=dat,method="REML")

plot(m,pages=1)

summary(m)

The output produced by this model is as follows:

Family: poisson 
Link function: log 

Formula:
y ~ s(x0) + s(x1) + s(x2) + s(x3)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.82292    0.03419   24.07   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
        edf Ref.df Chi.sq  p-value    
s(x0) 2.432  3.029  5.517    0.141    
s(x1) 1.563  1.928 33.913 3.12e-08 ***
s(x2) 6.317  7.473 99.405  < 2e-16 ***
s(x3) 1.001  1.003  0.240    0.626    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.256   Deviance explained = 24.2%
-REML = 744.98  Scale est. = 1         n = 400

If you add the degrees of freedom used by the parametric terms (i.e., 1 degree of freedom for the intercept) and the degrees of freedom used by the non-parametric terms (i.e., the effective degrees of freedom listed in the edf column), you get:

1 + 2.432 + 1.563 + 6.317 + 1.001 = 12.313

You can double-check that your computation is correct via the command:

sum(influence(m))

which produces the output below:

> sum(influence(m))
[1] 12.31309

The residual degrees of freedom would be computed as the difference between the number of observations included in the model (n) and the model degrees of freedom (mdf): n - mdf. For the present example, n = 400 and mdf = 12.313, so that the residual degrees of freedom would be 400 - 12.313 = 387.687.

Note that, if you were to compare your model against the intercept-only model, the anova function would report slightly different residual degrees of freedom, since it uses a different formula for the computation of model degrees of freedom and this affects the computation of the residual degrees of freedom.

m0 <- gam(y~1,family=poisson,data=dat,method="REML")
m0

m <-gam(y~s(x0)+s(x1)+s(x2)+ s(x3),family=poisson,data=dat,method="REML")
m

anova(m0,m, test="Chisq")

The output of the anova command would look like this:

> anova(m0,m, test="Chisq")
Analysis of Deviance Table

Model 1: y ~ 1
Model 2: y ~ s(x0) + s(x1) + s(x2) + s(x3)
      Resid. Df Resid.   Dev     Df Deviance  Pr(>Chi)    
1    399.00     625.87                              
2    383.45   474.38   15.552   151.48 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The residual degrees of freedom reported by the anova for Model 2 (i.e., m) are equal to 383.45 (rather than 387.687).

See https://web.as.uky.edu/statistics/users/pbreheny/621/F12/notes/11-29.pdf (slide 25/30) for an explanation of the difference in formulas used in the summary() and anova() functions when it comes to the computation of the model degrees of freedom.

Related Question