Can degrees of freedom be a non-integer number?

degrees-of-freedom, generalized-additive-model, r

When I fit a GAM, the reported residual degrees of freedom are $26.6$ (see the last line of the output below). What does that mean? Going beyond the GAM example, can the number of degrees of freedom be a non-integer in general?

> library(gam)
> summary(gam(mpg~lo(wt),data=mtcars))

Call: gam(formula = mpg ~ lo(wt), data = mtcars)
Deviance Residuals:
    Min      1Q  Median      3Q     Max 
-4.1470 -1.6217 -0.8971  1.2445  6.0516 

(Dispersion Parameter for gaussian family taken to be 6.6717)

    Null Deviance: 1126.047 on 31 degrees of freedom
Residual Deviance: 177.4662 on 26.6 degrees of freedom
AIC: 158.4294 

Number of Local Scoring Iterations: 2 

Anova for Parametric Effects
            Df Sum Sq Mean Sq F value    Pr(>F)    
lo(wt)     1.0 847.73  847.73  127.06 1.239e-11 ***
Residuals 26.6 177.47    6.67                      

Best Answer

Degrees of freedom are non-integer in a number of contexts. Indeed, in a few circumstances you can establish that the degrees of freedom used to fit the data for a particular model must lie between some value $k$ and $k+1$.

We usually think of degrees of freedom as the number of free parameters, but there are situations where the parameters are not completely free and they can then be difficult to count. This can happen when smoothing / regularizing, for example.

Locally weighted regression, kernel methods and smoothing splines are examples of such a situation: the total number of free parameters is not something you can readily count by adding up predictors, so a more general idea of degrees of freedom is needed.

In Generalized Additive Models (Hastie and Tibshirani, 1990 [1]), on which gam is partly based, and indeed in numerous other references, some models have fitted values that can be written as $\hat y = Ay$, and the degrees of freedom are then sometimes taken to be $\operatorname{tr}(A)$ (they also discuss $\operatorname{tr}(AA^T)$ and $\operatorname{tr}(2A-AA^T)$). The first agrees with the usual parameter count where both apply (e.g. in regression, where in normal situations $\operatorname{tr}(A)$ will be the column dimension of $X$), and when $A$ is symmetric and idempotent all three of those formulas give the same value.
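To see why the trace recovers the usual count in ordinary regression, here is a short derivation. It assumes $X$ is $n \times p$ with full column rank (the symbols $n$ and $p$ are introduced here only for illustration), so that $A = X(X^TX)^{-1}X^T$:

$$\operatorname{tr}(A) = \operatorname{tr}\left(X(X^TX)^{-1}X^T\right) = \operatorname{tr}\left((X^TX)^{-1}X^TX\right) = \operatorname{tr}(I_p) = p.$$

This $A$ satisfies $A^T = A$ and $A^2 = A$, so

$$\operatorname{tr}(AA^T) = \operatorname{tr}(A^2) = \operatorname{tr}(A), \qquad \operatorname{tr}(2A - AA^T) = 2\operatorname{tr}(A) - \operatorname{tr}(A) = \operatorname{tr}(A).$$

For a smoother, $A$ is generally not idempotent, so the three quantities can differ from one another and need not be integers.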

[I don't have this reference handy to check enough of the details; an alternative by the same authors (plus Friedman) that's easy to get hold of is Elements of Statistical Learning [2]; see for example equation 5.16, which defines the effective degrees of freedom of a smoothing spline as $\operatorname{tr}(A)$ (in my notation)]

More generally still, Ye (1998) [3] defined generalized degrees of freedom as $\sum_i \frac{\partial \hat y_i}{\partial y_i}$, which is the sum of the sensitivities of fitted values to their corresponding observations. In turn, this is consistent with $\operatorname{tr}(A)$ where that definition works. To use Ye's definition you need only be able to compute $\hat y$ and to perturb the data by some small amount (in order to compute $\frac{\partial \hat y_i}{\partial y_i}$ numerically). This makes it very broadly applicable.
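As a rough illustration, the sketch below compares the trace-based measures with a finite-difference version of the sum of sensitivities on the question's data. It uses base R's `loess` as a stand-in for a local regression smoother. The `span = 0.5` and `degree = 1` settings, the helper name `smoother`, and the step size `eps` are assumptions made for this example. It is not a reproduction of what `gam` computes internally.

```r
# Data from the question (mtcars ships with base R).
x <- mtcars$wt
y <- mtcars$mpg
n <- length(y)

# Hypothetical helper: a local regression smoother with x and the span held
# fixed. With the default gaussian family the fitted values are linear in
# the response, i.e. yhat = A %*% y for some n x n matrix A.
smoother <- function(resp) {
  unname(fitted(loess(resp ~ x, span = 0.5, degree = 1)))
}

# Recover A column by column: smoothing the j-th unit vector gives column j.
A <- sapply(seq_len(n), function(j) smoother(as.numeric(seq_len(n) == j)))

df_trA   <- sum(diag(A))             # tr(A)
df_trAAt <- sum(A^2)                 # tr(A A^T) = sum of squared entries
df_tr2   <- 2 * df_trA - df_trAAt    # tr(2A - A A^T)

# Sum of sensitivities d yhat_i / d y_i, approximated by perturbing one
# observation at a time and refitting.
eps  <- 1e-3
yhat <- smoother(y)
gdf  <- sum(sapply(seq_len(n), function(i) {
  y_pert <- y
  y_pert[i] <- y_pert[i] + eps
  (smoother(y_pert)[i] - yhat[i]) / eps
}))

print(c(trA = df_trA, trAAt = df_trAAt, tr2A_AAt = df_tr2, sensitivities = gdf))
```

Because this smoother is linear in the response, the finite-difference sum should match $\operatorname{tr}(A)$ up to rounding, while the other two trace formulas will generally give different values. The perturbation approach only needs a way to refit the model, so it also applies to fits that are not linear in $y$.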

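For models like those fitted by gam, those various measures are generally not integer. In the output in the question, the 32 observations leave $26.6$ residual degrees of freedom, so the fit is counted as using about $5.4$ degrees of freedom.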

(I highly recommend reading these references' discussions of this issue, though the story can get rather more complicated in some situations; see, for example, [4].)

[1] Hastie, T. and Tibshirani, R. (1990),
Generalized Additive Models
London: Chapman and Hall.

[2] Hastie, T., Tibshirani, R. and Friedman, J. (2009),
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.
Springer-Verlag.
https://statweb.stanford.edu/~tibs/ElemStatLearn/

[3] Ye, J. (1998),
"On Measuring and Correcting the Effects of Data Mining and Model Selection"
Journal of the American Statistical Association, Vol. 93, No. 441, pp 120-131

[4] Janson, L., Fithian, W., and Hastie, T. (2013),
"Effective Degrees of Freedom: A Flawed Metaphor"
https://arxiv.org/abs/1312.7851