Q1
No, leave it in as an s() term; part of the wiggliness in range is taken up by the ti(day, range) term. In fact it would seem sensible to decide whether ti(day, range) is needed at all and, if it is, to refit the model with te(day, range) rather than the separate s() terms plus the ti() term, if only for the reduction in the number of smoothing parameters that need to be estimated (from 8 with your model to 4 for the te() version; the counts are doubled from 4 and 2, I think, because of the shrinkage thin-plate splines you are using).
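For concreteness, a minimal sketch of the two parameterizations (the response y, the data frame df, and the use of REML are assumptions, not taken from the question):

```r
library(mgcv)

## Separate marginal smooths plus a tensor-product interaction:
## smoothing parameters are estimated for s(day), s(range), and ti(day, range)
m1 <- gam(y ~ s(day, bs = "ts") + s(range, bs = "ts") +
            ti(day, range, bs = "ts"),
          data = df, method = "REML")

## Full tensor product: main effects and interaction rolled into one te()
## term, with fewer smoothing parameters to estimate
m2 <- gam(y ~ te(day, range, bs = "ts"), data = df, method = "REML")
```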
Q2
Values like this are common when you turn on shrinkage (either via shrinkage splines or via the double-penalty approach using select = TRUE). Both options allow shrinkage of the null space of the penalty matrix. This null space includes the set of basis functions that are completely smooth. Without the shrinkage, the smoothness parameter would shrink the wiggly parts of the smooth until all that was left was the linear function: the range space is shrunk by the smoothness parameter, but not the null space.
By adding an extra penalty, in your case by adding a small value to the zero eigenvalues of the penalty matrix, the null space can now also be shrunk via a second penalty and associated smoothness parameter. This allows the linear part of the spline to be shrunk a little (or a lot) towards the model intercept. If you're familiar with the LASSO penalty you will notice a similarity: the shrunken linear function doesn't have its full least-squares estimated effect but a somewhat smaller effect, because of the shrinkage towards zero. The penalty is different in splines, but the effect is similar. The less-than-1 EDF for the term just reflects this.
You can often find spline terms with EDF < 1 that are not linear. In that case it seems that the null space (the linear, perfectly smooth bits) has been shrunk as well as the range space (the wiggly bits): the smoothness parameter(s) for the wiggly bits allow some small amount of wiggliness, while the shrinkage of the null space drags the EDF below 1. This is fine; the model is estimating a slightly wiggly function, but uncertainty in that estimate probably means that a linear function will reside in the 95% confidence interval for the smooth.
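A minimal sketch of the two ways to turn on this shrinkage in mgcv (the names y, x, and df are placeholders, not from the question):

```r
library(mgcv)

## Option 1: shrinkage basis -- the penalty is modified so the null space
## (the linear part) can also be penalized away
m_ts <- gam(y ~ s(x, bs = "ts"), data = df, method = "REML")

## Option 2: double-penalty approach -- adds a second penalty on the null
## space of every smooth in the model
m_dp <- gam(y ~ s(x), data = df, select = TRUE, method = "REML")

summary(m_ts)  ## heavily shrunk terms can show EDF < 1 here
```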
Degrees of freedom are non-integer in a number of contexts. Indeed, in a few circumstances you can establish that the degrees of freedom used to fit the data for some particular model must lie between some value $k$ and $k+1$.
We usually think of degrees of freedom as the number of free parameters, but there are situations where the parameters are not completely free, and they can then be difficult to count. This can happen when smoothing or regularizing, for example.
Locally weighted regression / kernel methods and smoothing splines are examples of such a situation: the total number of free parameters is not something you can readily count by adding up predictors, so a more general notion of degrees of freedom is needed.
In Generalized Additive Models, on which gam is partly based, Hastie and Tibshirani (1990) [1] note (as do numerous other references) that for models where we can write $\hat y = Ay$, the degrees of freedom is sometimes taken to be $\operatorname{tr}(A)$ (they also discuss $\operatorname{tr}(AA^T)$ and $\operatorname{tr}(2A-AA^T)$). The first is consistent with the more usual parameter-counting approach in cases where both apply (e.g. in regression, where in normal situations $\operatorname{tr}(A)$ will be the column dimension of $X$), and when $A$ is symmetric and idempotent, all three of those formulas coincide.
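As a quick illustration of the regression case (a self-contained toy example, not from the references):

```r
## In ordinary least squares, yhat = A y with A = X (X'X)^{-1} X',
## and tr(A) equals the column dimension of X
set.seed(1)
X <- cbind(1, rnorm(50), rnorm(50))      # intercept plus two predictors
A <- X %*% solve(crossprod(X)) %*% t(X)  # hat matrix
sum(diag(A))                             # = 3, the number of columns of X
```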
[I don't have this reference handy to check enough of the details; an alternative by the same authors (plus Friedman) that's easy to get hold of is Elements of Statistical Learning [2]; see for example equation 5.16, which defines the effective degrees of freedom of a smoothing spline as $\operatorname{tr}(A)$ (in my notation)]
More generally still, Ye (1998) [3] defined generalized degrees of freedom as $\sum_i \frac{\partial \hat y_i}{\partial y_i}$, which is the sum of the sensitivities of fitted values to their corresponding observations. In turn, this is consistent with $\operatorname{tr}(A)$ where that definition works. To use Ye's definition you need only be able to compute $\hat y$ and to perturb the data by some small amount (in order to compute $\frac{\partial \hat y_i}{\partial y_i}$ numerically). This makes it very broadly applicable.
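A sketch of Ye's definition computed numerically (the function name gdf and the OLS sanity check are my own, for illustration only):

```r
## Generalized df: sum of sensitivities of fitted values to their own
## observations, estimated here by finite differences
gdf <- function(y, fit_fun, eps = 1e-4) {
  yhat <- fit_fun(y)
  sens <- numeric(length(y))
  for (i in seq_along(y)) {
    y_pert <- y
    y_pert[i] <- y_pert[i] + eps
    sens[i] <- (fit_fun(y_pert)[i] - yhat[i]) / eps
  }
  sum(sens)
}

## Sanity check on OLS, where the answer should be ncol(X)
set.seed(1)
X <- cbind(1, rnorm(30))
y <- as.numeric(X %*% c(1, 2) + rnorm(30))
gdf(y, function(yy) lm.fit(X, yy)$fitted.values)  # ~ 2
```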
For models like those fitted by gam, those various measures are generally not integer.
(I highly recommend reading these references' discussion on this issue, though the story can get rather more complicated in some situations. See, for example [4])
[1] Hastie, T. and Tibshirani, R. (1990), Generalized Additive Models, London: Chapman and Hall.
[2] Hastie, T., Tibshirani, R. and Friedman, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer-Verlag. https://statweb.stanford.edu/~tibs/ElemStatLearn/
[3] Ye, J. (1998), "On Measuring and Correcting the Effects of Data Mining and Model Selection", Journal of the American Statistical Association, Vol. 93, No. 441, pp. 120-131.
[4] Janson, L., Fithian, W., and Hastie, T. (2013), "Effective Degrees of Freedom: A Flawed Metaphor", https://arxiv.org/abs/1312.7851
Best Answer
The correct terminology for the degrees of freedom that you need to compute is model degrees of freedom. You could also compute residual degrees of freedom.
The model degrees of freedom are indeed calculated by adding up the degrees of freedom used by the parametric and non-parametric (or smooth) terms in your model.
Here is an example of a gam model for which you can check the computation of model degrees of freedom, together with the summary output it produces.
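A sketch of the kind of model being described, using mgcv's simulated gamSim example data with n = 400 (this reconstruction is an assumption; the original answer's exact code is not shown here):

```r
library(mgcv)

set.seed(2)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)  # simulated example data

## One parametric term (the intercept) plus four smooth terms
m <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

summary(m)  ## the edf column lists the effective df of each smooth
```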
If you add the degrees of freedom used by the parametric terms (i.e., 1 degree of freedom for the intercept) and the degrees of freedom used by the non-parametric terms (i.e., the effective degrees of freedom listed in the edf column), you get the model degrees of freedom: 12.313 in this example.
You can double-check that your computation is correct directly from the fitted model object.
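Two standard mgcv idioms for this check (assuming the fitted model object is named m; the original answer's exact command is an assumption here):

```r
sum(m$edf)         ## sum of per-coefficient effective df, intercept included
sum(influence(m))  ## trace of the influence (hat) matrix -- the same total
```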
The residual degrees of freedom would be computed as the difference between the number of observations included in the model (n) and the model degrees of freedom (mdf): n - mdf. For the present example, n = 400 and mdf = 12.313, so that the residual degrees of freedom would be 400 - 12.313 = 387.687.
Note that, if you were to compare your model against the intercept-only model, the anova function would report slightly different residual degrees of freedom, since it uses a different formula for the model degrees of freedom, which in turn changes the residual degrees of freedom.
The anova comparison would be run along the following lines.
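A sketch of the comparison (the intercept-only model name m0 and the use of an F test are assumptions):

```r
m0 <- gam(y ~ 1, data = dat)  ## intercept-only model (Model 1)
anova(m0, m, test = "F")      ## your model m is Model 2
```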
The residual degrees of freedom reported by the anova for Model 2 (i.e., m) are equal to 383.45 (rather than 387.687).
See https://web.as.uky.edu/statistics/users/pbreheny/621/F12/notes/11-29.pdf (slide 25/30) for an explanation of the difference in formulas used in the summary() and anova() functions when it comes to the computation of the model degrees of freedom.