I do agree with you that you may need to account for individual respondents' error terms through time, particularly if you do not have results for every period for each respondent.
One way to do this is with BayesX. It allows for spatial effects with splines, where you can have time in one dimension and the covariate value in the other. Further, you can add a random effect for each observation. Potentially, have a look at this paper. I am quite sure, though, that you will have to put your data into long format, and you will have to add an id column identifying the respondent for the random effect.
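If you end up working in R, here is a minimal sketch of the same structure in mgcv rather than BayesX itself (a deliberate swap, not the BayesX workflow): a tensor-product smooth with time in one dimension and the covariate in the other, plus a respondent-level random effect, fit to long-format data with an id column. All names and settings here are illustrative assumptions.

library(mgcv)

## hypothetical long-format panel: one row per respondent-period
set.seed(1)
long <- expand.grid(id = factor(1:50), time = 1:6)
long$x <- runif(nrow(long))
long$y <- sin(2 * pi * long$x) + 0.2 * long$time +
  rnorm(50)[long$id] + rnorm(nrow(long), sd = 0.3)
long <- long[sample(nrow(long), 250), ]  # some respondent-periods missing

## smooth surface over time and the covariate, plus a per-respondent
## random effect (bs = "re") to absorb respondent-specific error terms
m <- gam(y ~ te(time, x) + s(id, bs = "re"), data = long, method = "REML")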
mgcv uses a thin plate spline basis as the default basis for its smooth terms. To be honest, it likely makes little difference in many applications which of these you choose, though in some situations or with very large data sets other basis types can be used to good effect. Thin plate splines tend to have better RMSE performance than the other three you mention but are more computationally expensive to set up. Unless you have a specific reason to use the P- or B-spline bases, use thin plate splines; if you have a lot of data, consider the cubic regression spline basis, which is much cheaper to set up.
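For instance (an illustrative sketch using mgcv's built-in simulator), switching between these bases is a one-argument change:

library(mgcv)
set.seed(1)
dat <- gamSim(1, n = 400, verbose = FALSE)  # built-in test data

m_tp <- gam(y ~ s(x2), data = dat, method = "REML")             # default: thin plate ("tp")
m_cr <- gam(y ~ s(x2, bs = "cr"), data = dat, method = "REML")  # cubic regression spline: cheaper to set up for big n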
k doesn't set the number of knots, at least not in the default thin plate spline basis. What k does is set the dimensionality of the basis expansion; you'll end up with k - 1 basis functions. In mgcv, Simon Wood uses a trick to reduce the rank of the basis. IIRC, in the usual thin plate spline basis there is a knot at each data location, but this is wasteful: once you've set up this large basis, you end up using far fewer degrees of freedom in the fitted function. What Simon does is eigen-decompose the matrix of basis functions and keep the eigenvectors of the decomposition corresponding to the k - 1 largest eigenvalues. This has the effect of concentrating the main wiggliness "information" of the full basis in a reduced-rank form.
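A toy numeric sketch of that idea (this conveys the flavor of the trick only; mgcv's actual construction also handles the null space and identifiability constraints):

## full 1-D thin-plate-style radial basis: one basis function per data point
set.seed(1)
x <- sort(runif(100))
E <- outer(x, x, function(a, b) abs(a - b)^3)

## eigen-decompose and keep the eigenvectors with the largest-magnitude
## eigenvalues: a reduced-rank basis that retains most of the "wiggliness"
e <- eigen(E, symmetric = TRUE)
k <- 10
keep <- order(abs(e$values), decreasing = TRUE)[seq_len(k - 1)]
X_reduced <- e$vectors[, keep]  # k - 1 columns instead of 100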
The choice of k is important; the default is arbitrary and something you want to check (see gam.check()), but the critical observation is that you want to set k to be large enough to contain the envisioned dimensionality of the underlying function you are trying to recover from the data. In practice, one tends to fit with a modest k given the data set size and then use gam.check() on the resulting model to check whether k was large enough. If it wasn't, increase k and refit. Rinse and repeat...
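In code, that loop looks something like this (illustrative data from mgcv's simulator):

library(mgcv)
set.seed(2)
dat <- gamSim(1, n = 400, verbose = FALSE)

m <- gam(y ~ s(x2, k = 10), data = dat, method = "REML")
gam.check(m)  # a k-index well below 1 with a small p-value suggests k was too low

## if so, increase k and refit, then check again
m2 <- gam(y ~ s(x2, k = 20), data = dat, method = "REML")
gam.check(m2)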
You are most likely going to want to fit the model using REML (or ML) smoothness selection via method = "REML" or method = "ML": this treats the model as a mixed effects model, with the wiggly parts of the spline bases treated as special random effects terms. Simon Wood has shown that REML (or ML) selection performs better than GCV, which can undersmooth in situations where the objective function is flat around the optimal smoothness parameter value.
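For example (same simulated-data idea as above; gam()'s default selection method is GCV-based):

library(mgcv)
set.seed(3)
dat <- gamSim(1, n = 400, verbose = FALSE)

m_reml <- gam(y ~ s(x2), data = dat, method = "REML")  # recommended
m_gcv  <- gam(y ~ s(x2), data = dat)                   # default: method = "GCV.Cp"
c(REML = m_reml$sp, GCV = m_gcv$sp)                    # compare the chosen smoothing parameters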
The ridge penalty mentioned by @generic_user is taken care of for you, so you can ignore this part of setting up the model.
Best Answer
The main difference is the choice of knots and penalization. To start, let's focus on cubic splines for x, y data.*

I find it useful to start from the extreme situation: an interpolating spline that connects all data points with a smooth curve. Then each x-axis value is a knot at which the two cubic functions interpolating on either side of the knot agree in their values (and with the corresponding y value) and in their first and second derivatives. That can fit the data at hand perfectly, but at the cost of a very "wiggly" curve and probable overfitting that won't generalize well to new data.

The different approaches can be thought of as different ways to avoid that overfitting.
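To see that extreme case concretely (a small simulated sketch; splinefun() fits an interpolating cubic spline with a knot at every data point):

set.seed(4)
x <- sort(runif(20))
y <- sin(2 * pi * x) + rnorm(20, sd = 0.4)

f <- splinefun(x, y, method = "natural")  # passes through every point
plot(x, y)
curve(f(x), from = min(x), to = max(x), n = 501, add = TRUE)  # very wiggly: it chases the noise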
A cubic smoothing spline might keep the original knot locations along the x-axis while penalizing the roughness of the curve in terms of its integrated squared second derivative. An optimal penalty can be chosen by (restricted) marginal likelihood or generalized cross-validation.
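Base R's smooth.spline() implements essentially this (with very large samples it uses a subset of the unique x values as knots):

set.seed(4)
x <- sort(runif(50))
y <- sin(2 * pi * x) + rnorm(50, sd = 0.4)

fit <- smooth.spline(x, y)  # roughness penalty; lambda chosen by GCV (cv = FALSE is the default)
fit$df                      # effective degrees of freedom actually spent
plot(x, y)
lines(predict(fit), col = "blue")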
That Wikipedia entry outlines other methods, including regression splines and penalized splines.

In this terminology, the mgcv package has a major focus on smoothing/penalization methods, with thin-plate splines as a default, although it can accommodate unpenalized regression splines. (The help pages for that package don't restrict the term "regression splines" to unpenalized methods.)

A simple implementation of penalized splines is seen in the pspline() function in the survival package, which sets up a small set of evenly spaced symmetric basis functions that span the x-axis data range. The corresponding coefficients are then penalized automatically when fitting coxph or survreg survival models.
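A minimal example with the lung data shipped in survival (the df = 4 target is an arbitrary illustrative choice):

library(survival)

## penalized spline term for age in a Cox model; the coefficients of the
## evenly spaced basis functions are penalized automatically
fit <- coxph(Surv(time, status) ~ pspline(age, df = 4), data = lung)
fit
termplot(fit, terms = 1, se = TRUE)  # plot the fitted smooth for age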
The ns() and rcs() functions implement "natural" cubic regression splines, restricted to be linear beyond the outermost knots. By your choice of the number of knots, you thus specify the number of degrees of freedom to be used in the fit. There is no inherent penalization beyond the choice of the number of knots.

Therneau and Grambsch, in Modeling Survival Data: Extending the Cox Model, discuss regression splines in Section 5.4 and smoothing splines in Section 5.5, without massive mathematical detail about spline basis functions that can become confusing. They put one difference between them, in essence, this way: a regression spline requires an arbitrary choice of knot locations, while a smoothing spline avoids that choice.
That said, their pspline() function spaces a limited number of basis functions evenly across the x-axis value range, without regard to the distribution of x values. See Section 5.8.3 of their book. One might consider that a different "arbitrary choice."

Furthermore, regression splines allow you to pre-specify how many degrees of freedom to devote to the modeled predictor. That's an advantage with Harrell's approach to Regression Modeling Strategies, in which such explicit choices early in modeling are key. With penalization methods, one typically lets the data determine how many degrees of freedom to use up. Regression splines work well in practice with typical data, when you don't need the highly specialized types of fits provided by mgcv for things like geospatial data.

Here's a particular comparison of rcs() and gam() fits to data generated from the Runge function; code is below. The gam() fit devoted 8.95 degrees of freedom (df) to the penalized fit, plus the intercept, for 9.95 df total. The corresponding df used by the ols() fit with an rcs(x, 7) term were 6 for x and 7 df total.

Whichever choice you make, don't just accept default parameter settings blindly; not to choose is to choose, and you should understand the choices you are (even implicitly) making about your spline approximation. For example, the gam() fit above used the default bs = "tp" (thin-plate spline) setting for the smoothing basis, whose smooths "do not have ‘knots’ (at least not in any conventional sense): a truncated eigen-decomposition is used to achieve the rank reduction." With this size of data set, the rcs(x, 7) term led to outer knots at the 0.05 and 0.95 quantiles of x values, with 5 interior knots "equally spaced between these on the quantile scale." (Emphasis added.) Read the help pages carefully.

This site has many posts on these types of splines. Frank Harrell, author of the rms package, often discusses regression splines. Gavin Simpson, author of the gratia package that helps to visualize mgcv models, often discusses GAMs. Searches here for posts by those reputable sources will provide much for further study.

Code for the figure:
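What follows is a minimal sketch of that comparison; the seed, sample size, and noise level are illustrative assumptions, so it won't reproduce the quoted df values exactly.

library(rms)   # ols(), rcs()
library(mgcv)  # gam(), s()

set.seed(101)
x <- seq(-1, 1, length.out = 201)
y <- 1 / (1 + 25 * x^2) + rnorm(length(x), sd = 0.05)  # Runge function plus noise

fit_rcs <- ols(y ~ rcs(x, 7))              # restricted cubic spline: 7 knots, 6 df for x
fit_gam <- gam(y ~ s(x), method = "REML")  # default thin-plate basis

plot(x, y, col = "gray", pch = 16, cex = 0.6)
lines(x, 1 / (1 + 25 * x^2), lty = 2)                        # true curve
lines(x, predict(fit_rcs, data.frame(x = x)), col = "red")
lines(x, predict(fit_gam, data.frame(x = x)), col = "blue")
legend("topright", legend = c("truth", "rcs(x, 7) via ols()", "s(x) via gam()"),
       lty = c(2, 1, 1), col = c("black", "red", "blue"), bty = "n")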
*There can be different choices of the basis functions used to construct the cubic splines, but as comments note those choices are somewhat arbitrary and should give similar results in practice. Also, a default spline in the mgcv package doesn't involve knots in the usual sense at all, as the answer eventually notes.