According to ESL, natural cubic basic spline with $K$ knots is represented by $K$ basis function.
However, the ns()
function in R with knots=K
generates a basis matrix with $K+2$ basis function. This representation seems to add just two and not four constraints in both the boundary regions.
Indeed the documentation says that that the resulting matrix is a $N \times df$ matrix, where df = length(knots) + 1 + intercept.
(In addition to this, the resulting plot of lm(y~ns(x,knots=c(k_1,k_2,...,K_n),intercept=T))
is not a linear function in the extreme regions)
My question is : is this a different definition of natural cubic spline?
Best Answer
There are no different definitions but unfortunately as S. Wood says: "Note that there are many alternative ways of representing such a cubic spline using basis functions: although all are equivalent, the link to the piecewise cubic characterization is not always transparent." [SW2017]
The definition of the natural cubic spline is as always: "The natural cubic spline, $g(x)$, interpolating (a set of points $\{x_i , y_i: i = 1, \dots, n\}$ where $x_i <x_{i+1}$), is a function made up of sections of cubic polynomial, one for each $[x_i, x_{i+1}]$, which are joined together so that the whole spline is continuous to second derivative, while $g(x_i) = y_i$ and $g′′(x_1) = g′′(x_n) = 0$." (Again from [SW2017])
In addition, and making a specific mention now to the concept of knots: "(Letting) $\xi_1 < \xi_2 < \dots < \xi_k$ be a set of ordered points - called knots - contained in some interval $(a, b)$, a cubic spline is a continuous function $r$ such that: (i) $r$ is a cubic polynomial over ($\xi_1$, $\xi_2$), ($\xi_2$, $\xi_3$), $\dots$. and (ii) $r$ has continuous first and second derivatives at the knots. More generally, an $M$th-order spline is a piecewise $M-1$ degree polynomial with $M-2$ continuous derivatives at the knots. A spline that is linear beyond the boundary knots is called a natural spline." (from [LW2006])
Returning now to
ns
, simply put the naming of the functionns
is confusing. As Phil Karlton, one of the original Netscape project leaders/curmudgeons, said: "There are only two hard things in Computer Science: cache invalidation and naming things.". Here, the naming is probably a bit off because someone thought that the boundary points are not really knots but just points. Therefore, it made sense for knots to be actually only the interior points. This is alluded in the documentation ofns
that comments on the association of the argumentdf
with "the number of inner knots aslength(knots)
". This suggests that actuallyknots
refers to inner knots.For example, both
splines::ns(...)
andmgcv::s( bs='cr', ...)
use the same knot locations. (where by default they are on relevant quantiles of $x$)Finally to clarify your side-question: NCS are constraint in such way that the function is linear beyond the boundary knots, not between a boundary point and the adjacent interior knot.
Keeping with the same example as before:
In general, unless one needs to use the
splines
package to define particular knot locations, etc., I would suggest usingmgcv
for an out-of-the-box analysis that uses splines. It is well-documented and straight-forward to use.[SW2017]: S. Wood, 2017, Generalized Additive Models An Introduction with R, 2nd Ed. Chapt. 5.
[LW2006]: L. Wasserman, 2006, All of Nonparametric Statistics, Chapt. 5.