First, it is not the basis but a basis: we want to build a basis for natural cubic splines with $K$ knots.
According to the constraints, "a natural cubic spline with $K$ knots is represented by $K$ basis functions". A basis is described by the $K$ elements $N_1, \ldots, N_K$; note that "$d_K$" is never used to define any of those elements.
[This paragraph is explained in detail in this answer: https://stats.stackexchange.com/q/233286 ]
I worked through the exercise showing that $N_1, \ldots, N_K$ is a basis for natural cubic splines with $K$ knots (this is Ex. 5.4 of the book).
The knots $(\xi_k)$ are fixed.
With the truncated power series representation for cubic splines with the $K$ knots $\xi_1 < \cdots < \xi_K$, any such spline can be written as a linear combination of the basis functions:
$$f(x) = \sum_{j=0}^3 \beta_j x^j + \sum_{k=1}^K \theta_k (x - \xi_k)_{+}^{3}.$$
For now there are $K+4$ degrees of freedom; we will add constraints to reduce this number (we already know the final basis must have $K$ elements).
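To make the count concrete, here is a minimal R sketch (the knots, the grid, and the helper name `truncated_power_basis` are illustrative choices, not part of the exercise) that builds the $K+4$ columns of this representation:

```r
# Truncated power basis for a cubic spline with knots xi_1 < ... < xi_K:
# columns 1, x, x^2, x^3, (x - xi_1)_+^3, ..., (x - xi_K)_+^3  -> K + 4 columns
pos3 <- function(u) pmax(u, 0)^3           # (u)_+^3

truncated_power_basis <- function(x, xi) {
  poly_part  <- outer(x, 0:3, `^`)         # 1, x, x^2, x^3
  trunc_part <- sapply(xi, function(k) pos3(x - k))
  cbind(poly_part, trunc_part)
}

xi <- c(0.2, 0.4, 0.6, 0.8)                # K = 4 knots (arbitrary)
x  <- seq(0, 1, length.out = 101)
H  <- truncated_power_basis(x, xi)
dim(H)                                     # 101 x 8, i.e. K + 4 = 8 columns
```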
Part I: Conditions on the coefficients
We add the constraint "the function is linear beyond the boundary knots". We want to show the following four equations: $\beta_2 = 0$, $\beta_3 = 0$, $\sum_{k=1}^K \theta_k = 0$ and $\sum_{k=1}^K \theta_k \xi_k = 0$.
Proof:
For $x < \xi_1$,
$$f(x) = \sum_{j=0}^3 \beta_j x^j$$ so
$$f''(x) = 2 \beta_2 + 6 \beta_3 x.$$
The equation $f''(x)=0$ leads to $2 \beta_2 + 6 \beta_3 x = 0$ for all $x < \xi_1$.
So necessarily, $\beta_2 = 0$ and $\beta_3 = 0$.
For $x \geq \xi_K$, every truncated term is active, so the $+$ can be dropped. Replacing $\beta_2$ and $\beta_3$ by $0$, we obtain:
$$f(x) = \sum_{j=0}^1 \beta_j x^j + \sum_{k=1}^K \theta_k (x- \xi_k)^3$$ so
$$f''(x) = 6 \sum_{k=1}^K \theta_k (x-\xi_k).$$
The equation $f''(x)=0$ leads to $\left( \sum_{k=1}^K \theta_k \right) x - \sum_{k=1}^K \theta_k \xi_k = 0$ for all $x \geq \xi_K$.
So necessarily, $\sum_{k=1}^K \theta_k = 0$ and $\sum_{k=1}^K \theta_k \xi_k = 0$.
Part II: Relation between coefficients
We get a relation between $\theta_{K-1}$ and $\left( \theta_{1}, \ldots, \theta_{K-2} \right)$.
Using equations $\sum_{k=1}^K \theta_k = 0$ and $\sum_{k=1}^K \theta_k \xi_k = 0$ from Part I, we write:
$$0 = \left( \sum_{k=1}^K \theta_k \right) \xi_K - \sum_{k=1}^K \theta_k \xi_k = \sum_{k=1}^K \theta_k \left( \xi_K - \xi_k \right) = \sum_{k=1}^{K-1} \theta_k \left( \xi_K - \xi_k \right).$$
We can isolate $\theta_{K-1}$ to get: $$\theta_{K-1} = - \sum_{k=1}^{K-2} \theta_k \frac{\xi_K - \xi_k}{\xi_K - \xi_{K-1}}.$$
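As a quick numerical sanity check (an R sketch with arbitrary knots and coefficients), choosing $\theta_{K-1}$ by this formula and $\theta_K = -\sum_{k=1}^{K-1} \theta_k$ indeed recovers both constraints from Part I:

```r
xi    <- c(1, 2.5, 4, 6, 9)                      # K = 5 knots (arbitrary)
K     <- length(xi)
theta <- numeric(K)
theta[1:(K - 2)] <- c(0.7, -1.3, 0.4)            # free coefficients (arbitrary)

# theta_{K-1} from the Part II relation, theta_K from Part I
theta[K - 1] <- -sum(theta[1:(K - 2)] * (xi[K] - xi[1:(K - 2)])) / (xi[K] - xi[K - 1])
theta[K]     <- -sum(theta[1:(K - 1)])

c(sum(theta), sum(theta * xi))                   # both are (numerically) zero
```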
Part III: Basis description
We want to obtain the basis as described in the book. We first use $\beta_2=0$, $\beta_3=0$, and $\theta_K = -\sum_{k=1}^{K-1} \theta_k$ (all from Part I) and substitute into $f$:
\begin{align*}
f(x) &= \beta_0 + \beta_1 x + \sum_{k=1}^{K-1} \theta_k (x - \xi_k)_{+}^{3} - (x - \xi_K)_{+}^{3} \sum_{k=1}^{K-1} \theta_k \\
&= \beta_0 + \beta_1 x + \sum_{k=1}^{K-1} \theta_k \left( (x - \xi_k)_{+}^{3} - (x - \xi_K)_{+}^{3} \right).
\end{align*}
By the definition $d_k(x) = \dfrac{(x - \xi_k)_{+}^{3} - (x - \xi_K)_{+}^{3}}{\xi_K - \xi_k}$, we have $(\xi_K - \xi_k) d_k(x) = (x - \xi_k)_{+}^{3} - (x - \xi_K)_{+}^{3}$, so:
$$f(x) = \beta_0 + \beta_1 x + \sum_{k=1}^{K-1} \theta_k (\xi_K - \xi_k) d_k(x).$$
We have removed $3$ degrees of freedom ($\theta_K$, $\beta_2$ and $\beta_3$). We will now remove $\theta_{K-1}$.
We want to use the relation obtained in Part II, so we split off the last term of the sum:
$$f(x) = \beta_0 + \beta_1 x + \sum_{k=1}^{K-2} \theta_k (\xi_K - \xi_k) d_k(x) + \theta_{K-1} (\xi_K - \xi_{K-1}) d_{K-1}(x).$$
Substituting the relation obtained in Part II:
\begin{align*}
f(x) &= \beta_0 + \beta_1 x + \sum_{k=1}^{K-2} \theta_k (\xi_K - \xi_k) d_k(x) - \sum_{k=1}^{K-2} \theta_k \frac{\xi_K - \xi_k}{\xi_K - \xi_{K-1}} (\xi_K - \xi_{K-1}) d_{K-1}(x) \\
&= \beta_0 + \beta_1 x + \sum_{k=1}^{K-2} \theta_k (\xi_K - \xi_k) d_k(x) - \sum_{k=1}^{K-2} \theta_k (\xi_K - \xi_k) d_{K-1}(x) \\
&= \beta_0 + \beta_1 x + \sum_{k=1}^{K-2} \theta_k (\xi_K - \xi_k) (d_k(x) - d_{K-1}(x)).
\end{align*}
By the definition $N_{k+2}(x) = d_k(x) - d_{K-1}(x)$, we deduce:
$$f(x) = \beta_0 + \beta_1 x + \sum_{k=1}^{K-2} \theta_k (\xi_K - \xi_k) N_{k+2}(x).$$
For each $k$, $\xi_K - \xi_k$ does not depend on $x$, so we can let $\theta'_k := \theta_k (\xi_K - \xi_k)$ and rewrite:
$$f(x) = \beta_0 + \beta_1 x + \sum_{k=1}^{K-2} \theta'_k N_{k+2}(x).$$
Since $N_1(x) = 1$ and $N_2(x) = x$, we let $\theta'_1 := \beta_0$ and $\theta'_2 := \beta_1$ to get:
$$f(x) = \sum_{k=1}^{K} \theta'_k N_{k}(x).$$
The family $(N_k)_k$ has $K$ elements and spans the desired space, which has dimension $K$.
Furthermore, each element satisfies the boundary conditions, i.e. is linear beyond the boundary knots (a small exercise: take second derivatives).
Conclusion: $(N_k)_k$ is a basis for $K$ knots of natural cubic splines.
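As a small numerical illustration (an R sketch with arbitrary knots; `d` and `N` below implement the definitions used above), each $N_k$ can be checked to have vanishing second derivative beyond the boundary knots:

```r
pos3 <- function(u) pmax(u, 0)^3                      # (u)_+^3

xi <- c(1, 2, 4, 7, 10)                               # K = 5 knots (arbitrary)
K  <- length(xi)

d <- function(k, x) (pos3(x - xi[k]) - pos3(x - xi[K])) / (xi[K] - xi[k])

# N_1(x) = 1, N_2(x) = x, N_{k+2}(x) = d_k(x) - d_{K-1}(x), k = 1, ..., K-2
N <- function(j, x) {
  if (j == 1) rep(1, length(x))
  else if (j == 2) x
  else d(j - 2, x) - d(K - 1, x)
}

# Second differences approximate f''; they should vanish outside [xi_1, xi_K].
h <- 0.01
x <- seq(xi[1] - 3, xi[K] + 3, by = h)
# keep the 3-point stencil fully outside the boundary knots
outside <- (x < xi[1] - h) | (x > xi[K] + h)
for (j in 1:K) {
  sec <- diff(N(j, x), differences = 2) / h^2         # ~ f'' at x[2], ..., x[n-1]
  cat(sprintf("N_%d: max |f''| outside the boundary knots = %.1e\n",
              j, max(abs(sec[outside[-c(1, length(x))]]))))
}
```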
Best Answer
How to specify the knots in R
The `ns` function generates a natural regression spline basis given an input vector. The knots can be specified either via a degrees-of-freedom argument `df`, which takes an integer, or via a `knots` argument, which takes a vector giving the desired placement of the knots. Note that in the code you've written you have not requested five knots, but rather have requested a single (interior) knot at location 5.
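For instance, a call of this kind (a sketch; the predictor `0:100` and the call below are illustrative, not necessarily the code from the question) requests a single knot:

```r
library(splines)

x <- 0:100                      # illustrative predictor
b <- ns(x, knots = 5)           # ONE interior knot, located at x = 5

ncol(b)                         # 2 basis columns (number of interior knots + 1)
attr(b, "knots")                # 5
attr(b, "Boundary.knots")       # 0 100
```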
If you use the `df` argument, then the interior knots will be selected based on quantiles of the vector `x`. For example, if you make the call `ns(x, df=5)`, then the basis will include two boundary knots and 4 internal knots, placed at the 20th, 40th, 60th, and 80th quantiles of `x`, respectively. The boundary knots, by default, are placed at the min and max of `x`.
Here is an example of how to specify the locations of the knots.
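The knot locations below are arbitrary, chosen only to illustrate the `knots` argument (a sketch):

```r
library(splines)

x <- 0:100
b <- ns(x, knots = c(20, 35, 70))        # interior knots exactly where you put them

attr(b, "knots")                         # 20 35 70
attr(b, "Boundary.knots")                # 0 100
ncol(b)                                  # 4  (number of interior knots + 1)
```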
If you were to instead call `ns(x, df=4)`, you would end up with 3 internal knots at locations 25, 50, and 75, respectively.

You can also specify whether you want an intercept term. Normally this isn't specified, since `ns` is most often used in conjunction with `lm`, which includes an intercept implicitly (unless forced not to). If you use `intercept=TRUE` in your call to `ns`, make sure you know why you're doing so, since if you do this and then call `lm` naively, the design matrix will end up being rank deficient.
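To see the rank issue concretely, here is a sketch (the simulated `x` and `y` are arbitrary):

```r
library(splines)

set.seed(1)
x <- 0:100
y <- sin(x / 15) + rnorm(length(x), sd = 0.2)   # arbitrary simulated response

# Recommended usage: let lm() supply the intercept
fit <- lm(y ~ ns(x, df = 4))
length(coef(fit))                               # 5 coefficients: lm intercept + 4 spline columns

# With intercept = TRUE inside ns(), the constant column added by lm() is redundant:
X <- model.matrix(~ ns(x, df = 4, intercept = TRUE))
c(ncol(X), qr(X)$rank)                          # 5 columns, but rank 4: rank deficient
```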
Strategies for placing knots

Knots are most commonly placed at quantiles, like the default behavior of `ns`. The intuition is that if you have lots of data clustered close together, then you might want more knots there to model any potential nonlinearities in that region. But that doesn't mean this is either (a) the only choice or (b) the best choice.

Other choices can obviously be made and are domain-specific. Looking at histograms and density estimates of your predictors may provide clues as to where knots are needed, unless there is some "canonical" choice given your data.
In terms of interpreting regressions, I would note that, while you can certainly "play around" with knot placement, you should realize that doing so incurs a model-selection penalty that you should be careful to evaluate, and you should adjust any inferences accordingly.