Can anyone give a conceptual explanation of how predictions are made for new data when using smooths/splines in a predictive model? For example, given a model fitted with `gamboost` in the `mboost` package in R, using P-splines, how are predictions for new data made? What is carried over from the training data?
Say there is a new value of the independent variable x for which we want to predict y. Is the spline basis evaluated at this new value using the knots (or df) fixed when the model was trained, and are the coefficients from the trained model then applied to produce the prediction?
Here is an example in R. Conceptually, what is `predict` doing to output 899.4139 for the new value `mean_radius = 15.99`?
# use the wpbc data as an example
library(mboost)
data(wpbc)

modNew <- gamboost(mean_area ~ mean_radius, data = wpbc,
                   baselearner = "bbs", dfbase = 4,
                   family = Gaussian(),
                   control = boost_control(mstop = 5))

test <- data.frame(mean_radius = 15.99)
predict(modNew, test)
Best Answer
The prediction is computed like this:

From the original fit, you have knot locations spread through the range of `mean_radius` in your training data. Together with the degree of the B-spline basis (cubic by default in `mboost`), these knot locations define the shape of your B-spline basis functions. The default in `mboost` is 20 interior knots, which define 24 cubic B-spline basis functions (the number of basis functions is the number of interior knots plus the degree plus one: $20 + 3 + 1 = 24$). Let's call these basis functions $B_j(x),\; j=1,\dots,24$. The effect of your covariate $x =$ `mean_radius` is represented simply as
$$ f(x) = \sum^{24}_{j=1} B_j(x) \theta_j. $$
This is a very neat trick, because it reduces the hard problem of estimating the unspecified function $f(x)$ to the much simpler problem of estimating the linear regression weights $\theta_j$ associated with a collection of synthetic covariates $B_j(x)$.

Prediction is then not that complicated: given the estimated coefficients $\hat\theta_j$, we evaluate the $B_j(\cdot),\; j=1,\dots,24$, at the new data point $x_{new}$. For that, all we need are the knot locations that defined the basis functions for the original data. We then get the predicted value as
$$ \hat f(x_{new}) = \sum^{24}_{j=1} B_j(x_{new}) \hat\theta_j. $$
Since boosting is an iterative procedure, the estimated coefficients at the stopping iteration $m_{stop}$ are actually the sum of the coefficient updates from iterations $1, \dots, m_{stop}$. If you really want to get a grip on the details, take a look at the output of

bbs(rnorm(100))$dpp(rep(1,100))$predict

and explore from there. For example,

with(environment(bbs(rnorm(100))$dpp(rep(1,100))$predict), newX)

calls

with(environment(bbs(rnorm(100))$dpp(rep(1,100))$predict), Xfun)

to evaluate the $B_j(\cdot)$ at $x_{new}$.
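The point about coefficients accumulating over iterations can be illustrated with a toy version of L2 boosting with a single linear base-learner (plain base R; the step length `nu = 0.1` matches `mboost`'s default, everything else is made up for the sketch):

```r
set.seed(2)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

nu    <- 0.1          # step length (shrinkage), as in mboost's default
mstop <- 5
beta  <- 0            # accumulated coefficient
f     <- rep(0, 50)   # current fit
updates <- numeric(mstop)

for (m in 1:mstop) {
  u <- y - f                        # negative gradient of squared-error loss
  b <- sum(x * u) / sum(x * x)      # least-squares fit of base-learner to u
  updates[m] <- nu * b              # shrunken update in iteration m
  beta <- beta + nu * b             # coefficients accumulate across iterations
  f <- f + nu * b * x               # update the fit
}

all.equal(beta, sum(updates))  # TRUE: final coefficient = sum of all updates
```

The same bookkeeping happens inside `gamboost`, just with the 24 B-spline coefficients $\theta_j$ updated jointly at each iteration instead of a single slope.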