Solved – How to predict new data with spline/smooth regression

machine-learning, predictive-models, regression, splines

Can anyone give a conceptual explanation of how predictions are made for new data when using smooths/splines in a predictive model? For example, given a model created with gamboost in the mboost package in R, using P-splines, how are predictions for new data made? What is carried over from the training data?

Say there is a new value of the independent variable x for which we want to predict y. Is the spline basis evaluated at this new value using the knots (or df) from training, and are the coefficients from the trained model then applied to those basis values to produce the prediction?

Here is an example in R: what is predict doing, conceptually, to output 899.4139 for the new value mean_radius = 15.99?

# take the wpbc data as an example (wpbc ships with the TH.data package)
library(mboost)
data("wpbc", package = "TH.data")

# boosted GAM with a P-spline (bbs) base-learner, stopped after 5 iterations
modNew <- gamboost(mean_area ~ mean_radius, data = wpbc,
                   baselearner = "bbs", dfbase = 4, family = Gaussian(),
                   control = boost_control(mstop = 5))
test <- data.frame(mean_radius = 15.99)
predict(modNew, test)

Best Answer

The prediction is computed like this:

From the original fit, you have knot locations spread over the range of mean_radius in your training data. Together with the degree of the B-spline basis (cubic by default in mboost), these knot locations define the shape of your B-spline basis functions. The default in mboost is 20 interior knots, which define 24 cubic B-spline basis functions (don't ask...). Let's call these basis functions $B_j(x),\; j = 1, \dots, 24$. The effect of your covariate $x =$ mean_radius is represented simply as $$ f(x) = \sum_{j=1}^{24} B_j(x)\, \theta_j. $$ This is a very neat trick, because it reduces the hard problem of estimating the unspecified function $f(x)$ to the much simpler problem of estimating linear regression weights $\theta_j$ associated with a collection of synthetic covariates $B_j(x)$.
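To make this concrete, here is a minimal sketch using the base splines package, not mboost's internals; the knot construction mirrors the default described above, while the second-order difference penalty and the value of lambda are illustrative assumptions in the spirit of P-splines:

library(splines)
data("wpbc", package = "TH.data")

x <- wpbc$mean_radius
y <- wpbc$mean_area

# 20 interior knots plus the 2 boundary knots, equidistant over the range
# of x, extended by 3 knots on each side for the cubic (order 4) basis:
# 28 knots in total, and hence 28 - 4 = 24 B-spline basis functions.
kn <- seq(min(x), max(x), length.out = 22)
h  <- kn[2] - kn[1]                                   # knot spacing
kn <- c(min(x) - (3:1) * h, kn, max(x) + (1:3) * h)   # 28 knots

B <- splineDesign(kn, x, ord = 4)           # n x 24 design matrix of B_j(x)
D <- diff(diag(ncol(B)), differences = 2)   # P-spline difference penalty
lambda <- 10                                # illustrative smoothing parameter
theta  <- solve(crossprod(B) + lambda * crossprod(D), crossprod(B, y))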

Prediction is then not that complicated: given the estimated coefficients $\hat\theta_j$, we evaluate the $B_j(\cdot),\; j = 1, \dots, 24$, at the prediction data $x_{new}$. For that, all we need are the knot locations that defined the basis functions for the original data. We then get the predicted values as $$ \hat f(x_{new}) = \sum_{j=1}^{24} B_j(x_{new})\, \hat\theta_j. $$
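Continuing the sketch above, prediction just re-evaluates the same 24 basis functions (same knots!) at the new value and takes the weighted sum. (This won't reproduce mboost's 899.4139 exactly, since gamboost estimates the coefficients by penalized boosting stopped after 5 iterations, plus an offset, rather than by a single ridge solve, but the prediction mechanics are identical.)

x_new <- 15.99
B_new <- splineDesign(kn, x_new, ord = 4)   # 1 x 24 row of basis values at x_new
drop(B_new %*% theta)                       # hat f(x_new) = sum_j B_j(x_new) * theta_j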

Since boosting is an iterative procedure, the estimated coefficients at the stopping iteration $m_{\text{stop}}$ are actually the sum of the coefficient updates from iterations $1, \dots, m_{\text{stop}}$. You can watch this accumulation happen, as in the snippet below.
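Here is a quick illustration, assuming the model modNew and the data frame test from the question; mstop<- rewinds a fitted mboost model to an earlier iteration, so both the coefficients and the prediction change:

coef(modNew, which = 1)  # accumulated coefficient vector after mstop = 5
mstop(modNew) <- 3       # rewind the model to iteration 3
coef(modNew, which = 1)  # partial sums of the first 3 updates
predict(modNew, test)    # the prediction changes accordingly
mstop(modNew) <- 5       # restore the original stopping iteration

If you really want to get a grip on the details, take a look at the output you get from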

bbs(rnorm(100))$dpp(rep(1,100))$predict

and go explore from there. For example,

with(environment(bbs(rnorm(100))$dpp(rep(1,100))$predict), newX)

calls

with(environment(bbs(rnorm(100))$dpp(rep(1,100))$predict), Xfun)

to evaluate the $B_j(\cdot)$ at $x_{new}$.
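To see what else is stored alongside the prediction function, listing that environment is a good entry point:

ls(environment(bbs(rnorm(100))$dpp(rep(1,100))$predict))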
