Solved – problem with multicollinearity in spline regression

generalized linear model, multicollinearity, multiple regression, predictive-models, splines

When using natural (i.e. restricted) cubic splines, the basis functions created are highly collinear, and when used in a regression they produce very high VIF (variance inflation factor) statistics, signaling multicollinearity. Is this a problem when the model is built for prediction? It seems it will always happen, because of the way the spline basis is constructed.

Here is an example in R:

library(caret)
library(Hmisc)
library(car)
data(GermanCredit)

spl_mat <- rcspline.eval(GermanCredit$Amount, nk = 5, inclx = TRUE) # natural cubic spline basis with 5 knots; inclx = TRUE keeps the linear term x

class <- ifelse(GermanCredit$Class == 'Bad', 1, 0) # binary target variable: 1 = bad credit
dat <- data.frame(cbind(spl_mat, class))

cor(spl_mat)

OUTPUT:
              x                              
    x 1.0000000 0.9386463 0.9270723 0.9109491
      0.9386463 1.0000000 0.9994380 0.9969515
      0.9270723 0.9994380 1.0000000 0.9989905
      0.9109491 0.9969515 0.9989905 1.0000000


mod <- glm(class ~ ., data = dat, family = binomial()) # logistic regression on all spline columns

vif(mod) # massively high

OUTPUT:
      x         V2         V3         V4 
319.573 204655.833 415308.187  45042.675

UPDATE:

I reached out to Dr. Harrell, the author of the Hmisc package in R (and others). He responded that as long as the algorithm converges (e.g. the logistic regression), the standard errors have not exploded (as Maarten said below), and the model fits well, best shown on a test set, then there is no issue with this collinearity.

Further, he stated (and this is on page 65 of his excellent Regression Modeling Strategies book) that collinearity between variables constructed algebraically, such as restricted cubic spline terms, is not an issue, because multicollinearity only matters when that collinearity changes from sample to sample.
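One way to see why this kind of collinearity is harmless for prediction is to re-express the same spline columns in an orthogonal basis and check that the fitted probabilities do not change. The following is a minimal sketch on simulated data, not the German credit data. The helper `rcs_basis` is hypothetical: it hand-codes the truncated power form of a restricted cubic spline so the example needs only base R, and the knot quantiles are an assumption chosen for illustration.

```r
# Hypothetical helper (not from Hmisc): restricted cubic spline columns in
# truncated power form, i.e. x plus k - 2 nonlinear terms for k knots.
rcs_basis <- function(x, knots) {
  k    <- length(knots)
  pos3 <- function(u) pmax(u, 0)^3
  tk   <- knots[k]
  tk1  <- knots[k - 1]
  nonlin <- sapply(seq_len(k - 2), function(j) {
    tj <- knots[j]
    (pos3(x - tj) -
       pos3(x - tk1) * (tk - tj) / (tk - tk1) +
       pos3(x - tk)  * (tk1 - tj) / (tk - tk1)) / (tk - knots[1])^2
  })
  colnames(nonlin) <- paste0("nl", seq_len(k - 2))
  cbind(x = x, nonlin)
}

# Largest absolute pairwise correlation between columns of a matrix
max_offdiag <- function(m) {
  r <- abs(cor(m))
  max(r[upper.tri(r)])
}

set.seed(1)
n <- 1000
x <- rexp(n, rate = 1 / 3000)                 # simulated skewed predictor
y <- rbinom(n, 1, plogis(-1.5 + x / 4000))    # simulated binary outcome

knots <- unname(quantile(x, c(0.05, 0.275, 0.5, 0.725, 0.95)))  # assumed knot placement
S <- rcs_basis(x, knots)                      # highly collinear columns

# Orthonormal columns spanning the same space as S (given an intercept)
Q <- qr.Q(qr(cbind(1, S)))[, -1, drop = FALSE]

fit_raw  <- glm(y ~ S, family = binomial())
fit_orth <- glm(y ~ Q, family = binomial())

max_offdiag(S)                                     # collinear basis
max_offdiag(Q)                                     # orthogonalized basis
max(abs(fitted(fit_raw) - fitted(fit_orth)))       # difference in predictions
```

The two fits differ only in how the coefficients are parameterized, so individual coefficients and their VIFs change while the predictions should agree up to numerical tolerance.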

Best Answer

The multicollinearity can lead to numerical problems when estimating such a function. This is why some use B-splines (or variations on that theme) instead of restricted cubic splines. So I tend to see restricted cubic splines as one potentially useful tool in a larger toolbox.
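As a rough illustration of the B-spline alternative, the sketch below builds the same natural cubic spline two ways: with `rcspline.eval` from Hmisc (as in the question) and with `ns` from the splines package, which constructs its columns from B-splines. The data are simulated, and the knots are placed at the minimum, quartiles, and maximum purely as an assumption so that both bases describe the same space of curves.

```r
library(Hmisc)    # rcspline.eval, as used in the question
library(splines)  # ns(), shipped with R

set.seed(1)
n <- 1000
x <- rexp(n, rate = 1 / 3000)                    # simulated skewed predictor
y <- log1p(x / 1000) + rnorm(n, sd = 0.3)        # simulated continuous outcome

# Assumed knot placement: outer knots at the data range, inner knots at quartiles
kn <- unname(quantile(x, c(0, 0.25, 0.5, 0.75, 1)))

S_rcs <- rcspline.eval(x, knots = kn, inclx = TRUE)               # truncated power basis
S_ns  <- ns(x, knots = kn[2:4], Boundary.knots = kn[c(1, 5)])     # B-spline based basis

# Largest absolute pairwise correlation between columns of a matrix
max_offdiag <- function(m) {
  r <- abs(cor(m))
  max(r[upper.tri(r)])
}

max_offdiag(S_rcs)   # truncated power basis
max_offdiag(S_ns)    # B-spline based basis

# Both bases (plus an intercept) span the same natural cubic splines,
# so the least squares fits should coincide up to rounding error.
fit_rcs <- lm(y ~ S_rcs)
fit_ns  <- lm(y ~ S_ns)
max(abs(fitted(fit_rcs) - fitted(fit_ns)))
```

The choice between the two is therefore about numerical conditioning and interpretability of the coefficients, not about which curves the model can fit.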