How many knots for restricted cubic splines?

Tags: regression, rms, splines

I have a dataset of approximately 10,000 patients in which I am investigating the association between a specific measurement and disease risk. I model the independent variable with restricted cubic splines, but I am uncertain about the appropriate number of knots. The literature I found suggests that five knots (k = 5) are appropriate for large samples such as mine. However, I am not convinced by the results when the same data are analysed with 3, 4 and 5 knots:

![Analysis with different numbers of knots](https://imgur.com/a/MleUb)

Intuitively, I would choose 3 knots, since the higher numbers offer no obvious advantage. But is this really the case?
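To make the knot count concrete, here is a minimal NumPy sketch of the restricted cubic spline basis in Harrell's parameterisation. It assumes the knots sit at fixed quantiles of the measurement. The values in `KNOT_QUANTILES` are, as far as I know, Harrell's recommended defaults for 3 to 5 knots, but treat them as an assumption. The nonlinear terms are divided by (t_k - t_1)^2 to keep them on the scale of x. This is an illustration, not the `rms` implementation. With k knots the spline needs k - 1 coefficients besides the intercept, so every extra knot adds one parameter that the data must support.

```python
import numpy as np

# Assumed knot quantiles for k = 3, 4, 5 (Harrell's suggested defaults).
KNOT_QUANTILES = {
    3: [0.10, 0.50, 0.90],
    4: [0.05, 0.35, 0.65, 0.95],
    5: [0.05, 0.275, 0.50, 0.725, 0.95],
}


def place_knots(x, k):
    """Place k knots at the assumed default quantiles of x."""
    return np.quantile(np.asarray(x, dtype=float), KNOT_QUANTILES[k])


def rcs_basis(x, knots):
    """Restricted cubic spline basis: returns an (n, k - 1) matrix.

    Column 0 is x itself; the remaining k - 2 columns are the nonlinear
    terms. The resulting curve is cubic between knots and linear below
    the first knot and above the last knot.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)

    def pos_cube(u):
        # Truncated power function (u)_+^3
        return np.maximum(u, 0.0) ** 3

    d = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # keeps nonlinear terms on the scale of x
    cols = [x]
    for j in range(k - 2):
        term = (
            pos_cube(x - t[j])
            - pos_cube(x - t[-2]) * (t[-1] - t[j]) / d
            + pos_cube(x - t[-1]) * (t[-2] - t[j]) / d
        )
        cols.append(term / scale)
    return np.column_stack(cols)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=10_000)  # simulated stand-in for the measurement
    for k in (3, 4, 5):
        basis = rcs_basis(x, place_knots(x, k))
        print(f"{k} knots -> {basis.shape[1]} coefficients plus intercept")
```

Running it prints that 3, 4 and 5 knots use 2, 3 and 4 coefficients respectively (plus the intercept). The two extra terms in each nonlinear column cancel the cubic and quadratic parts beyond the last knot, which is what makes the spline "restricted".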

Best Answer

Your graphs do look (to me) as if four or five knots entail some slight overfitting, and I would personally tend to use three.

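If you want a more formal procedure, Frank Harrell, in *Regression Modeling Strategies* (section 2.4.5), suggests choosing the number of knots with Akaike's Information Criterion (AIC): fit the model with each candidate number of knots and keep the one with the lowest AIC. Alternatively, you could cross-validate and pick the number of knots that yields the lowest out-of-sample prediction error.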
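Both procedures can be sketched in Python with NumPy. The sketch assumes a binary disease outcome modelled by logistic regression, fitted with a hand-rolled Newton-Raphson loop rather than any particular library, and it includes a compact restricted cubic spline basis so that it runs on its own. The knot quantiles are an assumption (what I understand to be Harrell's defaults). For each candidate k it reports AIC = -2 log L + 2p, where p counts the intercept and the k - 1 spline coefficients, and the 10-fold cross-validated deviance (-2 times the held-out log-likelihood). The simulated data and the helper names `design`, `fit_logistic`, `aic` and `cv_deviance` are hypothetical; for the real analysis one would more likely use `lrm` with `rcs()` from the R package `rms`.

```python
import numpy as np

# Assumed knot quantiles (Harrell's suggested defaults for 3 to 5 knots).
KNOT_QUANTILES = {
    3: [0.10, 0.50, 0.90],
    4: [0.05, 0.35, 0.65, 0.95],
    5: [0.05, 0.275, 0.50, 0.725, 0.95],
}


def design(x, knots):
    """Intercept column plus restricted cubic spline basis (k - 1 columns)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)

    def pos_cube(u):
        return np.maximum(u, 0.0) ** 3  # truncated power (u)_+^3

    d, scale = t[-1] - t[-2], (t[-1] - t[0]) ** 2
    cols = [np.ones_like(x), x]
    for tj in t[:-2]:
        cols.append((pos_cube(x - tj)
                     - pos_cube(x - t[-2]) * (t[-1] - tj) / d
                     + pos_cube(x - t[-1]) * (t[-2] - tj) / d) / scale)
    return np.column_stack(cols)


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                             # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])      # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def log_lik(X, y, beta):
    """Bernoulli log-likelihood, using logaddexp to avoid overflow."""
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def aic(X, y, beta):
    """AIC = -2 log L + 2p, with p = number of coefficients incl. intercept."""
    return -2.0 * log_lik(X, y, beta) + 2.0 * X.shape[1]


def cv_deviance(x, y, k, n_folds=10, seed=0):
    """K-fold cross-validated deviance; knots are placed on each training fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    total = 0.0
    for test_idx in folds:
        train = np.ones(len(x), dtype=bool)
        train[test_idx] = False
        knots = np.quantile(x[train], KNOT_QUANTILES[k])
        beta = fit_logistic(design(x[train], knots), y[train])
        total += -2.0 * log_lik(design(x[test_idx], knots), y[test_idx], beta)
    return total


if __name__ == "__main__":
    # Hypothetical simulated data standing in for the 10,000 patients.
    rng = np.random.default_rng(1)
    x = rng.normal(50.0, 10.0, size=10_000)
    true_logit = -2.0 + 0.04 * (x - 50.0) + 0.001 * (x - 50.0) ** 2
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))
    for k in (3, 4, 5):
        X = design(x, np.quantile(x, KNOT_QUANTILES[k]))
        beta = fit_logistic(X, y)
        print(f"{k} knots: AIC = {aic(X, y, beta):.1f}, "
              f"10-fold CV deviance = {cv_deviance(x, y, k):.1f}")
```

For both criteria, lower is better. In `cv_deviance` the knots are recomputed on each training fold, so the held-out patients do not influence where the knots are placed.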
