Restricted Cubic Spline – Interpretation and Handling Missing Estimates

interpretationrregressionsplines

I am running a RBS with 5 knots and I have 2 difficulties.

  1. I get only 5 estimates: 1 for the intercept, one for the linear term, and 3 for the knots. However, since I have 5 knots I should have 4 estimates and I don't understand why I get only 3.

  2. This is the first time I run the analysis and I am a bit struggling to interpret the results.

This is my script

# Load required libraries
library(rms)
library(ggplot2)

Import <- Data

# Define the knots
knots <- c(0.05, 0.275, 0.5, 0.725, 0.95)

# Perform Restricted Cubic Splines regression
rcs_model <- lm(N_of_cases ~ rcs(Match_time, knots = knots), 
                data = Import)

# Generate predicted values
Import$predicted <- predict(rcs_model)

# With CI
# Generate predicted values and confidence intervals
pred <- predict(rcs_model, newdata = 
                data.frame(Match_time = seq(0, 90, by = 1)), 
                type = "response", se.fit = TRUE)

# Extract lower and upper confidence intervals
lower_ci <- pred$fit - 1.96 * pred$se.fit
upper_ci <- pred$fit + 1.96 * pred$se.fit

# Create a data frame with Match_time, lower_ci, and upper_ci
pred_df <- data.frame(Match_time = seq(0, 90, by = 1), lower_ci, upper_ci)

# Create a plot to visualize the data, regression line, and confidence interval
ggplot() +
  geom_point(data = Import, aes(x = Match_time, y = N_of_cases)) +
  geom_line(data = Import, aes(x = Match_time, y = predicted), 
            color = "blue") +
  geom_ribbon(data = pred_df, aes(x = Match_time, ymin = lower_ci, 
              ymax = upper_ci), alpha = 0.2, fill = "grey") +
  labs(x = "Match Time", y = "Number of Cases") +
  theme_minimal() +
  scale_x_continuous(breaks = c(0, 15, 30, 45, 60, 75, 90))

print(summary(rcs_model))

And these are my results


Call:
lm(formula = N_of_cases ~ rcs(Match_time, knots = knots), data = Import)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.3414 -1.7992  0.0484  1.5023 12.9694 

Coefficients:
                                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)                                  5.63759    1.26748   4.448 2.61e-05 ***
rcs(Match_time, knots = knots)Match_time    -0.02615    0.08859  -0.295   0.7686    
rcs(Match_time, knots = knots)Match_time'    0.99727    0.50780   1.964   0.0528 .  
rcs(Match_time, knots = knots)Match_time''  -3.68207    1.57176  -2.343   0.0215 *  
rcs(Match_time, knots = knots)Match_time'''  5.02997    2.10133   2.394   0.0189 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.016 on 85 degrees of freedom
Multiple R-squared:  0.2167,    Adjusted R-squared:  0.1799 
F-statistic:  5.88 on 4 and 85 DF,  p-value: 0.0003162

These are the results for print(rcs_model)

Call:
lm(formula = N_of_cases ~ rcs(Match_time, knots = knots), data = Import)

Coefficients:
                                (Intercept)     rcs(Match_time, knots = knots)Match_time  
                                    5.63759                                     -0.02615  
  rcs(Match_time, knots = knots)Match_time'   rcs(Match_time, knots = knots)Match_time''  
                                    0.99727                                     -3.68207  
rcs(Match_time, knots = knots)Match_time'''  
                                    5.02997 

These are the results for anova(rcs_model)

> anova(rcs_model)
Analysis of Variance Table

Response: N_of_cases
                               Df Sum Sq Mean Sq F value    Pr(>F)    
rcs(Match_time, knots = knots)  4 213.96  53.490  5.8804 0.0003162 ***
Residuals                      85 773.19   9.096                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

These the results when I use ols

> anova(rcs_model)
                Analysis of Variance          Response: N_of_cases 

 Factor     d.f. Partial SS MS        F    P    
 Match_time  4   213.9618   53.490443 5.88 3e-04
  Nonlinear  3   185.1462   61.715404 6.78 4e-04
 REGRESSION  4   213.9618   53.490443 5.88 3e-04
 ERROR      85   773.1938    9.096397   

If I am not wrong, the results say that there is a non-linear relationship between the predictor and the outcome. This relationship is statistically significant in the knots with p < 0.05.
If you could confirm this interpretation that would be great!

Best Answer

This is a restricted cabin spline, with the fit restricted to be linear beyond the outermost knots. You are getting the correct number of coefficients for the rcs() terms, as there are only 4 intervals within the limits of the 5 knots, and with the restriction to linearity beyond the outermost knots the fit is to smoothly-joined cubic splines just within those 4 intervals. The way that rcs() implements the splines, the coefficients are represented as a linear coefficient plus extra terms for the non-linearities. Frank Harrell explains the details in Section 2.4.5 of Regression Modeling Strategies.

When there are multiple coefficients associated with a single predictor, the usual summary() of the model can be misleading as it only shows the "significance" of each of the coefficients individually. That's why I suggested in comments that you use the ols() and anova() functions of the rms package. That assesses the overall association of Match_time with outcome (including all its associated coefficients) in the first line of the anova() output, and evaluates the joint significance of its 3 non-linear terms in the section line.* Yes, in this case you are correct that there is substantial evidence for a non-linearity (p = 4e-04) as you originally inferred from the individual coefficients in the first model summary, but the joint test is more reliable in general.

As Match_time is the only predictor in the model, the report for REGRESSION is the same as for that predictor overall. The MS in the ERROR line is the mean-squared residual error not "explained" by the predictor.


*Note that anova() on the model produced by lm() did not separate out the linear from the non-linear terms. It might do so if you used the ns() implementation of restricted cubic splines from the splines package instead of rcs(), but I find the default knot locations chosen by rcs() to be more useful.

Related Question