Solved – Different length in variable and residuals from the model


After having posted another question on the subject, I'm trying this time on my real data (unfortunately not openly publishable) to find the best variable transformation that yields linearity in the log hazard or log cumulative hazard of a Cox proportional hazards model. For this I'm trying to plot the variable against the residuals using this code in R:

cox_mod_spline = coxph(Surv(timespan_censored,status)~ risk_factor, data = df)
res = residuals(cox_mod_spline, type = "martingale")
plot(na.omit(df$risk_factor), res)

However, I get this error message: Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ

Indeed, when I check the lengths of the two vectors, I get

[1] 587
[1] 577


I also checked that there are no NAs in df$risk_factor, so I'm assuming the difference can hardly come from there. Moreover, as it is a univariate (as opposed to multiple) model, it cannot come from NAs in other covariates, as they are not even considered in the model.

Why do the residuals and the variable differ in length, given that the residuals of the model are created from the variable itself?

Best Answer

Do you perhaps have any missing data in timespan_censored or status? Could you perhaps try the following:

df_new <- with(df, df[complete.cases(timespan_censored, status, risk_factor), ])
cox_mod <- coxph(Surv(timespan_censored, status) ~ risk_factor, data = df_new)

df_new$martn_res <- residuals(cox_mod, type = "martingale")
plot(martn_res ~ risk_factor, data = df_new)
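
For what it's worth, here is a minimal sketch of why the lengths can differ and of one alternative to subsetting by hand. It assumes the cause is missing values in the response (the data below are simulated, and the names res_default, fit_padded and res_padded are made up for illustration). With the default na.action (na.omit), coxph drops every row that has an NA in any model variable, including the two columns inside Surv(), so the residual vector is shorter than the data frame. Passing na.action = na.exclude makes residuals() pad the dropped rows with NA, so the result lines up with the original rows again.

```r
library(survival)

# Simulated stand-in for the real data (assumption: NAs sit in the response)
set.seed(1)
n <- 50
df <- data.frame(
  timespan_censored = rexp(n),
  status            = rbinom(n, 1, 0.7),
  risk_factor       = rnorm(n)
)
df$timespan_censored[c(3, 10)] <- NA  # two missing follow-up times
df$status[20] <- NA                   # one missing event indicator

# Default behaviour: incomplete rows are silently dropped
fit <- coxph(Surv(timespan_censored, status) ~ risk_factor, data = df)
res_default <- residuals(fit, type = "martingale")
length(res_default)  # shorter than nrow(df)
fit$na.action        # row indices that were dropped

# na.exclude: residuals are padded with NA back to nrow(df)
fit_padded <- coxph(Surv(timespan_censored, status) ~ risk_factor,
                    data = df, na.action = na.exclude)
res_padded <- residuals(fit_padded, type = "martingale")
length(res_padded) == nrow(df)

# x and y now have the same length; plot() skips the NA points
plot(df$risk_factor, res_padded)
```

Either approach should give the same plot; the complete.cases() version makes the dropped rows explicit, while na.exclude keeps the original data frame untouched.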