Solved – Continuous vs Categorical covariate of interest in Cox Regression

cox-model, kaplan-meier, regression, sas, survival

I will use aliases throughout to explain my results in brief, along with a few questions that have cropped up in the process.

Suppose I'm interested in the association between a baseline measurement of blood pressure and incident heart attack. Using PROC LIFETEST (SAS), I construct K-M curves to present disease-free survival. Suppose I perform a median split on blood pressure: higher blood pressure is associated with a shorter median survival time (as expected), and the log-rank test is significant.

Have I discretized too early? Also, let's say the log-rank test remains significant when I treat blood pressure in quintiles… what's stopping me from showing that K-M curve instead of the former?

Now I want to quantify this further, so I turn to PROC PHREG and construct Cox proportional hazards models. Assumptions are not violated in this example. In my model-building process, I have a priori reasons to include covariates (age, BMI, sex, etc.).

My question pertains to this step in particular. In the K-M curves I chose to categorize/discretize blood pressure (K-M of course cannot "take" continuous variables), but in the Cox regression I used blood pressure as a continuous variable. The hazard ratio was significant and greater than 1 (i.e. lower blood pressure has a "protective effect"). Couldn't I also treat blood pressure as a categorical variable here (low blood pressure as the reference, then medium and high blood pressure)? What is the difference between these two approaches? I assume this should give me the same conclusion, but I'd like more insight. I know I might "lose information" doing this, but it would also give some clinical interpretation, right?

Departing from this example using aliases, my variable of interest is exploratory in nature – it serves as a crude surrogate for a biological phenomenon. So in actuality, just knowing the hazard ratio of the variable in its continuous form should be enough…

Lastly, is there reason to incorporate restricted cubic splines in this multivariable Cox proportional hazards regression? What exactly does this show – I can't wrap my head around it. Thank you in advance for any guidance.

Best Answer

Firstly, it's important to be careful with interpretation here. Lower blood pressure may well not have a protective effect, because disease severity in heart failure is known to be associated with lower blood pressure. That is, blood pressure may or may not have a causal effect on outcomes, but it is a surrogate for disease severity, which definitely has an impact on outcomes.

Secondly, using blood pressure as a covariate in Cox regression by just writing model time*status(0) = bp; assumes that the association of blood pressure with the log-hazard is linear. The truth is almost certainly more complicated; the question is how much it deviates from linearity. If the deviation is large, then just knowing the single hazard ratio for BP is deeply misleading. Splines are a good way of looking at the shape of the relationship (and you can plot the results nicely).
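To make the linearity assumption concrete, here is a minimal sketch in plain Python (not SAS; chosen only to keep the arithmetic visible). It builds a restricted cubic spline basis in the truncated power form commonly attributed to Harrell, then compares a purely linear term with one that has nonzero spline terms. The knot locations and coefficients are made-up values for illustration, not estimates from any data, and the function names are hypothetical.

```python
import math

def rcs_basis(x, knots):
    """Restricted cubic spline basis (truncated power form) for one value x.

    For k knots this returns k-1 columns: x itself plus k-2 nonlinear terms
    that are zero below the first knot and linear above the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def cube_plus(u):
        return u ** 3 if u > 0 else 0.0

    gap = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # keeps the nonlinear terms on a scale similar to x
    cols = [float(x)]
    for j in range(k - 2):
        term = (cube_plus(x - t[j])
                - cube_plus(x - t[-2]) * (t[-1] - t[j]) / gap
                + cube_plus(x - t[-1]) * (t[-2] - t[j]) / gap)
        cols.append(term / scale)
    return cols

def hazard_ratio(x, ref, coefs, knots):
    """Hazard ratio comparing blood pressure x with ref, one coefficient per basis column."""
    bx = rcs_basis(x, knots)
    br = rcs_basis(ref, knots)
    log_hr = sum(c * (a - b) for c, a, b in zip(coefs, bx, br))
    return math.exp(log_hr)

# Hypothetical knots (mmHg) and coefficients, for illustration only.
KNOTS = [110, 125, 140, 160]
LINEAR_ONLY = [0.02, 0.0, 0.0]   # spline terms switched off: the plain linear model
WITH_CURVE = [0.02, 0.05, -0.1]  # made-up nonlinear terms

# Linear term: a 10 mmHg increase multiplies the hazard by the same factor everywhere.
hr_low = hazard_ratio(130, 120, LINEAR_ONLY, KNOTS)
hr_high = hazard_ratio(170, 160, LINEAR_ONLY, KNOTS)

# Spline terms: the same 10 mmHg step can carry a different hazard ratio at different levels.
hr_low_curve = hazard_ratio(130, 120, WITH_CURVE, KNOTS)
hr_high_curve = hazard_ratio(170, 160, WITH_CURVE, KNOTS)

print(hr_low, hr_high)
print(hr_low_curve, hr_high_curve)
```

With only the linear coefficient, the two hazard ratios are identical, which is exactly what a single reported HR for BP presumes. Once the spline coefficients are nonzero, the hazard ratio depends on where on the blood pressure scale the comparison is made, which is why a spline fit is usually shown as a plotted curve against a reference value rather than as one number.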

Forming lots of categories is an alternative that can (given enough data and categories) approximate arbitrary functional relationships too. As you point out, it also works for creating KM plots and as a class/factor variable in Cox regression. However, it tends to use a lot more parameters, which often makes a spline approach preferable. I'm not sure in what sense categories give more "clinical interpretation", but I guess they lend themselves to conveniently simplistic messages, which can sometimes help with communication. Just be careful that you are not misrepresenting the underlying functional relationship by picking poor categories.
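As a rough illustration of what categorising does inside the model, the sketch below uses plain Python rather than SAS to show reference-cell coding of blood pressure (the kind of coding PARAM=REF requests on a CLASS statement). The cutpoints of 120 and 140 mmHg and the function names are hypothetical, chosen only for the example.

```python
import bisect

def bp_category(bp, cutpoints):
    """Return 0 for the lowest category, 1 for the next, and so on.

    A value equal to a cutpoint falls in the higher category.
    """
    return bisect.bisect_right(sorted(cutpoints), bp)

def dummy_code(bp, cutpoints):
    """Reference-cell indicators; the lowest category is the reference (all zeros)."""
    cat = bp_category(bp, cutpoints)
    return [1 if cat == j else 0 for j in range(1, len(cutpoints) + 1)]

# Hypothetical cutpoints: low < 120, medium 120 to < 140, high >= 140 (mmHg).
CUTPOINTS = [120, 140]

for bp in (119, 121, 139, 141):
    print(bp, dummy_code(bp, CUTPOINTS))
```

Here 121 and 139 mmHg receive identical indicators, so the model assigns them the same hazard, while 119 and 121 mmHg are separated by a full category step. The fitted relationship is therefore a step function. Three categories cost two parameters, and every extra cutpoint added to follow a curved relationship more closely costs one more.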
