Cox regression with controlled variables

cox-model, survival

In my retrospective study I have continuous and categorical variables; the latter are divided into two or more subgroups. I want to analyze their impact on overall survival. In particular, I'm interested in a dichotomous variable (which I'll call myVariable), and I suspect that its impact may be affected by age and sex. First, I categorized the continuous variables into two or more categories and performed a univariate analysis with the log-rank test and Kaplan-Meier survival plots; myVariable was not significant.

For the multivariable analysis, I used Cox regression controlling for age and sex: I put age and sex in the first block and the other variables in the second, using the Enter method. I included myVariable in the second block to check whether it became significant after controlling for age and sex. Although I had previously categorized age and the other continuous variables, here I used the original uncategorized values where possible.
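For readers working in R rather than SPSS, the two-step workflow described above can be sketched with the survival package. This is only an analogue of the SPSS procedure, under stated assumptions: the built-in `pbc` data replace the poster's data, and `trt` is a hypothetical stand-in for myVariable. The likelihood ratio test between the two Cox models is roughly the "change from previous block" that a block-wise Enter approach compares.

```r
library(survival)

# Randomized PBC patients shipped with the survival package; trt is a
# hypothetical stand-in for "myVariable"
pbc2 <- subset(pbc, !is.na(trt))

# Univariate step: Kaplan-Meier curves and a log-rank test for trt
km <- survfit(Surv(time, status == 2) ~ trt, data = pbc2)
plot(km, xlab = "Days", ylab = "Survival probability")
lr <- survdiff(Surv(time, status == 2) ~ trt, data = pbc2)
lr_p <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
print(lr_p)

# Block 1: age and sex only; block 2: add trt (age kept continuous)
block1 <- coxph(Surv(time, status == 2) ~ age + sex, data = pbc2)
block2 <- coxph(Surv(time, status == 2) ~ age + sex + factor(trt), data = pbc2)

# Likelihood ratio test for the change from block 1 to block 2
print(anova(block1, block2))
summary(block2)
```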

1. Is this procedure correct?

2. In the Cox regression, myVariable became significant. Strangely, age was not significant in the log-rank test but became significant in the Cox regression. How come?

3. I'm using SPSS. In "Variable View" I set each variable's type to string or numeric as appropriate. However, every variable's measure is set to "Nominal". Does that matter? In other words, can it affect the results?

Thank you for your advice.
Andrea

Best Answer

I don't understand why you're categorizing your continuous variables. There are two ways a variable can be handled in a Cox model.

With covariate adjustment, you just estimate a hazard ratio comparing levels (or units) of your variable. If continuous age is included in the model, the hazard ratio is interpreted as the ratio of hazards comparing individuals who differ by one year in age, among those still at risk for the event, holding all other variables constant. This handles continuous and categorical variables alike; it just depends on how they're coded.
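A minimal R sketch of covariate adjustment with continuous age, again assuming the survival package's `pbc` data as a stand-in and `trt` as a hypothetical myVariable. `exp(coef)` for age is the hazard ratio per one-year difference, and the ratio for a 10-year difference is that value raised to the 10th power. The categorized fit, with arbitrary illustrative cut points at 50 and 60 years, replaces the single slope with step changes at the cuts.

```r
library(survival)

pbc2 <- subset(pbc, !is.na(trt))

# Continuous age: one coefficient; exp(coef) is the HR per one-year difference,
# holding trt and sex fixed
fit_cont <- coxph(Surv(time, status == 2) ~ factor(trt) + age + sex, data = pbc2)
hr_age_1yr  <- exp(coef(fit_cont)[["age"]])
hr_age_10yr <- exp(10 * coef(fit_cont)[["age"]])

# Categorized age for comparison (hypothetical cut points, illustration only)
pbc2$age_cat <- cut(pbc2$age, breaks = c(-Inf, 50, 60, Inf))
fit_cat <- coxph(Surv(time, status == 2) ~ factor(trt) + age_cat + sex, data = pbc2)

print(c(per_year = hr_age_1yr, per_10_years = hr_age_10yr))
summary(fit_cat)
```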

$\lambda(t| \mbox{MyVar}, \mbox{Age}) = \lambda(t| \mbox{MyVar}=0, \mbox{Age}=0)\exp\left(\beta_1\mbox{MyVar} + \beta_2 \mbox{Age} \right)$
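Under this model, the baseline hazard cancels when comparing two individuals with the same MyVar whose ages differ by $\Delta$ years, which is why a single hazard ratio describes the age effect at every time $t$:

$\dfrac{\lambda(t| \mbox{MyVar}, \mbox{Age}=a+\Delta)}{\lambda(t| \mbox{MyVar}, \mbox{Age}=a)} = \dfrac{\lambda(t| \mbox{MyVar}=0, \mbox{Age}=0)\exp\left(\beta_1\mbox{MyVar} + \beta_2 (a+\Delta) \right)}{\lambda(t| \mbox{MyVar}=0, \mbox{Age}=0)\exp\left(\beta_1\mbox{MyVar} + \beta_2 a \right)} = \exp\left(\beta_2\Delta \right)$

For example, with a hypothetical $\beta_2 = 0.04$ per year, a 10-year age difference corresponds to a hazard ratio of $\exp(0.4) \approx 1.49$.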

With stratification, you need a fixed set of values (strata). This allows a separate baseline hazard to be estimated for each value of the variable, and the stratification variable itself gets no coefficient. It is preferred when the proportional hazards assumption does not hold for the variable when it is entered as an adjustment covariate. You need considerably more observations to use stratification, and in practice the problem is rarely serious enough to require it.

$\lambda(t| \mbox{MyVar}, \mbox{Age}) = \lambda_{\mbox{Age}}(t| \mbox{MyVar}=0)\exp\left(\beta_1\mbox{MyVar} \right)$
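The stratified model above can be sketched in R with `strata()`, assuming the same `pbc` stand-in data and hypothetical `trt` as myVariable. Because strata need fixed values, age is first grouped at arbitrary illustrative cut points; each group then gets its own baseline hazard and only `trt` receives a coefficient, matching the formula. The second fit shows the more common pattern of stratifying on a naturally discrete variable (sex) while adjusting for continuous age.

```r
library(survival)

pbc2 <- subset(pbc, !is.na(trt))

# Strata require a fixed set of values, so age is grouped first
# (cut points at 50 and 60 years are hypothetical, for illustration only)
pbc2$age_grp <- cut(pbc2$age, breaks = c(-Inf, 50, 60, Inf))

# Separate baseline hazard per age group; only trt gets a coefficient
fit_strat <- coxph(Surv(time, status == 2) ~ factor(trt) + strata(age_grp),
                   data = pbc2)
summary(fit_strat)

# Alternative: adjust for continuous age, separate baseline hazards by sex
fit_strat_sex <- coxph(Surv(time, status == 2) ~ factor(trt) + age + strata(sex),
                       data = pbc2)
summary(fit_strat_sex)
```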

So, to answer your questions:

1. No, I don't think splitting age into arbitrary categories is the right way to go about things.
2. Parameter estimates don't "become significant". If you compare models that do and don't adjust for certain covariates, the interpretation of the model coefficients changes between models; they're not the same coefficients examined in a different light. Avoid this kind of language altogether. If age is a confounding variable, it's not the significance of age (or of the main effect) that warrants its use in a multivariable regression model. Instead, we adjust for age regardless, because doing so gives the correct, unbiased, adjusted measure of the relationship between myVariable and failure time.
3. Yes, it's hugely important. Age is continuous; you have to code it as continuous in order to adjust for it as such.
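The last two points can be illustrated in R (not SPSS), again with the `pbc` stand-in data and hypothetical `trt` as myVariable. The first part reports the crude and the age/sex-adjusted hazard ratios side by side as two different estimands, rather than asking whether one "became significant". The second part shows that the same coded values (age decades, a hypothetical coding) entered as a number give one slope, while entered as a nominal factor they give one coefficient per non-reference level.

```r
library(survival)

pbc2 <- subset(pbc, !is.na(trt))

# Crude vs. adjusted hazard ratio for trt: different quantities, reported together
fit_crude <- coxph(Surv(time, status == 2) ~ factor(trt), data = pbc2)
fit_adj   <- coxph(Surv(time, status == 2) ~ factor(trt) + age + sex, data = pbc2)
hr_trt <- c(crude    = exp(coef(fit_crude)[["factor(trt)2"]]),
            adjusted = exp(coef(fit_adj)[["factor(trt)2"]]))
print(hr_trt)

# Coding matters: age decade as a number (one slope) vs. as a nominal factor
pbc2$age_decade <- floor(pbc2$age / 10) * 10
fit_numeric <- coxph(Surv(time, status == 2) ~ age_decade, data = pbc2)
fit_nominal <- coxph(Surv(time, status == 2) ~ factor(age_decade), data = pbc2)
print(length(coef(fit_numeric)))  # a single coefficient
print(length(coef(fit_nominal)))  # number of decades minus the reference
```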

Related Solutions

Cox Model – Understanding the Cox Proportional Hazards Model

One issue here is your choice of reference level for the Group variable, G1. The regression coefficients for other Groups are with respect to that reference level, and as I understand it the same is true for the "significant" non-proportionalities seen for the other Groups. Note that this type of summary does not provide a test for the significance of the Group variable as a whole. Had you chosen a different reference level, much of the difficulty with non-proportionality might have been isolated to just one or two Groups. It's important to think about the subject-matter content of your data; there might be good reasons why some groups have different hazard time courses than others.
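A small R sketch of the reference-level point, assuming the survival package's `pbc` data with histologic `stage` playing the role of the multi-level Group variable. Releveling with `relevel()` changes every level-specific coefficient, but the fitted model (its partial log-likelihood) is identical, and a likelihood ratio test of stage as a whole, obtained by comparing models with and without the term, is the same under either coding.

```r
library(survival)

# stage stands in for the multi-level "Group" variable (illustrative assumption)
pbc3 <- subset(pbc, !is.na(trt) & !is.na(stage))
pbc3$stage_f  <- factor(pbc3$stage)                # reference level "1"
pbc3$stage_f4 <- relevel(pbc3$stage_f, ref = "4")  # reference level "4"

fit_ref1 <- coxph(Surv(time, status == 2) ~ age + stage_f,  data = pbc3)
fit_ref4 <- coxph(Surv(time, status == 2) ~ age + stage_f4, data = pbc3)

# Level-specific coefficients depend on the chosen reference level ...
print(coef(fit_ref1))
print(coef(fit_ref4))

# ... but the test of stage as a whole does not
fit_no_stage <- coxph(Surv(time, status == 2) ~ age, data = pbc3)
print(anova(fit_no_stage, fit_ref1))
print(anova(fit_no_stage, fit_ref4))
```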

Also, be careful about how you interpret the p-values from the cox.zph tests. A low p-value for a coefficient is evidence that the proportional hazards assumption doesn't hold for its associated predictor, but a "non-significant" p-value is not proof that the assumption is met. As with any statistical test, a non-significant p-value might simply reflect too few cases or too much variability to argue against the null hypothesis of proportional hazards. It's hard to tell from your graph, but that might explain why crossing plots have p-values that do not rule out the PH assumption.
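A minimal R sketch of running and reading `cox.zph`, assuming the `pbc` data and an illustrative model. The printed table has one test per term plus a GLOBAL test (recent versions of survival pool a factor's levels into a single test by default). Because a large p-value does not establish proportional hazards, the plot of scaled Schoenfeld residuals over time is worth inspecting alongside the table; a roughly flat smooth near the fitted coefficient is what PH predicts.

```r
library(survival)

pbc3 <- subset(pbc, !is.na(trt) & !is.na(stage))
fit <- coxph(Surv(time, status == 2) ~ age + log(bili) + factor(stage), data = pbc3)

# Tests based on scaled Schoenfeld residuals: one row per term plus GLOBAL
zph <- cox.zph(fit)
print(zph)

# Residual plot for the second term, log(bili), with the fitted
# time-constant coefficient as a dashed reference line
plot(zph[2])
abline(h = coef(fit)[2], lty = 2)
```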

Compare non-nested Cox models

You can compare Cox regression models (coxph) in R with plrtest, a partial likelihood ratio test for non-nested coxph models that also handles nested ones:

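library(survival)
# nonnestcox is installed from GitHub (github.com/thomashielscher/nonnestcox),
# e.g. remotes::install_github("thomashielscher/nonnestcox")
library(nonnestcox)
# Keep only the randomized patients (trt is NA for the others)
pbc  <- subset(pbc, !is.na(trt))
# x = TRUE keeps the design matrix in each fitted model (used by plrtest)
mod1 <- coxph(Surv(time, status == 2) ~ age, data = pbc, x = TRUE)
mod2 <- coxph(Surv(time, status == 2) ~ age + albumin + bili + edema + protime,
              data = pbc, x = TRUE)
mod3 <- coxph(Surv(time, status == 2) ~ age + log(albumin) + log(bili) + edema +
                log(protime), data = pbc, x = TRUE)
plrtest(mod3, mod2, nested = FALSE) # non-nested models
plrtest(mod3, mod1, nested = TRUE)  # nested models (mod1 is nested in mod3)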
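If installing nonnestcox is not an option, a rougher comparison is possible with the survival package alone; this is a swapped-in technique, not what plrtest computes. AIC, based on the partial log-likelihood, ranks the non-nested models (lower is better) but gives no p-value, while `anova()` gives the ordinary partial likelihood ratio test for the nested pair. The sketch restricts to complete cases so all models use the same rows, which `anova()` requires.

```r
library(survival)

# Randomized patients with complete lab values, so every model uses the same rows
pbc2 <- subset(pbc, !is.na(trt) & !is.na(albumin) & !is.na(bili) & !is.na(protime))

mod1 <- coxph(Surv(time, status == 2) ~ age, data = pbc2)
mod2 <- coxph(Surv(time, status == 2) ~ age + albumin + bili + edema + protime,
              data = pbc2)
mod3 <- coxph(Surv(time, status == 2) ~ age + log(albumin) + log(bili) + edema +
                log(protime), data = pbc2)

# Non-nested mod2 vs. mod3: AIC from the partial log-likelihood (no p-value)
aic <- AIC(mod2, mod3)
print(aic)

# Nested mod1 vs. mod3: partial likelihood ratio test
lrt <- anova(mod1, mod3)
print(lrt)
```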
