Solved – Cox regression with controlled variables

cox-model, survival

In my retrospective study I have continuous and categorical variables; the categorical ones have two or more levels. I want to analyze their impact on overall survival. In particular, I'm interested in a dichotomous variable (which I'll call myVariable), and I suspect that its effect may be influenced by age and sex. First, I categorized the continuous variables into two or more groups and performed a univariate analysis with the log-rank test and Kaplan-Meier survival plots; myVariable was not significant.
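For reference, the two-group log-rank test mentioned above compares observed and expected deaths in each group at every event time. Below is a minimal stdlib-only sketch, assuming right-censored data with a 0/1 event indicator and a 0/1 group label. The function name `logrank_test` and the toy data are hypothetical. Note that the test only compares groups and cannot adjust for age or sex.

```python
import math

def logrank_test(times, events, groups):
    """Two-group log-rank test for right-censored survival data.

    times  : follow-up time per subject
    events : 1 if the death was observed, 0 if censored
    groups : 0 or 1 group label per subject
    Returns (chi-square statistic on 1 df, p-value).
    """
    data = list(zip(times, events, groups))
    death_times = sorted({t for t, e, _ in data if e == 1})
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    variance = 0.0
    for t in death_times:
        n = sum(1 for tt, _, _ in data if tt >= t)              # at risk overall
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)  # at risk in group 1
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)   # deaths at t
        d1 = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 at this death time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Toy data (hypothetical): group 0 dies at times 1-3, group 1 at times 4-6.
chi2, p = logrank_test([1, 2, 3, 4, 5, 6], [1] * 6, [0, 0, 0, 1, 1, 1])
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```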

For the multivariate analysis, I used Cox regression controlling for age and sex: I put age and sex in the first block and the other variables in the second, using the Enter method. I included myVariable in the second block to check whether it became significant after controlling for age and sex. Although I had previously categorized age and the other continuous variables, this time I used the original uncategorized data where possible.

1. Is this procedure correct?

2. In the Cox regression, myVariable became significant. Strangely, age was not significant on the log-rank test but became significant in the Cox regression. How come?

3. I'm using SPSS. In the "Variable View" I set each variable's type to String or Numeric as appropriate. However, every variable's measure is set to "Nominal". Does that matter? In other words, can it affect the results?

Thank you for your advice.
Andrea

Best Answer

I don't understand why you're categorizing your continuous variables. There are two ways a variable can be handled in a Cox model.

With covariate adjustment, you estimate a hazard ratio per unit difference in your variable. If continuous age is included in the model, the hazard ratio is interpreted as the ratio of hazards comparing at-risk individuals who differ by one year in age, holding all other variables constant. This handles continuous and categorical variables alike; it just depends on how they're coded.

$\lambda(t| \mbox{MyVar}, \mbox{Age}) = \lambda(t| \mbox{MyVar}=0, \mbox{Age}=0)\exp\left(\beta_1\mbox{MyVar} + \beta_2 \mbox{Age} \right)$
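As a quick numeric check of this interpretation, the sketch below plugs hypothetical coefficients into the adjusted model and shows that the baseline hazard cancels: the hazard ratio for a one-year age difference is $\exp(\beta_2)$ at every $t$, and for a ten-year difference it is $\exp(10\beta_2)$. The names `hazard`, `BETA1`, `BETA2` and the baseline hazard function are illustrative assumptions, not estimates from any data.

```python
import math

def hazard(t, my_var, age, beta1, beta2, baseline):
    """Covariate-adjusted Cox hazard:
    lambda(t | MyVar, Age) = lambda0(t) * exp(beta1*MyVar + beta2*Age),
    where lambda0(t) = lambda(t | MyVar=0, Age=0) is the baseline hazard."""
    return baseline(t) * math.exp(beta1 * my_var + beta2 * age)

# Hypothetical coefficients, for illustration only.
BETA1, BETA2 = 0.4, 0.03

def baseline(t):
    # Hypothetical baseline hazard; any positive function of t would do.
    return 0.0015 * math.sqrt(t)

for t in (0.5, 2.0, 10.0):
    one_year = hazard(t, 1, 61, BETA1, BETA2, baseline) / hazard(t, 1, 60, BETA1, BETA2, baseline)
    ten_years = hazard(t, 0, 70, BETA1, BETA2, baseline) / hazard(t, 0, 60, BETA1, BETA2, baseline)
    # The baseline hazard cancels, so neither ratio depends on t or on MyVar.
    print(f"t={t}: HR per 1 year = {one_year:.4f}, HR per 10 years = {ten_years:.4f}")
```

With these made-up values every line reports about 1.0305 per year and 1.3499 per ten years, that is $e^{0.03}$ and $e^{0.3}$.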

With stratification, the variable must take a fixed set of discrete values (the strata). This allows a separate baseline hazard to be estimated for each value of the variable, and no hazard ratio is estimated for the stratifying variable itself. It is preferred when the proportional hazards assumption is not met for this variable as an adjustment covariate. You need considerably more observations to use stratification, and the problem is rarely severe enough to warrant it.

$\lambda(t| \mbox{MyVar}, \mbox{Age}) = \lambda_\mbox{Age}(t| \mbox{MyVar}=0)\exp\left(\beta_1\mbox{MyVar} \right)$
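To make the difference between the two formulas concrete, here is a minimal sketch (stdlib only, a single covariate, Breslow handling of tied death times) that maximizes the Cox partial likelihood by Newton-Raphson. Without `strata`, all subjects share one risk set and one baseline hazard; with `strata`, risk sets are formed within each stratum only, so each stratum keeps its own baseline hazard and the stratifying variable gets no coefficient. The function name `cox_fit` and the toy data are hypothetical, and because only one covariate is supported the unstratified call ignores age entirely; it is not the full adjusted model above.

```python
import math
from collections import defaultdict

def cox_fit(times, events, x, strata=None, iters=25):
    """One-covariate Cox model fitted by Newton-Raphson on the Breslow partial likelihood.

    Returns (beta, standard error of beta). The covariate must vary within
    at least some risk sets, otherwise the information is zero.
    """
    if strata is None:
        strata = [0] * len(times)  # a single stratum: one shared baseline hazard
    by_stratum = defaultdict(list)
    for t, e, xi, s in zip(times, events, x, strata):
        by_stratum[s].append((t, e, xi))

    beta, info = 0.0, 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for rows in by_stratum.values():
            # Risk sets never cross stratum boundaries.
            for t in sorted({tt for tt, e, _ in rows if e == 1}):
                risk = [xi for tt, _, xi in rows if tt >= t]
                deaths = [xi for tt, e, xi in rows if tt == t and e == 1]
                w = [math.exp(beta * xi) for xi in risk]
                s0 = sum(w)
                s1 = sum(wi * xi for wi, xi in zip(w, risk))
                s2 = sum(wi * xi * xi for wi, xi in zip(w, risk))
                d = len(deaths)
                score += sum(deaths) - d * s1 / s0       # Breslow score contribution
                info += d * (s2 / s0 - (s1 / s0) ** 2)   # observed information
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    # SE from the information of the last iteration (approximately at beta-hat once converged).
    return beta, 1.0 / math.sqrt(info)

# Hypothetical toy data: MyVar (0/1) and a two-level age group used as strata.
times   = [5, 8, 12, 3, 9, 15, 4, 7, 11, 6, 10, 14]
events  = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0]
my_var  = [1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0]
age_grp = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

for label, groups in (("unstratified", None), ("stratified by age group", age_grp)):
    beta, se = cox_fit(times, events, my_var, strata=groups)
    print(f"{label}: HR = {math.exp(beta):.3f} (SE of log-HR {se:.3f})")
```

Stratifying on age this way requires grouping age into discrete levels, which is exactly the "fixed values" requirement described above.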

So, to answer your questions:

  1. No, I don't think splitting age into arbitrary categories is the right way to go about things.

  2. Parameter estimates don't "become significant". When you compare models that do and don't adjust for certain covariates, the interpretation of the model coefficients changes between models: they are not the same coefficient examined in a different light. Avoid this kind of language altogether. If age is a confounder, it's not the significance of age (or of the main effect) that warrants including it in a multivariable regression model. Instead, we adjust for age regardless, because doing so gives the correct, unbiased, adjusted measure of the relationship between myVariable and failure time.

  3. Yes, it's hugely important. Age is continuous; you have to code it as continuous in order to adjust for it as a continuous variable.
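To illustrate point 3 in general terms: a variable treated as nominal enters a regression as a set of indicator (dummy) columns, one per level except a reference level, so every distinct age would get its own hazard ratio instead of a single per-year one. The sketch below (plain Python; `design_columns` and the ages are hypothetical) shows only the coding itself. It does not reproduce how any particular SPSS procedure uses the measurement-level setting, so check the procedure's documentation for whether it reads that setting or relies on its own categorical-covariate option.

```python
def design_columns(name, values, nominal):
    """Return the covariate columns a regression model would receive for one variable.

    nominal=False: one numeric column, so one coefficient (log hazard ratio per unit).
    nominal=True: one 0/1 indicator per level except the lowest (reference) level.
    """
    if not nominal:
        return {name: [float(v) for v in values]}
    levels = sorted(set(values))
    return {
        f"{name}={level}": [1.0 if v == level else 0.0 for v in values]
        for level in levels[1:]  # levels[0] is the reference category
    }

ages = [47, 52, 52, 61, 68, 70, 74]  # hypothetical ages
continuous_cols = design_columns("age", ages, nominal=False)
nominal_cols = design_columns("age", ages, nominal=True)
print(len(continuous_cols), "column(s) when continuous")
print(len(nominal_cols), "column(s) when nominal:", list(nominal_cols))
```

Six distinct ages produce five indicator columns, each with its own coefficient, while the continuous coding produces a single column and a single per-year hazard ratio.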