Solved – Odds ratio of a continuous variable (univariate cox proportional hazards), how to plot that variable against death%

continuous data, cox-model, data visualization

I have a continuous variable X on which I ran a Cox proportional hazards regression. The outcome was 1 = death, 0 = censored/still alive. I have a hazard ratio of 0.516 for this predictor.

1) How do I interpret that hazard ratio? For every unit increase in X, is the chance of dying (or of surviving) multiplied by 0.516?

2) How can I plot a graph of the survival function (or death rate), with variable X on the x-axis? (Please state specific procedures/software that can do that.)

EDIT: I found out that S(t) = S₀(t)^exp(γ), where γ = -0.441χ.

χ = the value of the continuous variable (a biological marker, so χ=0 makes no sense).

The only problem is how to estimate S₀. Which χ shall I use: the mean, the mode, or the median?

Thank you.

Best Answer

When you run a Cox proportional hazards analysis you estimate the hazard ratios for the covariates without actually estimating the baseline survival function. To estimate the survival function for a specific value of the covariates, you then need to estimate that baseline as well. If you have a good amount of data (i.e., many people at the start, and many of them followed up to death), the Kaplan-Meier survival function adjusted to covariates at value 0 is usually the best baseline.

This raises two questions:

How does one adjust the Kaplan-Meier survival function to covariates value 0?
What if the value 0 is not meaningful for a covariate?

For the first point I could point you to http://www.stata.com/manuals13/st.pdf pages 195-196 or http://data.princeton.edu/pop509/NonParametricSurvival.pdf . The danger with using the baseline cumulative hazard function as you suggest is that for discrete failure time (i.e., Kaplan-Meier) it does not hold that $S_0(t)=\exp\{-H_0(t)\}$ (although they are close if there are not many failures/deaths).
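To illustrate the caveat about $S_0(t)=\exp\{-H_0(t)\}$, here is a minimal sketch in plain Python. It ignores covariates entirely and uses a made-up table of deaths and numbers at risk (the function name `km_vs_exp_cumhaz` and the data are assumptions for illustration only). It compares the product-limit (Kaplan-Meier) estimate with the exponential of minus the Nelson-Aalen cumulative hazard.

```python
import math

def km_vs_exp_cumhaz(event_table):
    """Return (Kaplan-Meier, exp(-Nelson-Aalen)) after each distinct event time.

    event_table: list of (deaths, at_risk) pairs, ordered by event time.
    """
    km = 1.0          # product-limit (Kaplan-Meier) survival estimate
    cum_hazard = 0.0  # Nelson-Aalen cumulative hazard H(t)
    rows = []
    for deaths, at_risk in event_table:
        km *= 1.0 - deaths / at_risk
        cum_hazard += deaths / at_risk
        rows.append((km, math.exp(-cum_hazard)))
    return rows

# Hypothetical toy data: (deaths, number at risk) at four event times.
toy_table = [(1, 10), (2, 9), (1, 6), (3, 5)]
comparison = km_vs_exp_cumhaz(toy_table)
for km, exp_h in comparison:
    print(f"KM = {km:.4f}   exp(-H) = {exp_h:.4f}")
```

Because $1-x \le e^{-x}$, the exponential form never falls below the product-limit form. The two stay close while only a small fraction of those at risk die at each event time, and they drift apart as that fraction grows.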

I see from the URL you posted in your comment that MedCalc can produce an estimate of the survival function at the mean of the covariates. You can use it in the following manner:

\begin{align} S_X(t) &= S_{\bar{X}}(t)^{\exp\{Y\}} \\ Y &= -0.441 (X_1-\bar{X}_1) \end{align}

Where $S_{\bar{X}}(t)$ is the survival function for mean covariates, $X_1$ is the covariate you will vary on the x-axis, and $\bar{X}_1$ is the mean value of that covariate. You do not need to do any adjustment for the other covariates because they are already assumed to take their mean value.
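As a rough sketch of how the formula above could be turned into the requested plot data, the following plain-Python snippet evaluates the death probability $1 - S_X(t)$ at one fixed time point over a grid of $X_1$ values. The coefficient -0.441 is the one quoted in the question. The values `S_MEAN_T = 0.80` and `X_MEAN = 5.0` are placeholders I made up; you would read the real ones off your software's mean-covariate survival estimate and your data.

```python
import math

BETA = -0.441  # coefficient quoted in the question

def survival_at_x(s_mean_t, x, x_mean, beta=BETA):
    """S_X(t) = S_mean(t) ** exp(beta * (x - x_mean)) at one fixed time t."""
    return s_mean_t ** math.exp(beta * (x - x_mean))

# Hypothetical inputs (assumptions, not taken from the question):
S_MEAN_T = 0.80  # survival at the chosen time t for a subject with mean X
X_MEAN = 5.0     # sample mean of the covariate

# Grid of covariate values for the x-axis, centred on the mean.
x_grid = [X_MEAN + 0.5 * k for k in range(-6, 7)]
curve = [(x, 1.0 - survival_at_x(S_MEAN_T, x, X_MEAN)) for x in x_grid]

for x, death_prob in curve:
    print(f"X = {x:4.1f}   P(death by t) = {death_prob:.3f}")
```

The `(x, death_prob)` pairs can be passed to any plotting tool. Repeating the calculation for several time points gives one curve per time point.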

For the second point, it is not uncommon for a value of 0 to be not meaningful (e.g., the covariate could be the weight or age of adults). If a linear relationship between the covariate and the log hazard is established, then it does not matter if you estimate the baseline at an unmeaningful value (provided you don't present it as representative of anything). You may consider running some diagnostics on your covariate (proportional hazards tests, and functional-form checks such as martingale residual plots) to see whether a linear relationship is reasonable. If it is not, you could instead replace $X$ with $\ln X$ in your regression (which leads to a baseline estimation at $\ln X = 0 \Leftrightarrow X = 1$ and requires $X>0$).
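To see why the choice of reference value does not matter under the model, here is a short derivation for a single covariate. It writes $\beta$ for the fitted coefficient ($-0.441$ in the question) and assumes the proportional hazards form holds:

\begin{align} S_X(t) &= S_0(t)^{\exp\{\beta X\}} \\ S_{\bar{X}}(t) &= S_0(t)^{\exp\{\beta \bar{X}\}} \\ S_{\bar{X}}(t)^{\exp\{\beta (X-\bar{X})\}} &= S_0(t)^{\exp\{\beta \bar{X}\}\exp\{\beta (X-\bar{X})\}} = S_0(t)^{\exp\{\beta X\}} = S_X(t) \end{align}

So a survival curve estimated at the mean (or at any other reference value) yields the same $S_X(t)$ as one estimated at $X=0$, as long as the exponent uses the difference from that reference.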

Hope that helps!

Related Solutions

Survival Analysis – Why Use Semi-Parametric Models (Cox Proportional Hazards) Instead of Fully Parametric Models?

If you know the parametric distribution that your data follows then using a maximum likelihood approach and the distribution makes sense. The real advantage of Cox Proportional Hazards regression is that you can still fit survival models without knowing (or assuming) the distribution. You give an example using the normal distribution, but most survival times (and other types of data that Cox PH regression is used for) do not come close to following a normal distribution. Some may follow a log-normal, or a Weibull, or other parametric distribution, and if you are willing to make that assumption then the maximum likelihood parametric approach is great. But in many real world cases we do not know what the appropriate distribution is (or even a close enough approximation). With censoring and covariates we cannot do a simple histogram and say "that looks like a ... distribution to me". So it is very useful to have a technique that works well without needing a specific distribution.

Why use the hazard instead of the distribution function? Consider the following statement: "People in group A are twice as likely to die at age 80 as people in group B". That could be true because people in group B tend to live longer than those in group A. It could also be true because people in group B tend to live shorter lives and most of them are dead long before age 80, which gives them a very small probability of dying at 80, while enough people in group A live to 80 that a fair number of them die at that age, which gives a much higher probability of death at that age. So the same statement could mean that being in group A is better or worse than being in group B. What makes more sense is to ask, of those people (in each group) who lived to 80, what proportion will die before they turn 81. That is the hazard (and the hazard is a function of the distribution function/survival function/etc.). The hazard is easier to work with in the semi-parametric model and can then give you information about the distribution.
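A small numeric sketch of the second scenario, using made-up cohort counts (all numbers and the helper name `death_summary` are hypothetical): group A is twice as likely as group B to die at age 80 when counted over the whole cohort, yet among those who actually reach 80 the one-year hazard is higher in group B.

```python
def death_summary(alive_at_80, deaths_at_80, cohort_size):
    """Return (unconditional probability, hazard) of dying during age 80."""
    probability = deaths_at_80 / cohort_size  # out of everyone in the group
    hazard = deaths_at_80 / alive_at_80       # out of those who reached 80
    return probability, hazard

# Hypothetical cohorts of 1000 people each.
prob_a, hazard_a = death_summary(alive_at_80=600, deaths_at_80=60, cohort_size=1000)
prob_b, hazard_b = death_summary(alive_at_80=100, deaths_at_80=30, cohort_size=1000)

print(f"Group A: P(die at 80) = {prob_a:.2f}, hazard at 80 = {hazard_a:.2f}")
print(f"Group B: P(die at 80) = {prob_b:.2f}, hazard at 80 = {hazard_b:.2f}")
```

The unconditional probabilities favour group B, while the hazards show that a group B member who reaches 80 is at greater risk.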

Solved – Violation of proportional hazard for covariate but not for interaction it’s part of in a Cox Proportional Hazards model

In your model you need to add an interaction term for the infected:

# cph() comes from the rms package
library(rms)
cph(formula = Surv(start_time, end_time, event) ~ feed_time +
    treatment * clutch1 + treatment:start_time +
    cluster(cage), data = df, x = TRUE, y = TRUE)

See my own answer here and my blog about this here. Since you have a limited amount of data you can also use the tt() approach although I'm uncertain if it works as expected with the rms::cph wrapper.

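# coxph() comes from the survival package; ns() comes from splines
library(survival)
library(splines)
coxph(formula = Surv(lifespan) ~ feed_time +
      treatment * clutch1 + tt(treatment) +
      cluster(cage), data = df, x = TRUE, y = TRUE,
      tt = function(x, t, ...){
        ns(x + t, 2)
      })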

If you stratify on your main variable you won't get an estimate and you can't do an interaction variable with the clutch1 variable. I may have misread your question but just to be sure, stratification can only be used with categorical variables and not continuous. You can categorize continuous variables but I wouldn't recommend that.
