A couple suggestions, not directly related to CoxPH but to interactions and collinearity
1) When you are getting "crazy" values like these, one possiblitiy is collinearity. This is often a problem when you have interactions. Have you centered all your variables (by subtracting the mean from each)?
2) You can't interpret one interaction among many quite so easily. LT, food and temp2 are all involved in many interactions. So, look at predicted values from different combinations.
3) Check the units of the different variables. When you get crazy parameters, sometimes it's a problem of units (e.g. measuring a human height in millimeters or kilometers)
4) Once you've got that stuff straightened out, I find the easiest way to think of the effects of different interactions (esp. higher level ones) is to graph the predicted values with different combinations of the independent values.
I'll give a try to answer this, but keep in mind that I do not have a real experience with the PWP model and if anybody has a better input that would be welcome.
General observations
- I have a problem with treating discrete covariates as continuous, in general. In my opinion, there is no sensible interpretation when this is done.
- You should not use unstandardized continuous covariates. For the combination X=10 and D=10 for example, the estimated coefficient will most likely be very small (because in the expressions of the Cox model they appear in the exponential, yielding very high values).
- When using strata for every event, you should have a lot of data for all combinations of groups for all strata, in order to have power to reject hypothesis concerning covariate effects.
- Of more interest here would be regression coefficients rather than hazard ratios. Assume the final model estimates $\beta_X$, $\beta_D$ and $\beta_{XD}$ as regression coefficients. In this case the overall effect of $X$ given $D=d$ can be calculated as $(\beta_X + \beta_{XD} d)$ (of course this should be put more formal if you have several groups, with dummy variables, etc).
- I would not use "increasing" hazard rate, when you refer to an "increased" hazard rate due to a covariate. The former appeals more to the shape of the hazard function, which is not of a concern in this case.
- The interpretation of covariate effects in survival analysis with proportional hazards is not that different from regular regression analysis, just that the covariate effects $\beta$ have the interpretation of log-hazard-ratio. In the PWP model, this is also conditional on being at risk for the $k$-th event.
Other general things
Another thing would be that in the model selection you should also factor in what would make sense and what would be a useful model for your research question. I think it's generally bad practice to fit a lot of models without knowing beforehand what question that model answers.
A puzzling quote is
In general, I belive there is a need (and interest in) for some clarification about interaction effects – a quite complicated area for all quantitative methods–focused student/professionals.
This is why statistics textbooks exist. Any decent book on regression models should explain interaction effects. For example, I used the Fox book (but I assume there are plenty out there).
As a final recommendation, it would be instructive to write down the hazards expressions and their estimates for all the groups and the combination of groups, with pen and paper. This I think would clear up many of the confusions that you encounter in interpreting these effects.
Keeping all these in mind, I'll give some comments on the scenarios that you mentioned.
Scenario 1
Suppose now that in model with only X+D (with no interaction term), my main variable X was significant. It is not significant in the interaction model (see above result).
My intuition tells me that this should not happen too often, i.e. removing something not significant should not alter the other estimates a lot. This might happen though because you lose power when adding the interaction effect (you lose "degrees of freedom").
My interpretation 1) I simply state that there were no interaction effects between X and D. However, while the D variable is significant (with increasing hazard rate) the X is not. Thus, my main explanatory variable is not sufficient to explain this. Alternatively, 2) I state that there were no interaction effects, and the coef. of X in the interaction model does not make any sense or is hard to interpret. I don't even show this results, but put it on a note.
Keep in mind observations 1, 2 and 5. There might be an interaction effect, but you just don't have enough power to detect it. The coefficient of the main effect of $X$ does make (some) sense: it is the log-hazard ratio for a subject with $D=0$.
Scenario 2
Again keep in mind observations 1 and 2.
My interpretation: "X:D" is decreasing, i.e. when D=0 and X increasing, the hazard for experiencing the event is decreasing(weak), but the effect is not significant. When "D" is = 1, the hazard is increasing.
The total effect of $X$ is $\log(1.0677) + d \times \log(0.9994) = 0.0655 - d \times 0.0006$. So a larger value of $X$ leads to an increased hazard (ratio), regardless of $D$. The total effect of $X$ is slightly smaller when $D=1$.
Scenario 3
Here there are 3 interaction terms. It is instructive to compute again the total effect of $X$, conditional on the values of $D$. It looks like for $D\in\left\{1,2\right\}$ the effect of $X$ is attenuated as compared to when $D=0$, and for $D=3$ the effect is amplified, as compared to when $D=0$. The interactions are not significant, which means that you do not have enough power to reject the hypothesis of interaction in this data set.
Scenario 4
My interpretation: The interaction term is not significant, as in all Scenarios. But the interpretation would be that when X is = 1, the D = 1 and D = 2 are decreasing (compared to D=0) but when X=1 and D=3, the hazard is increasing.
If I read this in a paper I would be hopelessly confused. What is decreasing? The $D$? (I have a feeling I know what you refer to, but you should try to express things less informal).
Scenario 5
My interpretation: The interaction with time does correct for the violation of the assumption: X is decreasing with years. However, X alone is increasing. What is going on here? It doesn't make any sense to me. Unless, the X = 0 (alone), and X = 1 with * stop in the model. If so, the interpretation is then that X = 1 * stop is decreasing over time, while when X = 0, the hazard rate increases with 1.58.
Again, I don't understand your interpretation. What does "$X$ is decreasing with years" mean? Is that the value of $X$? Is it the effect on the hazard ratio? At first glance, it seems to me that the $X=1$ group has a higher hazard rate than the $X=0$ group, at time $0$. As time goes by, this difference becomes smaller.
Best Answer
First, there is no "risk equation" for a Cox model, although a non-parametric baseline hazard can be estimated. If you want an equation you need to specify a parametric model.
Second, asking for a baseline hazard with
centered = TRUE
is not useful when there are categorical predictors, and that's particularly true when there are interactions. The caution that you see after trying to callbasehaz()
is quite explicit and is seen even without time strata. Here's a simple example:So a baseline hazard with
centered = T
, if it were to be reported, would be for an individual withsex = 1.394737
andage = 62.44737
. That doesn't make a lot of sense, although you could request the hazard for that covariate combination from thesurvfit()
function that is called bybasehaz()
.You could plot the
survfit
objectbhCentered
or extract survival over time frombhCentered$surv
andbhCentered$time
.Third, you did not set up your time-stratified model correctly. As Section 4.1, "Step functions," of the time dependence vignette indicates, after calling
survSplit()
you need to use the counting-process format, in your caseSurv(tstart, days, event)
, as the outcome.Fourth, with time strata you need to take extra steps for generating survival curves. Those steps are explained in detail in that section of the vignette, which says:
With your cut at
days = 1826
there are no predictions for the first time stratum after that time, and no predictions for the second time stratum before that time. So you must providesurvfit()
with a new data set.That data set must contain time intervals and the corresponding time strata, the status indicator (required but not used), and values of the covariates. You use the
id
argument tosurvfit()
to indicate which time intervals/strata should be combined into single curves to represent the entire time course of interest. If you really want to do this yourself, I would recommend using realistic values of each of the covariates rather than the centered values. For example in your case:would allow you to get a single reference survival curve out to 10 years for a 60-year-old Male under treatment 1 with a varX value of 25, by calling
If you want to adjust for different sets of covariate values by hand, with covariates fixed in time you can just sum up the products of the coefficients times the corresponding differences in covariate values from those of that reference case.
Fifth, you risk a lot of trouble and errors trying to do these types of calculations yourself. Take advantage of the tools provided by the software, like the
survfit()
function, which also make it straightforward to display confidence intervals around the point estimates.Finally, as Frank Harrell noted in a comment, there probably isn't a sharp jump in hazard at 5 years. You might be better off using a continuous time-varying coefficient, as explained later in the vignette.