I get a couple of puzzling results in my (repeated-event) Cox model when I introduce interaction effects. Here I will pose several questions about interaction effects (in a survival-analysis context) in order to, hopefully, get the answers once and for all. I've checked similar posts related to this matter (1, 2, 3, 4, 5, 6, 7, 8); some of them are unanswered, while others are answered ambiguously. Some of them are helpful. In general, I believe there is a need for (and interest in) some clarification about interaction effects, a rather complicated area for students and professionals focused on quantitative methods.
Ultimately, my questions relate to the logic behind interactions and their subsequent interpretation in the analysis. Below I present 5 different scenarios/models derived from my data analysis, but I extend them a bit to also include other examples that might be of help to me and (hopefully) to other people on this website.
For every scenario, I provide my own interpretations (meant to capture the essence and logic, not to be comprehensive), so those of you who are able to answer, please reject or support them. If possible, provide a correct answer and explain why something was incorrect.
- Scenario 1
Suppose that I have a model with 2 covariates where one of the covariates is my main explanatory variable (note that it makes sense to have this variable without an interaction term as well). Guided by my theoretical considerations, I (also) introduce an interaction term between them.
My main explanatory variable (X) is on a scale from 0 to 10 (think of number of appearances), and the other covariate (D) is also a continuous variable (ranging from 0 to 10). The model with the interaction term, fitted with the survival package, is:
`library(survival); model.1 <- coxph(Surv(start, stop, event) ~ X + D + X:D + cluster(ID) + strata(enum), data = mydata)`
| | exp(coef) | exp(-coef) | lower .95 | upper .95 |
|---|---|---|---|---|
| X | 1.069 | 0.9356 | 0.9798 | 1.166 |
| D | 1.046*** | 0.9561 | 1.0213 | 1.071 |
| X*D | 1.000 | 0.9999 | 0.9876 | 1.013 |
Suppose now that in the model with only X + D (no interaction term), my main variable X was significant. It is not significant in the interaction model (see the results above).
My interpretation: 1) I simply state that there were no interaction effects between X and D. However, while D is significant (with an increasing hazard rate), X is not. Thus, my main explanatory variable is not sufficient to explain this. Alternatively, 2) I state that there were no interaction effects and that the coefficient of X in the interaction model does not make any sense or is hard to interpret. I don't even show these results, but mention them in a note.
Question: how should I interpret interaction effects between two continuous variables in this model?
- Scenario 2
In this scenario the X variable is still a continuous variable 0-10, but the D-variable is now dichotomous.
| | exp(coef) | exp(-coef) | lower .95 | upper .95 |
|---|---|---|---|---|
| X | 1.0677. | 0.9366 | 0.9933 | 1.148 |
| D | 1.3628*** | 0.7338 | 1.1351 | 1.636 |
| X*D | 0.9994 | 1.0006 | 0.9150 | 1.092 |
My interpretation: "X:D" is decreasing, i.e. when D = 0 and X increases, the hazard of experiencing the event decreases (weakly), but the effect is not significant. When D = 1, the hazard increases.
- Scenario 3
"X" is still continuous, but "D" is now categorical (0 = no appearances, 1 = one appearance, 2 = two appearances, 3 = three appearances).
| | exp(coef) | exp(-coef) | lower .95 | upper .95 |
|---|---|---|---|---|
| X | 1.0491*** | 0.9532 | 1.0226 | 1.076 |
| factor(D)1 | 1.2237 | 0.8172 | 0.8350 | 1.793 |
| factor(D)2 | 1.7871. | 0.5596 | 0.9910 | 3.223 |
| factor(D)3 | 1.0578 | 0.9453 | 0.4625 | 2.420 |
| X*factor(D)1 | 0.9849 | 1.0153 | 0.9336 | 1.039 |
| X*factor(D)2 | 0.9859 | 1.0143 | 0.9021 | 1.077 |
| X*factor(D)3 | 1.0390 | 0.9625 | 0.9230 | 1.170 |
Question: How should I interpret the interaction term here?
- Scenario 4
Now the "X" becomes dichotomous (1/0) and the "D" remains categorical as in Scenario 3.
| | exp(coef) | exp(-coef) | lower .95 | upper .95 |
|---|---|---|---|---|
| X | 1.386** | 0.7214 | 1.1315 | 1.698 |
| factor(D)1 | 1.195 | 0.8370 | 0.8435 | 1.692 |
| factor(D)2 | 1.659. | 0.6029 | 0.9635 | 2.855 |
| factor(D)3 | 1.061 | 0.9425 | 0.4820 | 2.336 |
| X*factor(D)1 | 0.900 | 1.1111 | 0.5848 | 1.385 |
| X*factor(D)2 | 0.986 | 1.0142 | 0.4979 | 1.952 |
| X*factor(D)3 | 1.352 | 0.7394 | 0.5097 | 3.589 |
My interpretation: The interaction term is not significant, as in all the scenarios. But the interpretation would be that when X = 1, D = 1 and D = 2 are decreasing (compared to D = 0), but when X = 1 and D = 3, the hazard is increasing.
- Scenario 5
Suppose now that the "X" and "D" variables are exactly the same as in the previous scenario. However, this time, variable "X" violates the PH assumption, so I introduce an interaction term between X and stop/start time (in years). I know that some would argue that one needs to split the data before doing this, while others would not necessarily recommend it. This is something of a side debate: interesting, but not really relevant to this example, and it has been discussed elsewhere on this site. Nevertheless, here is the model:
| | exp(coef) | exp(-coef) | lower .95 | upper .95 |
|---|---|---|---|---|
| X | 1.5848* | 0.6310 | 1.0795 | 2.3268 |
| factor(D)1 | 1.1301 | 0.8849 | 0.9192 | 1.3893 |
| factor(D)2 | 1.6507** | 0.6058 | 1.1655 | 2.3378 |
| factor(D)3 | 1.2698 | 0.7875 | 0.7991 | 2.0179 |
| X*stop | 0.9488* | 1.0540 | 0.9026 | 0.9973 |
My interpretation: The interaction with time does correct for the violation of the assumption: X is decreasing with years. However, X alone is increasing. What is going on here? It doesn't make any sense to me, unless "X alone" refers to X = 0 and "X * stop" refers to X = 1. If so, the interpretation would be that for X = 1 the hazard decreases over time, while for X = 0 the hazard rate is increased by a factor of 1.58.
EDIT (additional information):
The variables "X" and "D" are actually discrete (1, 2, 3, 4,..10) but they are treated as continuous.
I use a conditional model (the "PWP" model), and the time scale is "time since entry".
Both X and D are time-dependent (or time-varying) variables.
Best Answer
I'll try to answer this, but keep in mind that I do not have much experience with the PWP model, and if anybody has better input, that would be welcome.
General observations
Other general things
Another consideration is that, in model selection, you should also factor in what makes sense and what would be a useful model for your research question. I think it is generally bad practice to fit a lot of models without knowing beforehand which question each model answers.
A puzzling quote is
This is why statistics textbooks exist. Any decent book on regression models should explain interaction effects. For example, I used the Fox book (but I assume there are plenty of others).
As a final recommendation, it would be instructive to write down, with pen and paper, the hazard expressions and their estimates for all the groups and combinations of groups. I think this would clear up many of the confusions that you encounter in interpreting these effects.
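As a sketch of that pen-and-paper exercise, assume the PWP setup described in the question: stratum $k$ is the event number (`enum`) with its own baseline hazard $h_{0k}(t)$, and the linear predictor contains $X$, $D$ and their product. The coefficient names $\beta_X$, $\beta_D$ and $\beta_{XD}$ are my own notation, not labels from the R output.

$$
\begin{aligned}
h_{ik}(t) &= h_{0k}(t)\,\exp\big(\beta_X X_i(t) + \beta_D D_i(t) + \beta_{XD}\, X_i(t)\, D_i(t)\big),\\[4pt]
\log \frac{h_{ik}(t \mid X_i(t) = x + 1,\ D_i(t) = d)}{h_{ik}(t \mid X_i(t) = x,\ D_i(t) = d)} &= \beta_X + \beta_{XD}\, d .
\end{aligned}
$$

The baseline hazard cancels, so the effect of a one-unit increase in $X$ depends on $D$ only through $\beta_{XD}$, and $\beta_X$ on its own is that effect at $d = 0$. With a categorical $D$ you get one such expression per level, and with a binary $X$ you can simply tabulate $\exp(\cdot)$ for every $(X, D)$ combination.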
With all this in mind, I'll give some comments on the scenarios that you mentioned.
Scenario 1
My intuition tells me that this ($X$ losing significance once the interaction is added) should not happen too often, i.e. removing something non-significant should not alter the other estimates much. It can happen, though, because you lose power when adding the interaction effect (you lose "degrees of freedom").
Keep in mind observations 1, 2 and 5. There might be an interaction effect that you just don't have enough power to detect. The coefficient of the main effect of $X$ does make (some) sense: it is the log-hazard ratio for a one-unit increase in $X$ for a subject with $D=0$.
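Plugging in the Scenario 1 estimates gives a rough picture. This is a sketch: $\hat\beta$ is the log of the printed exp(coef), and the interaction is only printed to three decimals, so its contribution is approximate.

$$
\widehat{\mathrm{HR}}_X(d) = \exp\big(\hat\beta_X + \hat\beta_{XD}\, d\big) = 1.069 \times 1.000^{d} \approx 1.069, \qquad d = 0, 1, \dots, 10,
$$

$$
\big[\,0.9876^{10},\ 1.013^{10}\,\big] \approx [\,0.883,\ 1.138\,].
$$

The first line says the point estimate of the $X$ effect hardly changes across the observed range of $D$. The second line is the 95% interval of the interaction factor $\exp(10\,\beta_{XD})$ that applies at $d = 10$: it is wide enough that a moderate interaction is not ruled out, which is the lack-of-power point made above.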
Scenario 2
Again keep in mind observations 1 and 2.
The total effect of $X$ is $\log(1.0677) + d \times \log(0.9994) = 0.0655 - d \times 0.0006$. So a larger value of $X$ leads to an increased hazard (ratio), regardless of $D$. The total effect of $X$ is slightly smaller when $D=1$.
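The same bookkeeping on the hazard-ratio scale, as a sketch from the printed exp(coef) values ($\mathrm{HR}_X$ is my shorthand for the hazard ratio per one-unit increase in $X$, and $\mathrm{HR}_D$ for $D=1$ versus $D=0$):

$$
\begin{aligned}
\widehat{\mathrm{HR}}_X(D=0) &= 1.0677, & \widehat{\mathrm{HR}}_X(D=1) &= 1.0677 \times 0.9994 \approx 1.0671,\\
\widehat{\mathrm{HR}}_D(X=0) &= 1.3628, & \widehat{\mathrm{HR}}_D(X=10) &= 1.3628 \times 0.9994^{10} \approx 1.3546.
\end{aligned}
$$

None of these is below 1, so an interaction estimate of 0.9994 does not mean that the hazard decreases with $X$ when $D = 0$; it only means that each effect is marginally smaller in the presence of the other.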
Scenario 3
Here there are 3 interaction terms. It is again instructive to compute the total effect of $X$, conditional on the value of $D$. It looks like for $D\in\left\{1,2\right\}$ the effect of $X$ is attenuated compared with $D=0$, and for $D=3$ the effect is amplified compared with $D=0$. The interactions are not significant, which means that you do not have enough evidence (power) to reject the hypothesis of no interaction in this data set.
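Concretely, as a sketch using the printed exp(coef) values (each interaction term multiplies the reference-level effect of $X$):

$$
\begin{aligned}
\widehat{\mathrm{HR}}_X(D=0) &= 1.0491,\\
\widehat{\mathrm{HR}}_X(D=1) &= 1.0491 \times 0.9849 \approx 1.0333,\\
\widehat{\mathrm{HR}}_X(D=2) &= 1.0491 \times 0.9859 \approx 1.0343,\\
\widehat{\mathrm{HR}}_X(D=3) &= 1.0491 \times 1.0390 \approx 1.0900.
\end{aligned}
$$

These are point estimates only; confidence intervals for the $D = 1, 2, 3$ effects would need the covariance matrix of the estimates (in R, `vcov()` on the fitted model), which is not shown in the question.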
Scenario 4
If I read this in a paper I would be hopelessly confused. What is decreasing? The $D$? (I have a feeling I know what you are referring to, but you should try to express things more precisely.)
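One way to make such statements precise is to tabulate the estimated hazard ratio of every $(X, D)$ combination relative to the reference group $X = 0, D = 0$, that is $\exp(\beta_X x + \beta_{D_d} + \beta_{X:D_d}\, x)$, where $\beta_{D_d}$ and $\beta_{X:D_d}$ (my notation) are the coefficients of `factor(D)d` and `X*factor(D)d`. A sketch from the printed exp(coef) values, rounded to three decimals:

$$
\begin{array}{l|cccc}
 & D = 0 & D = 1 & D = 2 & D = 3\\ \hline
X = 0 & 1 & 1.195 & 1.659 & 1.061\\
X = 1 & 1.386 & 1.491 & 2.267 & 1.988\\ \hline
X = 1 \text{ vs } X = 0 & 1.386 & 1.247 & 1.367 & 1.874
\end{array}
$$

For example, $1.491 = 1.386 \times 1.195 \times 0.900$. Read this way, the interaction terms below 1 do not make anything decrease: within $X = 1$, every level of $D$ still has a higher estimated hazard than $D = 0$. What changes is the size of the $X$ effect at each level of $D$ (last row), and none of these differences is statistically significant.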
Scenario 5
Again, I don't understand your interpretation. What does "$X$ is decreasing with years" mean? Is that the value of $X$? Is it the effect on the hazard ratio? At first glance, it seems to me that the $X=1$ group has a higher hazard rate than the $X=0$ group, at time $0$. As time goes by, this difference becomes smaller.
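Assuming `stop` is measured in years since entry and the `X*stop` term is the product of the binary $X$ with current time $t$, the two printed coefficients combine into a single time-varying hazard ratio for $X = 1$ versus $X = 0$ (a sketch from the printed exp(coef) values):

$$
\widehat{\mathrm{HR}}_X(t) = \exp\big(\hat\beta_X + \hat\beta_{X:t}\, t\big) = 1.5848 \times 0.9488^{\,t},
\qquad
\widehat{\mathrm{HR}}_X(5) \approx 1.219,\quad \widehat{\mathrm{HR}}_X(10) \approx 0.937,
$$

$$
\widehat{\mathrm{HR}}_X(t^*) = 1 \iff t^* = -\frac{\log 1.5848}{\log 0.9488} \approx 8.8 \text{ years}.
$$

So the 1.5848 is the hazard ratio at $t = 0$, not an effect of "X alone", and the two terms should be read together. Whether the estimated crossing near 8.8 years means anything depends on how much follow-up there is around that time; outside the observed range it is pure extrapolation.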