My line of thought goes like this. In the Andersen-Gill (AG) model, each individual is represented by a counting process $N_i(t)$ with intensity $\lambda_i(t)$, which can be written as $\lambda_i(t) = \lambda_0(t) \exp(\beta' x_i(t))$. Implicit in this is that only the "current" value of $x_i$ matters for the intensity. The counting process increases by one each time the individual has an event. I also assume independent censoring (more precisely, independent stopping of the process). Note that these assumptions are already implicit in the way you fitted the model.
The probability of no events in $(s,t)$ (improperly, a "survival") is then
$$
S_i(t|s) = \exp \left( -\int_s^t \lambda_0(u) \exp(\beta'x_i(u)) du \right)
$$
Note that the probability of no events does not directly relate to the probability of one event, as the complementary event to "no events in a period" is "at least one event during a period".
Now say you are interested in $S(t|s_0)$. The idea is to fix $x_i = x_i(s_0)$ (i.e. to assume that the covariates do not change after $s_0$), and we get
$$
S_i(t|s_0) = S_i(t) / S_i(s_0)
$$
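Writing the ratio out explicitly shows the cancellation of the part before $s_0$ (with $S_i(t) = S_i(t|0)$):
$$
\frac{S_i(t)}{S_i(s_0)} = \frac{\exp\left(-\int_0^t \lambda_0(u) \exp(\beta'x_i(u)) du\right)}{\exp\left(-\int_0^{s_0} \lambda_0(u) \exp(\beta'x_i(u)) du\right)} = \exp\left(-\int_{s_0}^t \lambda_0(u) \exp(\beta'x_i(u)) du\right)
$$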
The part before $s_0$ cancels out; combined with the assumption that only the current value of $x_i$ matters for the intensity, this means the above is equal to
$$
S_i(t|s_0) = \exp \left( -\exp(\beta'x_i(s_0)) \int_{s_0}^t \lambda_0(u)du \right)
$$
Then in R you can do it like this (an example with a built-in data set):
library(frailtypack)   # for the readmission data
library(survival)      # coxph, survfit
data(readmission)
mod1 <- coxph(Surv(t.start, t.stop, event) ~ sex + charlson + cluster(id),
              data = readmission)
# set the covariate values at s_0
mycov <- data.frame(sex = "Female", charlson = "1-2")
sf <- survfit(mod1, newdata = mycov)
# plot S(t | s_0) for different choices of s_0
par(mfrow = c(2, 2))
s0_values <- c(300, 500, 700, 1000)
for (s0 in s0_values) {
  pos <- max(which(sf$time <= s0))   # last event time at or before s_0
  S_s0 <- sf$surv[pos]
  with(sf, plot(time[pos:length(time)], surv[pos:length(surv)] / S_s0,
                type = "l", xlab = "t", ylab = "S(t | s_0)",
                main = paste("s_0 =", s0)))
}
Here you get the plots of the survival curves, which correspond to the values
$(t, S(t|s_0))$ for $t\geq s_0$.
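As a quick numerical sanity check of the cancellation argument (separate from the fitted model above; lambda0, beta and x_s0 are made-up illustration values): with a constant baseline hazard and the covariate frozen at its value at $s_0$, the ratio $S(t)/S(s_0)$ and the directly integrated form agree.

```r
## Toy check: constant baseline hazard lambda0, covariate frozen at x_s0
lambda0 <- 0.01; beta <- 0.5; x_s0 <- 1
s0 <- 300; t1 <- 500
S <- function(t) exp(-exp(beta * x_s0) * lambda0 * t)     # S(t) from time 0
S_ratio  <- S(t1) / S(s0)                                 # S(t) / S(s0)
S_direct <- exp(-exp(beta * x_s0) * lambda0 * (t1 - s0))  # integral from s0 to t
all.equal(S_ratio, S_direct)   # TRUE
```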
The other comments I would give on this are the following. First, it is difficult to talk about the distribution of the next event, because in the AG formulation the time since the previous event plays no role. In other words, if you would like to take that into account, more complicated stochastic models are needed, where for example you include the "previous number of events" as a time-dependent covariate. This complicates things considerably, and the interpretation of the estimated quantities is most likely very difficult.
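A minimal sketch of that idea on synthetic data (the data frame and all variable names are invented for illustration; only the survival package is used): the number of events observed before the current at-risk interval enters as a time-dependent covariate.

```r
library(survival)
set.seed(1)
## Synthetic AG-style data: two at-risk intervals per subject
d <- data.frame(
  id      = rep(1:50, each = 2),
  t.start = rep(c(0, 5), 50),
  t.stop  = rep(c(5, 10), 50),
  event   = rbinom(100, 1, 0.4),
  x       = rnorm(100)
)
## events observed strictly before the current interval, per subject
d$prevEvents <- ave(d$event, d$id, FUN = function(e) cumsum(e) - e)
mod <- coxph(Surv(t.start, t.stop, event) ~ x + prevEvents + cluster(id),
             data = d)
```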
The second comment is about the nature of the time-dependent covariates. The AG model mostly works nicely with "external" covariates, such as air pollution, i.e. something not directly measured on the subject ("external" to the recurrent event process). This is because the interpretation of the first expression above as the probability of no events during $(s,t)$ relies on the assumption that the number of events in any given interval is Poisson distributed, which holds when the covariates are external. A discussion of this can be found in several textbooks, for example Cook & Lawless, Section 2.5. If your time-dependent covariates do depend on the recurrent event process, then they should be modeled jointly with the recurrent event process.
Alright, looking into the code of varImp.train I see that, in the case of a classification problem, the variable importance is simply computed via the filterVarImp function. So the variables are just ranked by AUC, as stated in the varImp documentation under "model independent metrics".
I tested this by calling varImp on each of my models and comparing the variable importance values with those computed via filterVarImp on the training data.
## Compute variable importance via the filter approach
varImps.filtered <- filterVarImp(trainData, trainClasses)
varImps <- list(knn = varImp(models$knn, scale = FALSE),
                pda = varImp(models$pda, scale = FALSE))
## Sort variable importance by its average value
## over all classes, in decreasing order.
varImps.filtered$Mean <- rowMeans(varImps.filtered)
varImps.filtered <- varImps.filtered[order(-varImps.filtered$Mean), ]
varImps.filtered$Mean <- NULL
... and, surprise surprise, the values are exactly the same (only the row order differs, since varImp sorts by maximum importance across the classes while I sorted by the mean).
> varImps$knn
ROC curve variable importance
variables are sorted by maximum importance across the classes
Class_1 Class_2 Class_3 Class_4 Class_5
V5 0.7094 0.9912 0.9431 0.9231 0.9912
V3 0.3706 0.5631 0.9744 0.9744 0.7831
V9 0.9725 0.9619 0.9725 0.8125 0.8988
V8 0.6887 0.6644 0.8650 0.9531 0.9531
V4 0.9325 0.9194 0.9325 0.6044 0.3138
V10 0.7250 0.8119 0.8544 0.8544 0.8331
V7 0.8169 0.7606 0.8244 0.7025 0.8244
V6 0.3650 0.5775 0.7838 0.8081 0.8081
V11 0.6194 0.7662 0.7662 0.6000 0.6506
V2 0.5138 0.7412 0.7412 0.5938 0.4031
U5 0.5609 0.5731 0.5731 0.4944 0.4834
U4 0.5259 0.5531 0.5531 0.5103 0.5109
U3 0.5134 0.5134 0.5103 0.5384 0.5384
U2 0.5384 0.5203 0.5216 0.5384 0.5219
U1 0.4853 0.5312 0.5312 0.5238 0.4872
> varImps.filtered
Class_1 Class_2 Class_3 Class_4 Class_5
V9 0.9725000 0.9618750 0.9725000 0.8125000 0.8987500
V5 0.7093750 0.9912500 0.9431250 0.9231250 0.9912500
V8 0.6887500 0.6643750 0.8650000 0.9531250 0.9531250
V10 0.7250000 0.8118750 0.8543750 0.8543750 0.8331250
V7 0.8168750 0.7606250 0.8243750 0.7025000 0.8243750
V4 0.9325000 0.9193750 0.9325000 0.6043750 0.3137500
V3 0.3706250 0.5631250 0.9743750 0.9743750 0.7831250
V11 0.6193750 0.7662500 0.7662500 0.6000000 0.6506250
V6 0.3650000 0.5775000 0.7837500 0.8081250 0.8081250
V2 0.5137500 0.7412500 0.7412500 0.5937500 0.4031250
U5 0.5609375 0.5731250 0.5731250 0.4943750 0.4834375
U4 0.5259375 0.5531250 0.5531250 0.5103125 0.5109375
U2 0.5384375 0.5203125 0.5215625 0.5384375 0.5218750
U3 0.5134375 0.5134375 0.5103125 0.5384375 0.5384375
U1 0.4853125 0.5312500 0.5312500 0.5237500 0.4871875
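For intuition about where these numbers come from: the ROC importance is the area under the ROC curve for each predictor/class pair, and for a binary split the AUC can be reproduced in base R via the Wilcoxon rank-sum identity (toy data with invented names, not the data set above):

```r
set.seed(1)
x  <- c(rnorm(50, 0), rnorm(50, 1))     # one predictor
cl <- rep(c("neg", "pos"), each = 50)   # binary class labels
## AUC via the Wilcoxon rank-sum identity
auc <- function(x, positive) {
  r  <- rank(x)
  n1 <- sum(positive)
  n0 <- sum(!positive)
  (sum(r[positive]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc(x, cl == "pos")
```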
My goal was to come up with a model-specific, stable feature selection method. The only way I see to achieve this now is by using caret's built-in feature selection methods, namely Recursive Feature Elimination (RFE) and Selection by Filter (SBF). As far as I understand, however, RFE supports only a handful of models in caret, so I might be forced to implement the rfeControl$functions myself.
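For what it's worth, the rfeControl$functions machinery essentially wraps a backward-elimination loop. A minimal model-agnostic version of that loop, written as a hypothetical sketch with glm on toy data (this is not caret's implementation), looks like this:

```r
set.seed(42)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- factor(ifelse(X$x1 + 0.5 * X$x2 + rnorm(n) > 0, "a", "b"))
## in-sample accuracy of a logistic fit on a subset of features
accuracy <- function(vars) {
  fit <- glm(y ~ ., data = cbind(X[vars], y = y), family = binomial)
  mean((predict(fit, type = "response") > 0.5) == (y == "b"))
}
vars <- names(X)
while (length(vars) > 2) {
  ## drop the feature whose removal hurts accuracy the least
  acc_without <- sapply(vars, function(v) accuracy(setdiff(vars, v)))
  vars <- setdiff(vars, names(which.max(acc_without)))
}
vars
```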
Best Answer
I have learned that the cause of this problem is that I didn't transform sex (which is a character vector) to a factor. In this case, msm() will silently ignore this covariate, which is why the Q matrices were the same. I had tried to specify covariates = ~ factor(sex), but this didn't help.

TL;DR: categorical variables should be coded as factors before using them as covariates with the msm() function.
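A minimal illustration of the fix (toy data; the msm() call in the comment is schematic, with made-up names state, id and Q):

```r
df <- data.frame(sex = c("male", "female", "female"),
                 stringsAsFactors = FALSE)
is.character(df$sex)   # TRUE: msm() would silently ignore this covariate
df$sex <- factor(df$sex)
is.factor(df$sex)      # TRUE: now usable as a covariate
## then e.g.: msm(state ~ time, subject = id, data = df,
##               qmatrix = Q, covariates = ~ sex)
```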