Main questions
I'm using conditional logistic regression for a resource selection function (RSF) in an ecology study (more details below). I have no trouble implementing the code but I'm having difficulty grasping what the fitted values represent and how I can ultimately calculate probability of an event for given conditions. I understand that clogit() is actually a Cox Proportional Hazard model with constant time, and I have read up on these models, the help files, and a number of related posts, but so far I have not been able to find answers detailed enough for me to fully understand:
-
What does type "survival" return when using predict.coxph? It is not discussed in the help file. My assumption is that it is the probability of an event not occurring for a given row in the hypothetical next time step (as this is a constant time model), but I don't know if that is accurate?
-
How can I calculate predicted probability of event occurrence? I have seen it suggested that 1-survival should give me this value (which makes sense to me if I'm correct above), but I've also seen it stated that risk/1 + risk would provide this as well, which I do not understand.
-
More generally, when predicting from a stratified Cox, my understanding is that all predictions are relative? For example, with type "risk", I'd be getting the average change hazard ratio within each strata for given predictor values. Is that understanding correct?
I would greatly appreciate any help. Thank you!
Additional details if needed
My goal is to compare nest locations (cases) to random points (controls) for bird species with several continuous predictors and ultimately predict the probability of nest occurrence based on predictors. I use "nest_id" as a strata with the clogit() function from the survival package. The reason for this is that not all random points were available to all birds because of territoriality etc., and so traditional logistic regression would not be appropriate. Each strata consists of a known nest location and 5 random points within the territory that could have theoretically been selected by that nesting pair. Thus, in my case, the "event" would be nest occurrence, not mortality. Here is my code to illustrate:
# My data (rwbl.rsf)
# A tibble: 6 × 5
nest_id status ndvi_pix veg_pix wood_pix
<chr> <dbl> <dbl> <dbl> <dbl>
1 aml0032022 1 0.235 1.39 231.
2 aml0032022 0 0.00778 1.06 232.
3 aml0032022 0 0.174 1.01 201.
4 aml0032022 0 0.306 1.42 207.
5 aml0032022 0 -0.0670 1.22 230.
6 aml0032022 0 -0.228 0.963 228.
# My model
model = clogit(status ~ scale(ndvi_pix) + scale(veg_pix) + scale(wood_pix) + strata(nest_id),
rwbl.rsf, method = "exact")
# Predictions
preds = predict(model, se.fit = T, type = "survival")
preds$fit[1:5]
1 2 3 4 5
0.7315548 0.9185614 0.8761767 0.6907805 0.9302214
Best Answer
When you ask for
expected
orsurvival
predictions from a Cox model and don't specifynewdata
, then you get results directly from the original data and model. You can see what's going on by typingsurvival:::predict.coxph
at the R command prompt. Here's the key code:Here
y
is theSurv
object of the original data, whose last column is the set of event markers, andobject$residuals
are the corresponding martingale residuals, the differences between the observed number of events (here, 1 or 0) and the model-predicted values. Therneau and Grambsch have an accessible explanation of how martingales and martingale residuals play roles in the counting-process approach to Cox models.Each
pred
value in the above code is thus the model-predicted number of events for that case at the corresponding observation time. (The code forclogit
setstime=1
for all observations.) If you specifytype="expected"
, that's whatpredict.coxph()
returns directly.As the
predict.coxph
help page says, "The survival probability for a subject is equal toexp(-expected)
." That's a straightforward result for a continuous-time survival model, as theexpected
number of events at time $t$ is the cumulative hazard $H(t)$, in turn related to the survival probability by $S(t)=\exp(-H(t))$.For your first question, you aren't quite right. What you get is the predicted survival probability if you repeated the experiment again (effectively to
time=1
). Cox models can make no predictions beyond the last observation time, and the way thatclogit
is coded all observation times are set to a value of 1.For your second question, the event probability when there can be at most 1 event is 1 minus the survival probability. I'd recommend using that.
For your third question (related to part of your second question about using "risk"), you are correct that Cox models of "risk" are always relative to something. The choice of reference is thus critical. The
Details
section of the help page forpredict.coxph
explains the default choices, why they were made, and other options.In your application, where stratification is a programming trick to fit a conditional logit model via a Cox model routine and you presumably care more about the regression-coefficient estimates than using them to make within-stratum predictions, I'd be reluctant to use the "risk" type of predictions at all. If you do want such predictions, read the help page very carefully first.