I have difficulties understanding the different types of prediction after running survival::clogit
in R. I do not think this is due to author's fault, but mainly due to my limited understanding of statistics.
So, I wish someone can enlighten me how the predictions were calculated.
Noted: The data (modified) and my questions are related to this post https://stackoverflow.com/questions/35329585/how-to-get-fitted-values-from-clogit-model?answertab=active#tab-top
QUESTION 1:
For example to predict the linear prediction:
With these data:
set.seed(1)
sim_data <- data.frame(Used = rep(c(1,0,0,0),1250),
Open = round(runif(5000,0,50),0),
Strata = rep(1:1250,each=4))
Which I run survival::clogit
mod_sim <- clogit(Used ~ Open + strata(Strata), data = sim_data)
summary(mod_sim)
I would get the value for linear prediction as below:
head(predict(mod_sim, type = 'lp'))
1 2 3 4
0.037724631 0.020958128 -0.006986043 -0.051696716
5 6
0.066367406 -0.031437192
In Stata, if I run the post-estimation command and specify linear prediction I will get different values as shown below:
-.0363274, -.0530939, -.0810381, -.1257488, -.0279442, -.1257488
I can see that in Stata, the value for the first observation is calculated from -.0027944*13 (for 1st observation)
QUESTION 2:
Can anyone show me how to calculate different types of prediction for example term
, risk
and expected
Best Answer
The conditional logistic regression model conditions out an intercept for each stratum, so the linear predictor is only determined up to a stratum-specific offset. Stata uses zero for this offset; R uses the stratum mean of the covariate (for numerical stability)
So
You get
type="risk"
just by exponentiating theseWith
type="terms"
the default centering is at the sample mean (mod_sim$means
) rather than the stratum mean.With
type="expected"
you're computing an absolute risk, so the intercept for each stratum has to be estimated (and the predictions aren't all that useful, since they are only for the observed strata)The documentation for this is under
?predict.coxph
becauseclogit
uses the relationship to the Cox model for computation