Solved – predict functions after clogit in R using survival package

clogitrstatasurvival

I have difficulties understanding the different types of prediction after running survival::clogit in R. I do not think this is due to author's fault, but mainly due to my limited understanding of statistics.

So, I wish someone can enlighten me how the predictions were calculated.

Noted: The data (modified) and my questions are related to this post https://stackoverflow.com/questions/35329585/how-to-get-fitted-values-from-clogit-model?answertab=active#tab-top

QUESTION 1:

For example to predict the linear prediction:

With these data:

set.seed(1)
sim_data <- data.frame(Used = rep(c(1,0,0,0),1250),
                   Open = round(runif(5000,0,50),0),
                   Strata = rep(1:1250,each=4))

Which I run survival::clogit

mod_sim <- clogit(Used ~ Open + strata(Strata), data = sim_data)
summary(mod_sim)

I would get the value for linear prediction as below:

head(predict(mod_sim, type = 'lp')) 

    1            2            3            4 
 0.037724631  0.020958128 -0.006986043 -0.051696716 
    5            6 
 0.066367406 -0.031437192

In Stata, if I run the post-estimation command and specify linear prediction I will get different values as shown below:

-.0363274, -.0530939, -.0810381, -.1257488, -.0279442, -.1257488

I can see that in Stata, the value for the first observation is calculated from -.0027944*13 (for 1st observation)

QUESTION 2:

Can anyone show me how to calculate different types of prediction for example term, risk and expected

Best Answer

The conditional logistic regression model conditions out an intercept for each stratum, so the linear predictor is only determined up to a stratum-specific offset. Stata uses zero for this offset; R uses the stratum mean of the covariate (for numerical stability)

So

> m<-with(sim_data, ave(Open, Strata))
> head(predict(mod_sim, type = 'lp'))
           1            2            3            4            5            6 
 0.037724631  0.020958128 -0.006986043 -0.051696716  0.066367406 -0.031437192 
> head(with(sim_data, coef(mod_sim)*(Open-m) ))
[1]  0.037724631  0.020958128 -0.006986043 -0.051696716  0.066367406 -0.031437192

You get type="risk" just by exponentiating these

With type="terms" the default centering is at the sample mean (mod_sim$means) rather than the stratum mean.

> head(with(sim_data, coef(mod_sim)*(Open-mod_sim$means) ))
[1]  0.03314235  0.01637584 -0.01156833 -0.05627900  0.04152560 -0.05627900
> head(predict(mod_sim, type = 'terms'))
         Open
1  0.03314235
2  0.01637584
3 -0.01156833
4 -0.05627900
5  0.04152560
6 -0.05627900

With type="expected" you're computing an absolute risk, so the intercept for each stratum has to be estimated (and the predictions aren't all that useful, since they are only for the observed strata)

The documentation for this is under ?predict.coxph because clogit uses the relationship to the Cox model for computation

Related Question