Solved – Discrepancy between log likelihood Harrell’s C Index and brier score for the evaluation of a Cox regression

bootstrapbrier-scorecox-modelpredictive-models

I am evaluating a dataset of ~400 subjects and 10 covariates trying to fit a Cox ph model for predicting survival in AML patients.

To evaluate the models I am using a bootstrapping procedure of 50 iterations (sampling with replacement). Models are built on the training set and tested on the test set. For each model I calculate Harrell's C-Index (using rcorr.cens from the Hmisc package) and the Brier score (using the peperr package). I then average the results over the 50 iterations. I get an average C Index of 0.70 and a Brier score of 0.08.

My problem:

When I check the p values of the likelihood ratio test / Wald test of each individual model they are non-significant. Of note, this is a subset analysis. When I build models using other covariates in the dataset I get similar CIndex/Brier scores as well as highly significant likelihood ratio test/Wald tests.

Questions:

What could be the reason for the discrepancy between the evaluation on the testset (CIndex and Brier) and the evaluation on the training set (Likelihood ratio/Wald)
Some covariates are found to be significant in the Cox models (though the model itself is not). Is there any meaning for these covariate considering I can merge these results across 50 iterations ?

p.s. – I am aware that I am using the LASSO regularization (glmnet) for feature selection which is prone to overstating my covariates. This is a limitation of my analysis which I am attempting to minimize by the bootstrapping procedure. Regardless, I would like to figure out what is the reason for the discrepancy in my results.

I am attaching my code. Unfortunately the data are PHI and cannot be shared.

library("survival")
library ("glmnet")
library ("c060")
library ("peperr")
library ("Hmisc")
##-----------------------------------------------------------------------------------
##                                    Accessory Functions
##-----------------------------------------------------------------------------------
## 
## This function takes a predictor matrix and a Surv obj and uses the glmnet lasso 
## regularization method to select the most predictive features and create a coxph 
## object predict_matrix is the 'dataset' reformated to replace categorical variables
## to binary with dummy

dataset2predictmatrix <- function (dataset,nonzero.coef=NULL) {
  #creat Y (survival matrix) for glmnet
  surv_obj <- Surv(dataset$time,dataset$status) 


  ## tranform categorical variables into binary variables with dummy for dataset
  predict_matrix <- model.matrix(~ ., data=dataset, 
                     contrasts.arg=lapply(dataset[,sapply(dataset, is.factor)], 
                                          contrasts))

  ## remove the statu/time variables from the predictor matrix (x) for glmnet
  predict_matrix <- subset (predict_matrix, select=c(-time,-status))

  if (!is.null(nonzero.coef)){
    predict_matrix<-predict_matrix[,nonzero.coef]
  }

  list (predict_matrix=predict_matrix,surv_obj=surv_obj)
}

glmnetfun<-function (predict_matrix,surv_obj){

  ## create a glmnet cox object using lasso regularization and cross validation
  glmnet.cv <- cv.glmnet (predict_matrix, surv_obj, family="cox")

  ## get the glmnet model 
  glmnet.obj <- glmnet.cv$glmnet.fit

  # find lambda index for the models with least partial likelihood deviance (by cv.glmnet) 
  optimal.lambda <- glmnet.cv$lambda.min    # For a more parsimoneous model use lambda.1se     
  lambda.index <- which(glmnet.obj$lambda==optimal.lambda) 


  # take beta for optimal lambda 
  optimal.beta  <- glmnet.obj$beta[,lambda.index] 

  # find non zero beta coef 
  nonzero.coef <- abs(optimal.beta)>0 
  selectedBeta <- optimal.beta[nonzero.coef] 

  # take only covariates for which beta is not zero 
  selectedVar   <- predict_matrix[,nonzero.coef] 

  # create a dataframe for trainSet with time, status and selected variables in binary representation for evaluation in pec
  reformat_dataSet <- as.data.frame(cbind(surv_obj,selectedVar))

  # create coxph object with pre-defined coefficients 
  glmnet.cox <- coxph(Surv(time,status) ~ .,data=reformat_dataSet,init=selectedBeta,iter=200)

#   # glmnet.cox only with meaningful features selected by stepwise bidirectional AIC feature selection 
#   glmnet.cox.meaningful <- step(coxph(Surv(time,status) ~ .,data=reformat_dataSet),direction="both")
#   
#   selectedVarCox<- selectedVar [,attr(glmnet.cox.meaningful$terms,"term.labels")] 
#   
  # Returned object -> list with glmnet.cv glmnet.obj selectecVar reformat_dataSet
  list(glmnet.cv=glmnet.cv, glmnet.obj=glmnet.obj, glmnet.cox=glmnet.cox, 
       nonzero.coef=nonzero.coef, selectedVar=selectedVar, 
       reformat_dataSet=reformat_dataSet
#         , glmnet.cox.meaningful=glmnet.cox.meaningful, selectedVarCox=selectedVarCox
        )

}


## This function takes as input a surv_obj (testdate Surv object) and a testdata prediction matrix. Calculates a Brier score
## for either a cox model or a glmnet. Returns an ipec object which is the Brier score (requires peperr and c060)

brier_calc <- function (surv_obj, predict_matrix, model=c("cox","glmnet")){
  if (model=="cox"){
    peperr <- peperr.cox <- peperr(response=surv_obj, x=predict_matrix,
                                   fit.fun=fit.coxph, load.all=TRUE,
                                   indices=resample.indices(n=nrow(surv_obj), method="boot", sample.n=50))      
  } else if (model=="glmnet"){
    peperr <- peperr(response=surv_obj, x=predict_matrix,
                     fit.fun=fit.glmnet, args.fit=list(family="cox"),
                     complexity=complexity.glmnet,args.complexity=list(family="cox"),load.all=TRUE,
                     indices=resample.indices(n=nrow(surv_obj), method="boot", sample.n=50))
  }

      # Get error predictions from peperr
      prederr <- perr(peperr)

      # Integrated prediction error Brier score calculation
      ipec<-ipec(prederr, eval.times=peperr$attribute, response=surv_obj)

      list (peperr=peperr,ipec=ipec)
    }


  trainset<-dataset2predictmatrix (dataset)
  models<-glmnetfun (trainset$predict_matrix,trainset$surv_obj)
#   rfsrc<-rfsrcfunc (dataset)
  briercox<-brier_calc (trainset$surv_obj,models$selectedVar,model="cox")
  # requires c060 package which doesn't work on Linux
  brierglmnet<-brier_calc (trainset$surv_obj,trainset$predict_matrix,model="glmnet")
  # briermeaningful <- brier_calc (trainset$surv_obj,models$selectedVarCox,model="cox")
#   brierRFSRC <-pec_calc(rfsrc$obj,dataset)
#   brierRfit<-pec_calc(rfsrc$rfit ,rfsrc$selectedDataset)
  # calculate brier score for coxph with feature selection based on rfsrc
#   selectedDataset<-dataset2predictmatrix (rfsrc$selectedDataset)
#   brierCox.rfsrc<-brier_calc(selectedDataset$surv_obj,selectedDataset$predict_matrix ,model="cox")


  # save image of the workspace after each iteration
  save.image(file)




  ## C-Index calculation 50 iter bootstrapping

  ## initialization of lists for various outputs
  cIndexCox<- list ()
  cIndexglmnet<- list ()
#   cIndexCoxAIC<- list ()
#   cIndexRFSRC<- list()
#   cIndexRfit<- list ()
#   cIndexCox.rfsrc<- list ()

  i<-1
#   pb <- txtProgressBar(min = 1, max = 50, style = 3) ## initialize a progress bar

  train_results_list<-list()
#   rfsrc_train_list<- list ()


  while (i < 51){
    print (paste("Iteration:",i))
#     setTxtProgressBar(pb, i)## run progress bar

    train <- sample(1:nrow(dataset), nrow(dataset), replace = TRUE) ## random sampling with replacement
    # create a dataframe for trainSet with time, status and selected variables in binary representation for evaluation in pec
    trainset <- dataset [train,]
    testset<- dataset [-train,]
    train_matrix <- dataset2predictmatrix (trainset)

    train_results<-glmnetfun(train_matrix$predict_matrix,train_matrix$surv_obj) #runs the sequence of glmnet.cv->glmnet.obj->glmnet.cox
    train_results_list<-c(train_results_list,list(train_results)) #create a list of results of all runs



    #     compute c-index (Harrell's) for cox-glmnet models
    if (!is.null(train_results$glmnet.cox)){
      # creates a matrix of only the variables identified by the glmnet
      test_matrix <- dataset2predictmatrix(testset, nonzero.coef=train_results$nonzero.coef)
      reformat_testSet <- as.data.frame(cbind(test_matrix$surv_obj,test_matrix$predict_matrix)) #casts matrix to dataframe
      cIndexCox <- c(cIndexCox, 
                     1-rcorr.cens(predict(train_results$glmnet.cox, reformat_testSet),test_matrix$surv_obj)[1])
      i=i+1

      # compute c-index (Harrell's) for glmnet models
      test_matrix<- dataset2predictmatrix (testset) # create matrix of the entire testset for glmnet evaluation
      cIndexglmnet <- c(cIndexglmnet, 
                        1-rcorr.cens(predict(train_results$glmnet.cv, test_matrix$predict_matrix),test_matrix$surv_obj)[1])

      save.image(file)

    }

  }

 meanCICox<- c(mean (unlist(cIndexCox),na.rm=TRUE),std.error(unlist(cIndexCox)))
  meanCIglmnet<- c(mean (unlist(cIndexglmnet),na.rm=TRUE),std.error(unlist(cIndexglmnet)))
#

Best Answer

When using the Brier score (for which you have to use a special version to account for censoring - I assume you did that) or the $c$-index, the corresponding log-likelihood-based statistic is the likelihood ratio $\chi^2$ statistic for the full model. Do not look at partial $\chi^2$. Note that except for the Brier score, the R rms package validate function does the calculations and bootstrapping you need, automatically.

Related Solutions

Solved – Problem with BootCV for coxph in pec after feature selection with glmnet (lasso)

pec accepts only R-objects for which a predictSurvProb method exists and glmnet is not such an object.

Currently, predictSurvProb methods are available for the following R-objects:
matrix
aalen, cox.aalen from library(timereg)
mfp from library(mfp)
phnnet, survnnet from library(survnnet)
rpart (from library(rpart))
coxph, survfit from library(survival)
cph, psm from library(rms)
prodlim from library(prodlim)
glm from library(stats)

For calculating the brier score for glmnet one needs to use the peperr package with the c060 library that wraps glmnet as an object suitable for peperr.

peperr_glmnet_noerror <- peperr(response=Surv(time, status), x=x, 
                    fit.fun=fit.glmnet, args.fit=list(family="cox"),
                    complexity=complexity.glmnet,args.complexity=list(family="cox"),
                    indices=resample.indices(n=length(time), method="boot", sample.n=10))

To get the integrated brier score for the entire model it seems one needs to use the ipec function but I still need to research that.

Many thanks to Thomas Hielscher who authored the c060 package and was extremely kind to help me with this.

Solved – a “good fit” Brier score and Harrell’s C Index

Prior CV postings on the matter of GOF measures in generalized linear models:

Find out pseudo R square value for a Logistic Regression analysis

Which pseudo-$R^2$ measure is the one to report for logistic regression (Cox & Snell or Nagelkerke)?

Addressing model uncertainty

Compare classifiers based on AUROC or accuracy?

"Goodness of fit" is an elusive notion. Any set of data can be perfectly fit with a complex, saturated model, but such a model will generally be useless despite being perfect. Application of such tests often completely ignores what the model that is being fit to. I find it rather strange that Anderson-Darling and Kolmogorov-Smirnov tests are being called "goodness of fit tests" when they are really being used as "tests of normality".

Models need to be both validated and calibrated and the GOF measures generally tell you very little about those aspects. (It should be noted in passing that the 'rms' function print.cph also reports the Brier score along with a pseudo-R^2 and Somers-D as "discrimination indexes". And it does not report the c-index, perhaps because the Somers-D is equivalent and preceded it historically and Harrell is tired of people misusing it.)

You will note that Frank told you that your proposed strategy in an earlier rhelp posting where you proposed taking a "best" glmnet model and then apply stepwise forward and backward reduction was bad statistical practice. Part of the problem is that you were taking a result from a method which is optimized for prediction (penalized glmnet) and then applying a procedure that was in all probability lowering its predictive capacity.

Your low Brier score is something I see all the time in my research. I work with large datasets where the outcomes of interest are rather rare (mortality over 5-12 years for basically healthy people). Even a good model will only be predicting a mortality rate of 4-5% for most of the people who die and the "error rate" remains high despite many variables being highly significant. Model comparison measures (especially the deviance) are much better guides for decision making than any of the GOF or discrimination measures.

Best Answer

Related Solutions

Solved – Problem with BootCV for coxph in pec after feature selection with glmnet (lasso)

Solved – a “good fit” Brier score and Harrell’s C Index

Related Question