Solved – Discrepancy between log likelihood Harrell’s C Index and brier score for the evaluation of a Cox regression

bootstrapbrier-scorecox-modelpredictive-models

I am evaluating a dataset of ~400 subjects and 10 covariates trying to fit a Cox ph model for predicting survival in AML patients.

To evaluate the models I am using a bootstrapping procedure of 50 iterations (sampling with replacement). Models are built on the training set and tested on the test set. For each model I calculate Harrell's C-Index (using rcorr.cens from the Hmisc package) and the Brier score (using the peperr package). I then average the results over the 50 iterations. I get an average C Index of 0.70 and a Brier score of 0.08.

My problem:

When I check the p values of the likelihood ratio test / Wald test of each individual model they are non-significant. Of note, this is a subset analysis. When I build models using other covariates in the dataset I get similar CIndex/Brier scores as well as highly significant likelihood ratio test/Wald tests.

Questions:

  1. What could be the reason for the discrepancy between the evaluation on the testset (CIndex and Brier) and the evaluation on the training set (Likelihood ratio/Wald)

  2. Some covariates are found to be significant in the Cox models (though the model itself is not). Is there any meaning for these covariate considering I can merge these results across 50 iterations ?

p.s. – I am aware that I am using the LASSO regularization (glmnet) for feature selection which is prone to overstating my covariates. This is a limitation of my analysis which I am attempting to minimize by the bootstrapping procedure. Regardless, I would like to figure out what is the reason for the discrepancy in my results.

I am attaching my code. Unfortunately the data are PHI and cannot be shared.

library("survival")
library ("glmnet")
library ("c060")
library ("peperr")
library ("Hmisc")
##-----------------------------------------------------------------------------------
##                                    Accessory Functions
##-----------------------------------------------------------------------------------
## 
## This function takes a predictor matrix and a Surv obj and uses the glmnet lasso 
## regularization method to select the most predictive features and create a coxph 
## object predict_matrix is the 'dataset' reformated to replace categorical variables
## to binary with dummy

dataset2predictmatrix <- function (dataset,nonzero.coef=NULL) {
  #creat Y (survival matrix) for glmnet
  surv_obj <- Surv(dataset$time,dataset$status) 


  ## tranform categorical variables into binary variables with dummy for dataset
  predict_matrix <- model.matrix(~ ., data=dataset, 
                     contrasts.arg=lapply(dataset[,sapply(dataset, is.factor)], 
                                          contrasts))

  ## remove the statu/time variables from the predictor matrix (x) for glmnet
  predict_matrix <- subset (predict_matrix, select=c(-time,-status))

  if (!is.null(nonzero.coef)){
    predict_matrix<-predict_matrix[,nonzero.coef]
  }

  list (predict_matrix=predict_matrix,surv_obj=surv_obj)
}

glmnetfun<-function (predict_matrix,surv_obj){

  ## create a glmnet cox object using lasso regularization and cross validation
  glmnet.cv <- cv.glmnet (predict_matrix, surv_obj, family="cox")

  ## get the glmnet model 
  glmnet.obj <- glmnet.cv$glmnet.fit

  # find lambda index for the models with least partial likelihood deviance (by cv.glmnet) 
  optimal.lambda <- glmnet.cv$lambda.min    # For a more parsimoneous model use lambda.1se     
  lambda.index <- which(glmnet.obj$lambda==optimal.lambda) 


  # take beta for optimal lambda 
  optimal.beta  <- glmnet.obj$beta[,lambda.index] 

  # find non zero beta coef 
  nonzero.coef <- abs(optimal.beta)>0 
  selectedBeta <- optimal.beta[nonzero.coef] 

  # take only covariates for which beta is not zero 
  selectedVar   <- predict_matrix[,nonzero.coef] 

  # create a dataframe for trainSet with time, status and selected variables in binary representation for evaluation in pec
  reformat_dataSet <- as.data.frame(cbind(surv_obj,selectedVar))

  # create coxph object with pre-defined coefficients 
  glmnet.cox <- coxph(Surv(time,status) ~ .,data=reformat_dataSet,init=selectedBeta,iter=200)

#   # glmnet.cox only with meaningful features selected by stepwise bidirectional AIC feature selection 
#   glmnet.cox.meaningful <- step(coxph(Surv(time,status) ~ .,data=reformat_dataSet),direction="both")
#   
#   selectedVarCox<- selectedVar [,attr(glmnet.cox.meaningful$terms,"term.labels")] 
#   
  # Returned object -> list with glmnet.cv glmnet.obj selectecVar reformat_dataSet
  list(glmnet.cv=glmnet.cv, glmnet.obj=glmnet.obj, glmnet.cox=glmnet.cox, 
       nonzero.coef=nonzero.coef, selectedVar=selectedVar, 
       reformat_dataSet=reformat_dataSet
#         , glmnet.cox.meaningful=glmnet.cox.meaningful, selectedVarCox=selectedVarCox
        )

}


## This function takes as input a surv_obj (testdate Surv object) and a testdata prediction matrix. Calculates a Brier score
## for either a cox model or a glmnet. Returns an ipec object which is the Brier score (requires peperr and c060)

brier_calc <- function (surv_obj, predict_matrix, model=c("cox","glmnet")){
  if (model=="cox"){
    peperr <- peperr.cox <- peperr(response=surv_obj, x=predict_matrix,
                                   fit.fun=fit.coxph, load.all=TRUE,
                                   indices=resample.indices(n=nrow(surv_obj), method="boot", sample.n=50))      
  } else if (model=="glmnet"){
    peperr <- peperr(response=surv_obj, x=predict_matrix,
                     fit.fun=fit.glmnet, args.fit=list(family="cox"),
                     complexity=complexity.glmnet,args.complexity=list(family="cox"),load.all=TRUE,
                     indices=resample.indices(n=nrow(surv_obj), method="boot", sample.n=50))
  }

      # Get error predictions from peperr
      prederr <- perr(peperr)

      # Integrated prediction error Brier score calculation
      ipec<-ipec(prederr, eval.times=peperr$attribute, response=surv_obj)

      list (peperr=peperr,ipec=ipec)
    }


  trainset<-dataset2predictmatrix (dataset)
  models<-glmnetfun (trainset$predict_matrix,trainset$surv_obj)
#   rfsrc<-rfsrcfunc (dataset)
  briercox<-brier_calc (trainset$surv_obj,models$selectedVar,model="cox")
  # requires c060 package which doesn't work on Linux
  brierglmnet<-brier_calc (trainset$surv_obj,trainset$predict_matrix,model="glmnet")
  # briermeaningful <- brier_calc (trainset$surv_obj,models$selectedVarCox,model="cox")
#   brierRFSRC <-pec_calc(rfsrc$obj,dataset)
#   brierRfit<-pec_calc(rfsrc$rfit ,rfsrc$selectedDataset)
  # calculate brier score for coxph with feature selection based on rfsrc
#   selectedDataset<-dataset2predictmatrix (rfsrc$selectedDataset)
#   brierCox.rfsrc<-brier_calc(selectedDataset$surv_obj,selectedDataset$predict_matrix ,model="cox")


  # save image of the workspace after each iteration
  save.image(file)




  ## C-Index calculation 50 iter bootstrapping

  ## initialization of lists for various outputs
  cIndexCox<- list ()
  cIndexglmnet<- list ()
#   cIndexCoxAIC<- list ()
#   cIndexRFSRC<- list()
#   cIndexRfit<- list ()
#   cIndexCox.rfsrc<- list ()

  i<-1
#   pb <- txtProgressBar(min = 1, max = 50, style = 3) ## initialize a progress bar

  train_results_list<-list()
#   rfsrc_train_list<- list ()


  while (i < 51){
    print (paste("Iteration:",i))
#     setTxtProgressBar(pb, i)## run progress bar

    train <- sample(1:nrow(dataset), nrow(dataset), replace = TRUE) ## random sampling with replacement
    # create a dataframe for trainSet with time, status and selected variables in binary representation for evaluation in pec
    trainset <- dataset [train,]
    testset<- dataset [-train,]
    train_matrix <- dataset2predictmatrix (trainset)

    train_results<-glmnetfun(train_matrix$predict_matrix,train_matrix$surv_obj) #runs the sequence of glmnet.cv->glmnet.obj->glmnet.cox
    train_results_list<-c(train_results_list,list(train_results)) #create a list of results of all runs



    #     compute c-index (Harrell's) for cox-glmnet models
    if (!is.null(train_results$glmnet.cox)){
      # creates a matrix of only the variables identified by the glmnet
      test_matrix <- dataset2predictmatrix(testset, nonzero.coef=train_results$nonzero.coef)
      reformat_testSet <- as.data.frame(cbind(test_matrix$surv_obj,test_matrix$predict_matrix)) #casts matrix to dataframe
      cIndexCox <- c(cIndexCox, 
                     1-rcorr.cens(predict(train_results$glmnet.cox, reformat_testSet),test_matrix$surv_obj)[1])
      i=i+1

      # compute c-index (Harrell's) for glmnet models
      test_matrix<- dataset2predictmatrix (testset) # create matrix of the entire testset for glmnet evaluation
      cIndexglmnet <- c(cIndexglmnet, 
                        1-rcorr.cens(predict(train_results$glmnet.cv, test_matrix$predict_matrix),test_matrix$surv_obj)[1])

      save.image(file)

    }

  }

 meanCICox<- c(mean (unlist(cIndexCox),na.rm=TRUE),std.error(unlist(cIndexCox)))
  meanCIglmnet<- c(mean (unlist(cIndexglmnet),na.rm=TRUE),std.error(unlist(cIndexglmnet)))
#

Best Answer

When using the Brier score (for which you have to use a special version to account for censoring - I assume you did that) or the $c$-index, the corresponding log-likelihood-based statistic is the likelihood ratio $\chi^2$ statistic for the full model. Do not look at partial $\chi^2$. Note that except for the Brier score, the R rms package validate function does the calculations and bootstrapping you need, automatically.

Related Question