Solved – How to pool (average) hazard ratios

multiple-imputationrregressionsurvival

Let's say the task did not allow me to use either the MICE package or RMS / HMISC to perform survival analysis on multiply imputed data. I had to impute 5 different data sets and calculate separate hazard ratios in each of these.

I need to pool the hazard ratio from each set to one "average" / "pooled" hazard ratio but how is that done? I've searched the web but can't find no formula or package that would facilitate this.

Now, there are plenty of information on pooling hazard ratios for meta-analyses but recall that my estimates originate from the same data which was used to impute 5 complete data sets.

Best Answer

Averaging the coefficients just means that: compute the hazard ratios in each of your dataset, add them and divide by 5 (= the number of imputed datasets).

However, Rubin's rules assume that the sampling distribtution (actually, the posterior distribution) of the estimates is normal. This is more likely to be true for log(hazard ratios) than hazard ratios. Remember that a normal distribution covers all negative and positive numbers, while a hazard ratio cannot be negative. So more often the log-hazard ratios are combined using Rubin's rules and than the combined log-hazard ratio is transformed to hazard ratios.

Related Solutions

P-Value Pooling – How to Pool Bootstrapped P-values Across Multiply Imputed Data Sets

I think both options result in the correct answer. In general, I would prefer method 1 as that preserves the entire distribution.

For method 1, bootstrap the parameter $k$ times within each of the $m$ MI solutions. Then simply mix the $m$ bootstrapped distributions to obtain your final density, now consisting of $k \times m$ samples that include the between-imputation variation. Then treat that as a conventional bootstrap sample to get confidence intervals. Use the Bayesian bootstrap for small samples. I know of no simulation work that investigates this procedure, and this is actually an open problem to be investigated.

For method 2, use the Licht-Rubin procedure. See How to get pooled p-values on tests done in multiple imputed datasets?

Solved – lmer with multiply imputed data

You can do this somewhat by hand if by taking advantage of the lapply functionality in R and the list-structure returned by the Amelia multiple imputation package. Here's a quick example script.

library(Amelia)
library(lme4)
library(merTools)
library(plyr) # for collapsing estimates

Amelia is similar to mice so you can just substitute your variables in from the mice call here -- this example is from a project I was working on.

 a.out <- amelia(dat[sub1, varIndex], idvars = "SCH_ID", 
            noms = varIndex[!varIndex %in% c("SCH_ID", "math12")], 
            m = 10)

a.out is the imputation object, now we need to run the model on each imputed dataset. To do this, we use the lapply function in R to repeat a function over list elements. This function applies the function -- which is the model specification -- to each dataset (d) in the list and returns the results in a list of models.

 mods <- lapply(a.out$imputations,
           function(d) lmer((log(wage) ~ gender + age + age_sqr + 
            occupation + degree + private_sector + overtime + 
             (1+gender|faculty), data = d)

Now we create a data.frame from that list, by simulating the values the fixed and random effects using the functions FEsim and REsim from the merTools package

imputeFEs <- ldply(mods, FEsim, nsims = 1000)
imputeREs <- ldply(mods, REsim, nsims = 1000)

The data.frames above include separate estimates for each dataset, now we need to combine them using a collapse like argument collapse

imputeREs <- ddply(imputeREs, .(X1, X2), summarize, mean = mean(mean), 
               median = mean(median), sd = mean(sd), 
               level = level[1])

imputeFEs <- ddply(imputeFEs, .(var), summarize, meanEff = mean(meanEff), 
               medEff = mean(medEff), sdEff = mean(sdEff))

Now we can also extract some statistics on the variance/covariance for the random effects across the imputed values. Here I have written two simple extractor functions to do this.

REsdExtract <- function(model){
  out <- unlist(lapply(VarCorr(model), attr, "stddev"))
  return(out)
}

REcorrExtract <- function(model){
  out <- unlist(lapply(VarCorr(model), attr, "corre"))
  return(min(unique(out)))
}

And now we can apply them to the models and store them as a vector:

modStats <- cbind(ldply(mods, REsdExtract), ldply(mods, REcorrExtract))

Update

The functions below will get you much closer to the output provided by arm::display by operating on the list of lmer or glmer objects. Hopefully this will be incorporated into the merTools package in the near future:

# Functions to extract standard deviation of random effects from model
REsdExtract <- function(model){
  out <- unlist(lapply(VarCorr(model), attr, "stddev"))
  return(out)
}

#slope intercept correlation from model
REcorrExtract <- function(model){
  out <- unlist(lapply(VarCorr(model), attr, "corre"))
  return(min(unique(out)))
}

modelRandEffStats <- function(modList){
  SDs <- ldply(modList, REsdExtract)
  corrs <- ldply(modList, REcorrExtract)
  tmp <- cbind(SDs, corrs)
  names(tmp) <- c("Imp", "Int", "Slope", "id", "Corr")
  out <- data.frame(IntSD_mean = mean(tmp$Int), 
                        SlopeSD_mean = mean(tmp$Slope), 
                    Corr_mean = mean(tmp$Corr), 
                        IntSD_sd = sd(tmp$Int),
                    SlopeSD_sd = sd(tmp$Slope), 
                        Corr_sd = sd(tmp$Corr))
  return(out)
}

modelFixedEff <- function(modList){
  require(broom)
  fixEst <- ldply(modList, tidy, effects = "fixed")
  # Collapse
  out <- ddply(fixEst, .(term), summarize,
               estimate = mean(estimate), 
               std.error = mean(std.error))
  out$statistic <- out$estimate / out$std.error
  return(out)
}

print.merModList <- function(modList, digits = 3){
  len <- length(modList)
  form <- modList[[1]]@call
  print(form)
  cat("\nFixed Effects:\n")
  fedat <- modelFixedEff(modList)
  dimnames(fedat)[[1]] <- fedat$term
  pfround(fedat[-1, -1], digits)
  cat("\nError Terms Random Effect Std. Devs\n")
  cat("and covariances:\n")
  cat("\n")
  ngrps <- length(VarCorr(modmathG[[1]]))
  errorList <- vector(mode = 'list', length = ngrps)
  corrList <- vector(mode = 'list', length = ngrps)
  for(i in 1:ngrps){
    subList <- lapply(modList, function(x) VarCorr(x)[[i]])
    subList <- apply(simplify2array(subList), 1:2, mean)
    errorList[[i]] <- subList
    subList <- lapply(modList, function(x) attr(VarCorr(x)[[i]], "corre"))
    subList <- min(unique(apply(simplify2array(subList), 1:2, function(x) mean(x))))
    corrList[[i]] <- subList
  }
  errorList <- lapply(errorList, function(x) {
    diag(x) <- sqrt(diag(x))
    return(x)
    })

  lapply(errorList, pfround, digits)
  cat("\nError Term Correlations:\n")
  lapply(corrList, pfround, digits)
  residError <- mean(unlist(lapply(modList, function(x) attr(VarCorr(x), "sc"))))
  cat("\nResidual Error =", fround(residError,
                                             digits), "\n")
  cat("\n---Groups\n")
  ngrps <- lapply(modList[[1]]@flist, function(x) length(levels(x)))
  modn <- getME(modList[[1]], "devcomp")$dims["n"]
  cat(sprintf("number of obs: %d, groups: ", modn))
  cat(paste(paste(names(ngrps), ngrps, sep = ", "),
            collapse = "; "))
  cat("\n")
  cat("\nModel Fit Stats")
  mAIC <- mean(unlist(lapply(modList, AIC)))
  cat(sprintf("\nAIC = %g", round(mAIC, 1)))
  moDsigma.hat <- mean(unlist(lapply(modmathG, sigma)))
  cat("\nOverdispersion parameter =", fround(moDsigma.hat,
                                             digits), "\n")
}

Best Answer

Related Solutions

P-Value Pooling – How to Pool Bootstrapped P-values Across Multiply Imputed Data Sets

Solved – lmer with multiply imputed data

Related Question