Solved – How best to analyze length of stay data in a hospital-based RCT

cox-model, logrank-test, skewness, statistical-power, t-test

I would like to know whether there is a consensus on the best way to analyze hospital length of stay (LOS) data from an RCT. LOS is typically very right-skewed: most patients are discharged within a few days to a week, but the remaining patients have unpredictable (and sometimes quite lengthy) stays, which form the right tail of the distribution.

Options for analysis include:

  • t test (assumes normality, which is unlikely to hold here)
  • Mann-Whitney U test
  • logrank test
  • Cox proportional hazards model conditioning on group allocation

Do any of these methods have demonstrably higher power?
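
Absent a settled answer, one way to probe the power question for a particular trial design is a small simulation. The sketch below is illustrative only: it assumes log-normally distributed stays with no censoring, and the sample size, effect size (`ratio`), and spread (`sigma`) are hypothetical values, not estimates from any real trial. It compares only the Welch t test and the Mann-Whitney U test, both computed with normal approximations using the standard library.

```python
import math
import random
import statistics

def normal_sf(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def welch_t_pvalue(x, y):
    """Two-sided Welch t test p-value. Uses a normal approximation to the
    t distribution, which is an assumption suited to large samples only."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    z = (statistics.fmean(x) - statistics.fmean(y)) / se
    return 2 * normal_sf(abs(z))

def mann_whitney_pvalue(x, y):
    """Two-sided Mann-Whitney U p-value via the normal approximation.
    No tie correction is applied because the simulated data are continuous."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(r for r, (_, g) in enumerate(pooled, start=1) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 2 * normal_sf(abs(u - mu) / sigma_u)

def simulate_power(n_per_arm=100, ratio=0.75, sigma=1.0,
                   n_sims=500, alpha=0.05, seed=1):
    """Estimate the rejection rate of each test under a hypothetical scenario:
    control median LOS of 5 days, treated median of 5 * ratio days,
    both log-normal with log-scale standard deviation sigma."""
    rng = random.Random(seed)
    rejections = {"welch_t": 0, "mann_whitney": 0}
    for _ in range(n_sims):
        control = [rng.lognormvariate(math.log(5), sigma) for _ in range(n_per_arm)]
        treated = [rng.lognormvariate(math.log(5 * ratio), sigma) for _ in range(n_per_arm)]
        if welch_t_pvalue(control, treated) < alpha:
            rejections["welch_t"] += 1
        if mann_whitney_pvalue(control, treated) < alpha:
            rejections["mann_whitney"] += 1
    return {name: count / n_sims for name, count in rejections.items()}

print(simulate_power())
```

Setting `ratio=1.0` gives the type I error rate of each test under the same assumptions, which is worth checking before comparing power. The ranking of the tests can change with the assumed distribution, so the results apply only to the scenario simulated.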

Best Answer

I'm actually embarking on a project that does exactly this, although with observational rather than clinical trial data. My thinking has been that, because of the unusual shape of most length of stay data and the really well characterized time scale (you know both the origin and exit time essentially perfectly), the question lends itself really well to survival analysis of some sort. Three options to consider:

  • Cox proportional hazards models, as you've suggested, for comparing the treatment and control arms.
  • Straight Kaplan-Meier curves, using a log-rank test or one of the other tests to examine the differences between them. Miguel Hernán has argued that this is actually the preferable method in many cases, as it does not necessarily assume a constant hazard ratio. As you've got a clinical trial, the difficulty of producing covariate-adjusted Kaplan-Meier curves shouldn't be a problem, but even if there are some residual variables you want to control for, this can be done with inverse-probability-of-treatment weights.
  • Parametric survival models. These are, admittedly, less commonly used, but in my case I need a parametric estimate of the underlying hazard, so they are really the only way to go. I wouldn't suggest jumping straight into the Generalized Gamma model, which is something of a pain to work with. I'd try a simple Exponential, Weibull and Log-Normal first and see if any of those produce acceptable fits.
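
To make the second option concrete, here is a minimal sketch of a Kaplan-Meier estimator and a two-sample log-rank test in plain Python. It treats discharge as the event, so the "survival" curve is the proportion of patients still in hospital. The function names and the `observed` flag (False for a patient whose discharge was not seen, for example because they were still admitted when follow-up ended) are my own illustrative choices, and the data at the bottom are made up. In practice you would more likely use an established survival analysis package.

```python
import math

def kaplan_meier(durations, observed):
    """Return [(time, S(time))] at each distinct event time.
    observed[i] is True if patient i's discharge was seen, False if censored."""
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            events += int(data[i][1])
            removed += 1
            i += 1
        if events:
            surv *= 1 - events / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def logrank_test(dur_a, obs_a, dur_b, obs_b):
    """Two-sample log-rank test. Returns (chi-square statistic, p-value), 1 df."""
    event_times = sorted(
        {t for t, e in zip(dur_a, obs_a) if e} | {t for t, e in zip(dur_b, obs_b) if e}
    )
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        n_a = sum(1 for d in dur_a if d >= t)  # at risk in arm A just before t
        n_b = sum(1 for d in dur_b if d >= t)  # at risk in arm B just before t
        d_a = sum(1 for d, e in zip(dur_a, obs_a) if e and d == t)
        d_b = sum(1 for d, e in zip(dur_b, obs_b) if e and d == t)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in arm A
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    # Upper tail of chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Made-up LOS data in days; False marks a censored stay.
treated_los = [2, 3, 3, 4, 5, 6, 9, 14]
treated_obs = [True, True, True, True, True, True, True, False]
control_los = [3, 4, 5, 6, 7, 9, 12, 21]
control_obs = [True, True, True, True, True, True, False, True]

print(kaplan_meier(treated_los, treated_obs))
print(logrank_test(treated_los, treated_obs, control_los, control_obs))
```

Because the event here is discharge, a curve that drops faster means shorter stays, which is the opposite of the usual reading where a faster drop is bad.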