Calculating confidence interval for average hospital length of stay, case-mix adjusted, in R

Tags: bootstrap, confidence-interval, r

I'm looking for sample R code, or pointers to sample R code for the following. (Gentle) critique of the approach would also be appreciated. I'm not a statistician and I'm pretty new to R.

I have duration of hospitalization ("length of stay"=LOS) data for 2,000 patients from my hospital and 50,000 patients from a comparison data set. Each patient has a discharge year (YEAR) and is assigned to a diagnosis group (DG). My goal is to compare (with confidence intervals) the overall average LOS at my hospital to the expected average LOS if those patients had been part of the comparison data set.
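As a starting point, the expected average LOS described above has a simple point estimate: give each of your patients the mean LOS of comparison patients in the same YEAR and DG cell, then average those values (indirect standardization). Below is a minimal R sketch. The data frame names `hosp` and `comp`, the helper `expected_mean_los()`, and the toy numbers are all hypothetical. The matched-resampling scheme proposed in the question would be centered on this value, because each random draw has an expected value equal to its cell mean.

```r
# Hypothetical toy data: one row per patient, columns YEAR, DG, LOS (days).
comp <- data.frame(YEAR = c(2009, 2009, 2009, 2010),
                   DG   = c("A", "A", "B", "A"),
                   LOS  = c(2, 4, 10, 6))
hosp <- data.frame(YEAR = c(2009, 2009, 2010),
                   DG   = c("A", "B", "A"),
                   LOS  = c(5, 8, 7))

# Observed mean LOS vs. expected mean LOS: each hospital patient is assigned
# the comparison-set mean LOS of their YEAR x DG cell.
expected_mean_los <- function(hosp, comp) {
  cell_means <- aggregate(LOS ~ YEAR + DG, data = comp, FUN = mean)
  names(cell_means)[names(cell_means) == "LOS"] <- "exp_LOS"
  m <- merge(hosp, cell_means, by = c("YEAR", "DG"), all.x = TRUE)
  c(observed  = mean(m$LOS),
    expected  = mean(m$exp_LOS),          # NA if any cell has no comparison patients
    unmatched = sum(is.na(m$exp_LOS)))    # hospital patients with no matching cell
}

print(expected_mean_los(hosp, comp))
```

The `unmatched` count makes the "no comparison patients for this cell" problem visible instead of silently dropping those patients.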

Note that for a specific year and specific DG there may be anywhere from a few to thousands of patients, and, of course, the distribution of LOS is not normal. There could even be a DG with one or more patients in a given year in my hospital's data but none in the comparison data set.

The approach I was considering was to create a comparison group in which a random patient from the comparison data set is chosen for each patient from my hospital, matched by YEAR and DG. I would calculate the expected average LOS for this group, then repeat the process 10,000 times and take the 2.5th and 97.5th percentiles of those 10,000 averages. I would repeat this for each year and plot my hospital's average LOS against the 95% CI for the expected average LOS.
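The matched-resampling procedure above can be sketched in R roughly as follows. The simulated data, the column names `YEAR`, `DG`, `LOS`, and the function `expected_los_ci()` are assumptions for illustration. One detail: drawing one random comparison patient per hospital patient (with replacement) is equivalent to drawing n_k values from each YEAR x DG pool, where n_k is the number of your patients in that cell, which is what the loop does.

```r
# Hypothetical simulated data; replace with your own data frames.
# Assumed columns: YEAR (discharge year), DG (diagnosis group), LOS (days).
set.seed(1)
comp <- data.frame(YEAR = sample(2008:2010, 5000, replace = TRUE),
                   DG   = sample(c("A", "B", "C"), 5000, replace = TRUE),
                   LOS  = rexp(5000, rate = 1 / 5))
hosp <- data.frame(YEAR = sample(2008:2010, 200, replace = TRUE),
                   DG   = sample(c("A", "B", "C"), 200, replace = TRUE),
                   LOS  = rexp(200, rate = 1 / 6))

# For each of B replicates, draw one comparison patient (same YEAR and DG,
# with replacement) for every hospital patient and record the matched mean LOS.
# Returns the observed hospital mean and the requested percentiles of the
# simulated expected means.
expected_los_ci <- function(hosp, comp, B = 10000, probs = c(0.025, 0.975)) {
  pools <- split(comp$LOS, paste(comp$YEAR, comp$DG, sep = "_"))
  tab   <- table(paste(hosp$YEAR, hosp$DG, sep = "_"))
  n_k   <- setNames(as.integer(tab), names(tab))   # hospital patients per cell
  missing <- setdiff(names(n_k), names(pools))
  if (length(missing) > 0)
    stop("no comparison patients for YEAR_DG cell(s): ",
         paste(missing, collapse = ", "))
  sims <- replicate(B, {
    total <- 0
    for (k in names(n_k)) {
      pool  <- pools[[k]]
      # sample.int() avoids sample()'s special case when a pool has length 1
      draws <- pool[sample.int(length(pool), n_k[[k]], replace = TRUE)]
      total <- total + sum(draws)
    }
    total / nrow(hosp)
  })
  c(observed = mean(hosp$LOS), quantile(sims, probs))
}

# Usage: one interval per discharge year (B reduced to keep the demo quick).
by_year <- do.call(rbind, lapply(split(hosp, hosp$YEAR), expected_los_ci,
                                 comp = comp, B = 2000))
print(by_year)
```

Note that this interval only reflects the randomness of which comparison patients are drawn, with your hospital's case mix held fixed; it does not capture sampling variability in your own 2,000 patients. The function stops on cells with no comparison patients rather than guessing, so that case has to be handled explicitly.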

To handle a patient with no match in the comparison data set for the same DG and YEAR, I was thinking of loosening the match criteria to include patients from the previous or next year, and continuing to widen the year window until there were at least N patients from which to pick at random.
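The widening-window idea could look something like this in R. `widen_pool()` is a hypothetical helper; it assumes YEAR is stored as a number, and its `min_n` (the "N" above) and `max_width` defaults are placeholders, not recommendations.

```r
# Hypothetical helper: comparison LOS values for diagnosis group `dg`, starting
# with year `yr` and widening the window by one year on each side (w = 0, 1, ...)
# until at least `min_n` patients are found or `max_width` is reached.
widen_pool <- function(comp, yr, dg, min_n = 30, max_width = 5) {
  for (w in 0:max_width) {
    pool <- comp$LOS[comp$DG == dg & abs(comp$YEAR - yr) <= w]
    if (length(pool) >= min_n)
      return(list(pool = pool, width = w, enough = TRUE))
  }
  list(pool = pool, width = max_width, enough = FALSE)   # flag for manual review
}

# Hypothetical toy comparison data
comp <- data.frame(YEAR = c(2008, 2008, 2009, 2010, 2010, 2010),
                   DG   = c("A", "A", "A", "A", "B", "A"),
                   LOS  = c(3, 5, 4, 7, 2, 6))

str(widen_pool(comp, yr = 2009, dg = "A", min_n = 3))  # needs +/- 1 year
str(widen_pool(comp, yr = 2009, dg = "B", min_n = 3))  # never reaches min_n
```

Returning the window width alongside the pool makes it easy to report how many patients needed a widened match. Widening across years assumes LOS for a given DG is roughly stable over time; if there is a clear trend, the borrowed years may bias the expected value.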

Thoughts?

Best Answer

The suggestion by Jeff to consider nonparametric methods is a good one. Semiparametric models such as the Cox proportional hazards model may be even better because of their flexibility. The Cox model in particular will handle one feature of the problem that the other methods discussed will not: LOS is actually an incompletely observed random variable. Patients who die in the hospital should not be treated as having a short LOS; instead, their LOS should be treated as right-censored at the day of death.

From the Cox model you can estimate median and mean LOS, covariate adjusted. Examples are at http://biostat.mc.vanderbilt.edu/wiki/pub/Main/FHHandouts/slide.pdf with S code at http://biostat.mc.vanderbilt.edu/wiki/pub/Main/FHHandouts/model.s
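The linked handouts use S code; what follows is not a translation of them but a separate minimal R sketch of the same idea using the `survival` package (`coxph()`, `Surv()`, `survfit()`). Assumptions: the data are simulated, `hospital` is a hypothetical 0/1 indicator for your hospital, deaths are coded as censored as described above, and case-mix adjustment is done by averaging each of your patients' predicted "still in hospital" curves with `hospital` set to 1 versus 0. The restricted mean up to 30 days is used because an unrestricted mean needs the curve to reach zero, which censoring may prevent; the 30-day horizon is arbitrary.

```r
library(survival)   # coxph(), Surv(), survfit(); ships with standard R installs

# Hypothetical simulated data; replace with your own combined data set.
set.seed(3)
n <- 3000
d <- data.frame(hospital = rbinom(n, 1, 0.2),                 # 1 = my hospital
                DG   = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
                YEAR = factor(sample(2008:2010, n, replace = TRUE)))
d$LOS  <- ceiling(rexp(n, rate = 1 / (4 + 2 * (d$DG == "C") + d$hospital)))
d$died <- rbinom(n, 1, 0.05)
# Time-to-discharge coding: event = 1 if discharged alive; in-hospital deaths
# are event = 0, i.e. right-censored at the day of death.
d$discharged <- 1 - d$died

fit <- coxph(Surv(LOS, discharged) ~ hospital + DG + YEAR, data = d)
print(summary(fit)$conf.int["hospital", ])  # HR < 1: slower discharge, longer LOS

# Predict a curve for each of my patients as observed (hospital = 1) and as if
# they were comparison patients (hospital = 0), then average over my patients.
mine   <- d[d$hospital == 1, ]
sf_obs <- survfit(fit, newdata = transform(mine, hospital = 1))
sf_exp <- survfit(fit, newdata = transform(mine, hospital = 0))
tm     <- sf_obs$time                        # event-time grid shared by both fits
S_obs  <- rowMeans(as.matrix(sf_obs$surv))
S_exp  <- rowMeans(as.matrix(sf_exp$surv))

median_los <- function(time, surv) time[which(surv <= 0.5)[1]]  # NA if never <= 0.5
rmean_los  <- function(time, surv, tau) {    # area under step function S on [0, tau]
  keep <- time <= tau
  sum(c(1, surv[keep]) * diff(c(0, time[keep], tau)))
}

res <- rbind(
  observed = c(median = median_los(tm, S_obs), rmean30 = rmean_los(tm, S_obs, 30)),
  expected = c(median = median_los(tm, S_exp), rmean30 = rmean_los(tm, S_exp, 30)))
print(res)
```

Because the model treats deaths as censored rather than as short stays, the adjusted median and restricted mean are not pulled down by early in-hospital deaths. Confidence intervals for these adjusted quantities are not produced by this sketch; resampling patients and refitting the model would be one way to get them.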
