Logistic Regression – Modeling Hospital Readmissions Including Multiple Admissions

logistic, mixed model, multiple regression, r

I am evaluating the impact of a post-discharge program on readmissions at the hospital where I work. To do this I'm using logistic regression, with each hospital admission within a two-year time frame coded 1/0 according to whether a readmission occurred. The variables are mostly patient factors related to readmission risk (comorbidities, gender, lab values, etc.).

My question is how to handle patients who have multiple admissions in the time frame. I'm worried that if I include multiple admissions per patient, the data from these patients will bias the parameter estimates toward frequent users and away from patients with fewer admissions or only one admission. I've read published studies similar to mine, and most seem to handle this issue by keeping a single "anchor" admission for each patient and dropping the others.

I'm worried that if I follow this approach, I'll lose too much data. I work for a small/medium-sized hospital and have about 2,600 admissions to work with. If I keep only, say, each patient's earliest admission in the time period, I lose about 600 observations. I'm also concerned that if I keep only one admission per person, my model won't reflect the real world, where people do, in fact, have multiple admissions.
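A minimal sketch of the anchor-admission approach, written in Python only to illustrate the logic. It assumes each admission is a hypothetical `(patient_id, admit_day, readmitted)` record; the layout and the toy rows are made up. Keeping each patient's earliest admission leaves exactly one row per patient, so dropping about 600 of 2,600 admissions implies roughly 2,000 distinct patients.

```python
from collections import namedtuple

# Hypothetical record layout: one row per admission.
Admission = namedtuple("Admission", ["patient_id", "admit_day", "readmitted"])


def earliest_admission_per_patient(admissions):
    """Keep only each patient's earliest admission (the 'anchor')."""
    anchors = {}
    for adm in admissions:
        current = anchors.get(adm.patient_id)
        if current is None or adm.admit_day < current.admit_day:
            anchors[adm.patient_id] = adm
    # Sort by patient so the result is stable and easy to read.
    return sorted(anchors.values(), key=lambda a: a.patient_id)


admissions = [
    Admission("A", 10, 1),
    Admission("A", 40, 0),
    Admission("B", 5, 0),
    Admission("C", 100, 1),
    Admission("C", 20, 1),
    Admission("C", 300, 0),
]

anchors = earliest_admission_per_patient(admissions)
dropped = len(admissions) - len(anchors)
print(len(anchors), dropped)  # prints "3 3": 3 patients kept, 3 admissions dropped
```

Every later admission is discarded regardless of its outcome, which is exactly the information loss the question worries about.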

My thought is to keep all 2,600 observations and estimate two models. The first would use the glm command to estimate a plain-vanilla logistic regression with fixed effects only. The second would use the glmer command from lme4 to estimate a mixed model with a random intercept for patient ID. I'd take an approach similar to the one described here. I can then compare the models using AIC; if there is no meaningful difference, I'd use the fixed-effects regression because its interpretation is more straightforward.
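For reference, AIC is 2k − 2 log L, where k is the number of estimated parameters and smaller values are preferred. The Python sketch below (helper names and toy data are hypothetical) computes it for a fixed-effects logistic fit from fitted probabilities. The glmer random-intercept model has one extra parameter, the patient-level standard deviation, and its log-likelihood is a marginal likelihood that integrates over the random effects, so in practice you would take it from the fitted object (R's AIC() accepts both glm and glmer fits) rather than recompute it this way.

```python
import math


def bernoulli_loglik(y, p):
    """Log-likelihood of 0/1 outcomes y under fitted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * k - 2 * loglik


# Hypothetical toy example: an intercept-only model, so every fitted
# probability equals the observed readmission rate and k = 1.
y = [1, 0, 0, 0]
rate = sum(y) / len(y)
ll = bernoulli_loglik(y, [rate] * len(y))
print(round(aic(ll, k=1), 4))
```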

I'm interested in thoughts on this approach, and in whether my concerns point to a valid problem for inference or reflect a misunderstanding of certain concepts.

Best Answer

If you have actual discharge and readmission dates for each patient, this might best be handled with repeated-event survival analysis. That approach is presented in the main vignette of the R survival package and in its multi-state vignette.

Such models deal directly with the within-patient correlations that rightly concern you. You can request a robust variance estimate, similar to those of generalized estimating equations, by specifying a cluster() term for the individual IDs, or you can use a frailty() term to handle the correlations with a simple type of random effect. For more complicated mixed-model designs there is also the coxme package.
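The idea behind cluster() can be illustrated outside the survival setting with the logistic model from the question. This is a swapped-in technique, not the survival package's own computation: fit an ordinary logistic regression, then replace the model-based variance with a cluster-robust sandwich estimator that sums score contributions within each patient (essentially what GEE with an independence working correlation does). The pure-Python sketch below assumes an intercept plus one binary covariate; the function names, the covariate, and the toy data are hypothetical.

```python
import math


def _pieces(beta, x, y):
    """Per-observation score vectors and the 2x2 information matrix."""
    b0, b1 = beta
    scores, h00, h01, h11 = [], 0.0, 0.0, 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        r = yi - p
        scores.append((r, r * xi))        # gradient of the log-likelihood
        w = p * (1.0 - p)
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return scores, (h00, h01, h11)


def _inv2(h):
    """Inverse of the symmetric 2x2 matrix [[h00, h01], [h01, h11]]."""
    h00, h01, h11 = h
    det = h00 * h11 - h01 * h01
    return (h11 / det, -h01 / det, h00 / det)


def fit_logistic(x, y, iters=50):
    """Newton-Raphson for logit P(y = 1) = b0 + b1 * x."""
    beta = [0.0, 0.0]
    for _ in range(iters):
        scores, h = _pieces(beta, x, y)
        g0 = sum(s[0] for s in scores)
        g1 = sum(s[1] for s in scores)
        i00, i01, i11 = _inv2(h)
        beta[0] += i00 * g0 + i01 * g1    # beta += H^-1 g
        beta[1] += i01 * g0 + i11 * g1
    return beta


def cluster_robust_se(beta, x, y, cluster):
    """Sandwich SEs: H^-1 (sum over clusters of u_c u_c^T) H^-1."""
    scores, h = _pieces(beta, x, y)
    totals = {}
    for c, (s0, s1) in zip(cluster, scores):
        t = totals.setdefault(c, [0.0, 0.0])
        t[0] += s0                        # sum scores within each patient
        t[1] += s1
    m00 = sum(t[0] * t[0] for t in totals.values())
    m01 = sum(t[0] * t[1] for t in totals.values())
    m11 = sum(t[1] * t[1] for t in totals.values())
    a, b, d = _inv2(h)                    # bread B = [[a, b], [b, d]]
    # Diagonal of B M B, the only entries needed for standard errors.
    v00 = a * (a * m00 + b * m01) + b * (a * m01 + b * m11)
    v11 = b * (b * m00 + d * m01) + d * (b * m01 + d * m11)
    return math.sqrt(v00), math.sqrt(v11)


x = [0, 0, 0, 0, 1, 1, 1, 1]              # hypothetical: enrolled in program
y = [1, 0, 0, 0, 1, 1, 0, 0]              # readmitted 1/0
patient = ["A", "A", "B", "C", "A", "D", "D", "E"]
beta = fit_logistic(x, y)
se0, se1 = cluster_robust_se(beta, x, y, patient)
print(beta, se0, se1)
```

The point estimates are those of the ordinary logistic fit; only the standard errors change to account for repeated admissions of the same patient.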

In addition, survival analysis provides information about the time to readmission, not just whether a readmission occurred. That's important if you are particularly interested in outcomes like readmission within 30 days. Survival analysis also handles censoring: a patient's readmission might occur after your observation period ends, and the model still uses the information from the time that patient was under observation.
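To see how censored follow-up is used rather than discarded, here is a minimal Kaplan-Meier sketch in Python (a standard survival estimator, swapped in for illustration; the toy gap times are hypothetical). A discharge with no readmission before the end of observation stays in the risk set up to its censoring time, and 1 − S(30) estimates the probability of readmission within 30 days. This sketch treats gap times as independent, so it does not address within-patient correlation; that is what the cluster() or frailty() terms are for.

```python
def kaplan_meier(times, events):
    """Return a function t -> estimated P(no readmission by time t).

    times:  days from discharge to readmission or to censoring
    events: 1 if a readmission was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    steps = []                  # (event time, survival just after it)
    surv, at_risk, i = 1.0, len(data), 0
    while i < len(data):
        t = data[i][0]
        events_at_t = removed = 0
        while i < len(data) and data[i][0] == t:
            events_at_t += data[i][1]
            removed += 1
            i += 1
        if events_at_t:
            surv *= 1.0 - events_at_t / at_risk
            steps.append((t, surv))
        at_risk -= removed      # events and censorings at t leave the risk set

    def s(t):
        value = 1.0
        for time, sv in steps:
            if time <= t:
                value = sv
        return value

    return s


gap_days = [5, 10, 10, 20, 30]
readmitted = [1, 1, 0, 1, 0]
s = kaplan_meier(gap_days, readmitted)
print(1 - s(30))  # estimated probability of readmission within 30 days
```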

The question then becomes how to choose the time = 0 reference for survival. It seems that you should reset the clock to 0 at each discharge to estimate the time to the subsequent readmission. You could then use the covariate values in place at each of the patient's discharges. More detailed modeling with a multi-state model (e.g., with states such as in hospital, out of hospital, and death) might also be considered.
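A minimal Python sketch of resetting the clock at each discharge (a "gap time" layout). It assumes each patient's stays are hypothetical `(admit_day, discharge_day, covariates)` tuples on a common day scale, with a fixed end of observation. Each discharge yields one row: the time to the next admission with event = 1, or the time to the end of observation with event = 0, carrying the covariates in place at that discharge. Rows like these could then be passed to a Cox model in R, for example `coxph(Surv(gap, event) ~ ... + cluster(id))`.

```python
def gap_time_rows(patient_id, stays, end_of_observation):
    """One row per discharge: (id, gap, event, covariates).

    stays: list of (admit_day, discharge_day, covariates) for one patient.
    """
    stays = sorted(stays, key=lambda st: st[0])
    rows = []
    for k, (_, discharge, covs) in enumerate(stays):
        if discharge >= end_of_observation:
            continue                        # no follow-up time remains
        next_admit = stays[k + 1][0] if k + 1 < len(stays) else None
        if next_admit is not None and next_admit <= end_of_observation:
            # Readmission observed: clock restarts at this discharge.
            rows.append((patient_id, next_admit - discharge, 1, covs))
        else:
            # No readmission seen: censored at the end of observation.
            rows.append((patient_id, end_of_observation - discharge, 0, covs))
    return rows


# Hypothetical patient with three stays in a 730-day (two-year) window.
stays = [
    (0, 4, {"lab_value": 1.2}),
    (30, 35, {"lab_value": 0.9}),
    (100, 103, {"lab_value": 1.0}),
]
rows = gap_time_rows("A", stays, end_of_observation=730)
for row in rows:
    print(row)
```

The last discharge contributes a censored row rather than being dropped, which is how survival analysis uses follow-up that ends without a readmission.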
