Solved – Cox proportional hazards and pooled logistic regression in an observational cohort study – layman's terms

case-cohort, cox-model, observational-study, regression

So my brain is just not understanding the difference between the two above.
I am currently writing a literature review and have to explain the significance of what was found in the research.

The researchers did a six-year observational cohort study of the use of pioglitazone, including 2864 subjects with type 2 diabetes and no established cardiovascular disease.

They measured the effect of baseline-year use of this drug with a Cox proportional hazards model, and the effect of time-dependent use in each one-year examination interval with a pooled logistic regression model.

The results were that baseline use of the drug (n=493) did not show a statistically significant protective effect on the primary endpoint (n=175; the primary endpoint being a CVD event or death), although it tended to reduce the risk.
However, the pooled logistic regression analysis indicated a significant protective association of pioglitazone with the primary endpoint.

The study also states that the first one-year period was used as a baseline to examine the effect of this drug on cardiovascular events and all-cause mortality. Thus, patients who had any primary outcome within one year after entry (n=26) or were not followed up over the year (n=30) were excluded. Finally, a total of 2864 subjects were enrolled in the study.

The results from the Cox analysis were limited to the baseline year in terms of pioglitazone use, so the researchers performed a pooled logistic regression analysis and explored the association of time-dependent use of pioglitazone with the development of a cardiovascular event or all-cause death.

Herein, the 2864 subjects who entered this observational study contributed 11,952 person-years, so the pooled data set consisted of 11,952 observations with 175 primary endpoints, and a logistic analysis was performed on this pooled sample.

So here is where I'm struggling. I know the Cox model looks at two variables from the start of the study to the time of the event (right?) and the logistic regression looks at the two variables from time to event and takes into account other variables (meds discontinued, side effects etc.) (right??)

I don't quite understand the significance of the baseline data, particularly when both models state n=176 for the primary endpoint.
I feel so stupid. I have googled, youtubed, and read the article over and over again, and I still haven't been able to understand it properly. If someone could please help explain the above in layman's terms I would be forever grateful.

Best Answer

Time itself and time-dependent variables are not commonly used as input predictors in ordinary logistic regression, which (in the binary case) has one outcome variable (yes/no, coded 0/1) and uses the baseline variable values as predictors along with treatment (0 = placebo, 1 = treated). Cox PH regression can use the same baseline and treatment variables, but it has two outcome variables: the time-to-event (e.g., in days) and a failure indicator (0/1), where 0 marks a "censored" observation.

If multiple records existed in this study for each subject (maybe for follow-up clinic visits), then Cox PH can be employed using time-dependent variables (such as multiple LDL or SBP values taken at each visit), but ordinary logistic regression with a single record per subject doesn't allow for such.
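To make "multiple records per subject" concrete, here is a minimal sketch of a person-year layout, which is presumably what lies behind the 11,952 pooled observations quoted in the question. The function name, the column names, and the two example subjects are all hypothetical; none of this is data from the study.

```python
from typing import Dict, List


def expand_to_person_years(subject_id: int,
                           years_followed: int,
                           event_in_final_year: bool,
                           drug_use_by_year: List[int]) -> List[Dict[str, int]]:
    """Return one row per one-year interval for a single subject.

    drug_use_by_year holds a 0/1 flag for each year, so drug use can
    change from one interval to the next (a time-dependent covariate).
    """
    rows = []
    for year in range(1, years_followed + 1):
        is_last = year == years_followed
        rows.append({
            "id": subject_id,
            "year": year,
            "on_drug": drug_use_by_year[year - 1],
            # The event can only occur in the subject's final interval.
            "event": int(event_in_final_year and is_last),
        })
    return rows


# Hypothetical subjects, not data from the study.
pooled = (
    expand_to_person_years(1, 3, True, [0, 1, 1])   # starts the drug in year 2, event in year 3
    + expand_to_person_years(2, 2, False, [1, 1])   # on the drug throughout, no event
)
for row in pooled:
    print(row)
```

A logistic model fitted to rows like these treats each person-year as one observation with a 0/1 outcome, which is how a per-interval drug-use flag can enter the analysis.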

In short, logistic is more for prevalence modeling when the outcome is y/n, and there is no time involved. That is, was there e.g. recurrence (y/n) over the entire follow-up period? On the other hand, Cox PH regression is for time-to-event modeling and requires the time-to-event for each patient, and the failure status at the time of the event (e.g. time=200 days, failure=1), withdrawal from the study (e.g. time=50 days, failure=0 since you know they didn't fail when they withdrew), or last known clinic visit (e.g. time=200 days, failure=0) for subjects who never failed.
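The three cases above (event, withdrawal, last known visit) can be sketched as a small encoding step. The `Subject` class and its field names are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Subject:
    subject_id: int
    event_day: Optional[int]   # day of the event, or None if no event was observed
    last_contact_day: int      # day of withdrawal or last known clinic visit


def to_survival_record(s: Subject) -> Tuple[int, int]:
    """Return (time, failure) in the form a Cox PH model expects."""
    if s.event_day is not None:
        return (s.event_day, 1)        # failed at the event time
    return (s.last_contact_day, 0)     # censored: no failure seen up to this day


subjects = [
    Subject(1, event_day=200, last_contact_day=200),   # event on day 200
    Subject(2, event_day=None, last_contact_day=50),   # withdrew on day 50
    Subject(3, event_day=None, last_contact_day=200),  # never failed, last visit day 200
]
records = [to_survival_record(s) for s in subjects]
print(records)  # [(200, 1), (50, 0), (200, 0)]
```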

If you want to use incidence rates of disease (#new cases/person-years) for sub-populations in a study partitioned by categorical values, i.e., the "density method," then Poisson regression would be used for incidence modeling.
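As a rough illustration of the incidence-rate quantity itself (not of Poisson regression), the crude rate can be computed from the totals quoted in the question: 175 primary endpoints over 11,952 person-years. The helper name and the per-1,000 scaling are choices made here, not something the study reports.

```python
def incidence_rate(new_cases: int, person_years: float, per: int = 1000) -> float:
    """Crude incidence rate: new cases per `per` person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return new_cases / person_years * per


# Totals as quoted in the question; the result is a crude, unadjusted rate.
rate = incidence_rate(175, 11952)
print(f"{rate:.1f} events per 1,000 person-years")
```

This works out to roughly 14.6 events per 1,000 person-years for the whole cohort. A Poisson model would compare such rates between sub-populations while adjusting for covariates.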

In some clinical trial analyses, however, withdrawals are treated as failures, in which case you assign failure=1 and set time to the number of days from consent up to the day they withdrew.

For longitudinal modeling with logistic regression, it's possible that a generalized linear model (GLM) or generalized estimating equations (GEE) were used, with a logistic "link" and clustering on each subject ID. (There is no Cox PH link function for GLM or GEE.) GLM/GEE can accommodate a number of link functions, such as linear (Gaussian), logistic, and Poisson, and can simultaneously use the following in one model:

  • outcome variable (linear link): repeated measurement outcome (LDL at each clinic visit)
  • baseline predictors: female(0,1), DM(0,1), history of stroke(0,1), history of CKD(0,1).
  • time-dependent predictors: SBP, glucose, etc. during each clinic visit
  • treatment predictor: treatment(0-placebo, 1-drug)
  • time predictor: time (#days up to each clinic visit)
  • time-treatment interaction: time(e.g. days) $\times$ treatment(0,1)
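The ingredients listed above can be laid out as a "long" data set with one row per clinic visit. This is only a sketch of the data layout; the function name, column names, and values are made up for illustration, and only one of the listed baseline predictors is shown.

```python
from typing import Dict, List, Tuple


def build_long_rows(subject_id: int,
                    female: int,
                    treatment: int,
                    visits: List[Tuple[int, float, float]]) -> List[Dict[str, float]]:
    """Return one row per clinic visit; each visit is (day, sbp, ldl)."""
    rows = []
    for day, sbp, ldl in visits:
        rows.append({
            "id": subject_id,             # clustering variable
            "female": female,             # baseline predictor, constant within subject
            "treatment": treatment,       # 0 = placebo, 1 = drug
            "time": day,                  # days up to this clinic visit
            "sbp": sbp,                   # time-dependent predictor
            "timetrt": day * treatment,   # time x treatment interaction
            "ldl": ldl,                   # repeated-measurement outcome
        })
    return rows


# Hypothetical values for two subjects.
long_data = (
    build_long_rows(1, female=1, treatment=1, visits=[(0, 140.0, 130.0), (90, 135.0, 118.0)])
    + build_long_rows(2, female=0, treatment=0, visits=[(0, 150.0, 128.0), (90, 148.0, 127.0)])
)
for row in long_data:
    print(row)
```

Note that `timetrt` equals `time` for treated subjects and is always 0 for placebo subjects.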

This is called longitudinal modeling, or panel data modeling, which is much more complex than what's taught in grad-level foundations or intermediate biostat courses. So their analysis is either what I described at first, or much more complex than what would be considered from a beginner's perspective.

One last point about GLM/GEE: when time and treatment are both in the model, the effect of treatment on the outcome has to be determined using the interaction between time and treatment, i.e., timetrt = time $\times$ treatment, which is a new variable that has to be generated by multiplying time by treatment (0,1). If LDL is the outcome, with repeated measurement values at each clinic visit, the regression coefficient for the interaction term timetrt, $\beta_{timetrt}$, and its p-value will reveal whether or not the slopes of the within-subject LDL values (i.e., the outcome) were different between placebo and treated. In other words, when adjusting for baseline covariates, time-dependent covariates, a main effect of treatment, and a main effect of time, did the treatment result in significantly different slopes for LDL change over time?
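To see why the interaction coefficient carries the slope comparison, it may help to write the mean model out. This is a sketch assuming a linear (identity) link; the subscripts $i$ (subject) and $j$ (visit) and the coefficient names other than $\beta_{timetrt}$ are notation introduced here.

$$E[\mathrm{LDL}_{ij}] = \beta_0 + \beta_{time}\,\mathrm{time}_{ij} + \beta_{trt}\,\mathrm{treatment}_i + \beta_{timetrt}\,(\mathrm{time}_{ij} \times \mathrm{treatment}_i) + \text{(covariate terms)}$$

Setting treatment = 0 gives a slope over time of $\beta_{time}$ for placebo; setting treatment = 1 gives $\beta_{time} + \beta_{timetrt}$ for treated. The difference between the two slopes is exactly $\beta_{timetrt}$, so testing $\beta_{timetrt} = 0$ is the test of whether the slopes differ.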
