As Stephen Senn has written, it is not appropriate to compare baseline distributions in a randomized study. The way I like to talk about this is to ask the question "where do you stop?", i.e., how many other baseline covariates should you go back and try to retrieve? You will find counter-balancing covariates if you look hard enough.
The basis for choosing a model is not post hoc differences but rather a priori subject-matter knowledge about which variables are likely to be important predictors of the response variable. The baseline version of the response variable is certainly a dominating predictor, but others are likely to be important as well. The goal is to explain the explainable heterogeneity in the outcome so as to maximize precision and power. There is almost no role for statistical significance testing in model formulation.
A pre-specified model will take care of chance differences on the variables that matter - those predicting the outcome.
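To make the precision argument concrete, here is a minimal simulation sketch. It is not taken from the answer itself: the sample size, effect size, baseline-outcome correlation, and function names are all hypothetical choices for illustration. It compares the spread of the unadjusted difference in means with that of an estimate adjusted for the baseline value of the outcome, using the pooled within-arm slope (the treatment coefficient from a linear model with arm and baseline as predictors).

```python
import random
import statistics


def simulate_trial(n_per_arm, effect, rho, rng):
    """Simulate one two-arm randomized trial.

    Returns (unadjusted, adjusted) estimates of the treatment effect.
    All parameter values are illustrative assumptions.
    """
    arms = {0: ([], []), 1: ([], [])}
    for arm, (xs, ys) in arms.items():
        for _ in range(n_per_arm):
            baseline = rng.gauss(0.0, 1.0)
            # Outcome has correlation rho with baseline, plus a shift in arm 1.
            noise = rng.gauss(0.0, (1.0 - rho ** 2) ** 0.5)
            xs.append(baseline)
            ys.append(rho * baseline + noise + effect * arm)

    xbar = {a: statistics.fmean(xs) for a, (xs, ys) in arms.items()}
    ybar = {a: statistics.fmean(ys) for a, (xs, ys) in arms.items()}

    # Pooled within-arm slope of outcome on baseline.
    sxy = sum((x - xbar[a]) * (y - ybar[a])
              for a, (xs, ys) in arms.items() for x, y in zip(xs, ys))
    sxx = sum((x - xbar[a]) ** 2
              for a, (xs, ys) in arms.items() for x in xs)
    slope = sxy / sxx

    unadjusted = ybar[1] - ybar[0]
    # Remove the part of the difference explained by chance baseline imbalance.
    adjusted = unadjusted - slope * (xbar[1] - xbar[0])
    return unadjusted, adjusted


def compare_precision(n_sims=1000, n_per_arm=50, effect=0.5, rho=0.7, seed=0):
    """Return (sd_unadjusted, sd_adjusted) across repeated simulated trials."""
    rng = random.Random(seed)
    results = [simulate_trial(n_per_arm, effect, rho, rng) for _ in range(n_sims)]
    unadjusted = [r[0] for r in results]
    adjusted = [r[1] for r in results]
    return statistics.stdev(unadjusted), statistics.stdev(adjusted)
```

Calling compare_precision() returns the two standard deviations. Under these assumptions the adjusted estimate should vary less from trial to trial, with no need to test the baseline for imbalance first.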
It would seem most important to ensure that the groups had similar QoL outcome measures at t1 or t2, as those time points are fixed with respect to the intervention time. Your question says that you might include a t1 versus t0 comparison. That doesn't seem to make much sense in terms of evaluating the intervention, however, as the intervention doesn't happen until after t2.
You might want to examine changes from t0 to t1 in a separate analysis with a continuous measure of time in days. That would help you evaluate whether the two treatment groups were adequately well matched both at enrollment into the study and at the time point t1 that (unlike t0) occurs at a fixed time prior to the intervention. It would also let you see if there is any systematic change over time in the outcome measure absent the intervention.
If the groups are adequately well matched at t1, however, I don't see any need to use values at t0 as part of evaluating the intervention itself. You might, however, need to evaluate them as part of your quality control.
In response to comments
I think it's important to distinguish the direct effects of the intervention from possible changes in QoL values associated with the treatment-group assignment, presumably done at t0, which might lead to systematic differences between t0 and t1.
With similar distributions of QoL values between the 2 treatment groups at t1, the specific effects of the interventions per se can probably be described as differences between pre-intervention (t1, t2) and post-intervention (t3, t4) QoL values. Think carefully about how you want to do that, as the more coefficients you have to estimate, the less power you might have.
For example, might the QoL values at t1 and t2 be considered replicates rather than separate values? Might it make sense to model QoL differences between t1 and t2 against corresponding differences between t3 and t4, both representing 13-day periods? You need to apply your knowledge of the subject matter to make those decisions.
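As one illustration of the "replicates" option, the sketch below averages the two pre-intervention values and the two post-intervention values for each participant and then compares the mean change between the assignment groups. This is only one of the modeling choices mentioned above, not a recommendation; the function names, the record layout, and the group labels "control" and "treated" are hypothetical.

```python
from statistics import mean


def pre_post_change(t1, t2, t3, t4):
    """Change for one participant: mean post value minus mean pre value.

    Treats (t1, t2) and (t3, t4) as replicate measurements, which is an
    assumption that needs subject-matter justification.
    """
    return mean([t3, t4]) - mean([t1, t2])


def group_difference_in_change(records):
    """Difference in mean pre/post change, treated minus control.

    records: iterable of (group, t1, t2, t3, t4) tuples, where group is
    the hypothetical label "control" or "treated".
    """
    changes = {"control": [], "treated": []}
    for group, t1, t2, t3, t4 in records:
        changes[group].append(pre_post_change(t1, t2, t3, t4))
    return mean(changes["treated"]) - mean(changes["control"])
```

Collapsing to one change score per participant leaves a single group coefficient to estimate, which is the kind of economy in parameters the previous paragraph argues for. Whether averaging is defensible depends on how QoL behaves over those 13-day periods.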
You certainly should examine potential changes between t0 and t1, but such changes would have to do with either the time interval or the group assignment (e.g., due to the potential psychological effects you mention) rather than with the intervention per se. They thus would require a type of explanation other than a direct effect of the intervention.
Don't overthink the t0 to t1 differences. What you presumably want to do is to assure yourself and your audience that any such differences between the 2 assignment groups are small enough not to affect your interpretation of the direct intervention effect. Don't worry so much about whether you have the "best" model for the t0 to t1 difference. Just develop one that's adequate to address that potential concern.
A simple analysis of the paired t1-t0 differences within individuals should be adequate, and it accomplishes more simply what you propose in a comment to do with a mixed model. If you are only examining paired t1-t0 differences you don't need the time*treatmentgroup interaction, just the treatmentgroup assignment itself. Flexible inclusion of timeddifference in the model of the t1-t0 QoL paired differences with a regression spline makes sense. You will need more than the 2 degrees of freedom you propose in the model in your comment, however, as that doesn't allow any knots at all. I prefer to model splines with the rcs() function in the R rms package, in part because (unlike ns()) it provides reasonable default parameter settings.
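For readers who want to see what a restricted cubic spline basis looks like, here is a minimal sketch of the standard truncated-power construction, in which k knots yield k-1 columns (the linear term plus k-2 nonlinear terms) and the fit is constrained to be linear beyond the outer knots. It is not the rms implementation, which may scale the nonlinear terms differently and chooses default knot locations from the data; the function name and the knot values in the usage note are hypothetical.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for a scalar x (truncated-power form).

    Returns a list of k-1 values for k knots: x itself followed by k-2
    nonlinear terms. Unscaled; a sketch, not the rms::rcs() implementation.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    denom = t[-1] - t[-2]
    terms = [float(x)]
    for j in range(k - 2):
        terms.append(
            cube_plus(x - t[j])
            - cube_plus(x - t[-2]) * (t[-1] - t[j]) / denom
            + cube_plus(x - t[-1]) * (t[-2] - t[j]) / denom
        )
    return terms
```

For example, rcs_basis(days, [5, 15, 30]) gives the columns that would stand in for timeddifference in a regression of the paired t1-t0 differences on treatmentgroup, with days and the three knot locations being placeholder values.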
You might want to read the following article
A few thoughts, although I confess I'm not an expert on this: