First there is the question of whether it is OK to use percent change as the outcome. In a regression model with baseline as a regressor this is a very bad idea, because the outcome is mathematically coupled to the regressor, which will induce correlation (i.e., statistically significant associations) where none is actually present (or mask actual change). This is easy to show with a simulation:
We simulate two groups of 100 subjects each, where in the first instance there is no change from baseline in either group:
set.seed(15)
N <- 200
x1 <- rnorm(N, 50, 10)
trt <- c(rep(0, N/2), rep(1, N/2)) # allocate to 2 groups
x2 <- rnorm(N, 50, 10) # no change from baseline
So we expect to find nothing of any interest:
summary(lm(x2 ~ x1 * trt))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 45.75024 5.37505 8.512 4.43e-15 ***
x1 0.06776 0.10342 0.655 0.513
trt 3.25135 7.12887 0.456 0.649
x1:trt -0.01689 0.13942 -0.121 0.904
as expected. But now we create a percent change variable and use it as the outcome:
pct.change <- 100*(x2 - x1)/x1
summary(lm(pct.change ~ x1 * trt))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 97.5339 12.7814 7.631 9.93e-13 ***
x1 -1.9096 0.2459 -7.765 4.44e-13 ***
trt 45.1394 16.9519 2.663 0.00839 **
x1:trt -0.7662 0.3315 -2.311 0.02188 *
Everything is significant! So we would interpret this as: the expected percent change in weight for a subject in the control group with zero baseline weight is 97; the expected change in the percent change in weight for a subject in the control group for each additional unit of baseline weight is -1.91; the expected difference in the percent change in weight between the control and treatment group for a subject with zero baseline weight is 45; and the expected difference in the percent change in weight between the treatment and control groups for each additional unit of baseline weight is -0.77. All completely spurious! Note also that with a "percent change" variable we have to use language like "expected change in the percent change", which does not help with understanding.
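The coupling can also be seen directly, without fitting any model: even when follow-up is generated independently of baseline, the percent-change variable is a function of baseline and so correlates with it by construction. A minimal sketch:

```r
set.seed(15)
x1  <- rnorm(1000, 50, 10)     # baseline
x2  <- rnorm(1000, 50, 10)     # follow-up, independent of baseline
pct <- 100 * (x2 - x1) / x1    # percent change

cor(x2, x1)    # near zero, as it should be
cor(pct, x1)   # strongly negative, purely an artefact of the coupling
```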
Now let's introduce an actual treatment effect of 10:
x3 <- x1 + rnorm(N, 0, 1) + trt*10
summary(lm(x3 ~ x1 * trt))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.95933 0.54404 -1.763 0.0794 .
x1 1.01921 0.01047 97.365 <2e-16 ***
trt 10.78643 0.72156 14.949 <2e-16 ***
x1:trt -0.01126 0.01411 -0.798 0.4260
...all good.
Now again, we create a percent change variable and use it as the outcome:
pct.change.trt <- 100*(x3 - x1)/x1
summary(lm(pct.change.trt ~ x1 * trt))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.77928 1.23337 -1.443 0.151
x1 0.03439 0.02373 1.449 0.149
trt 49.11734 1.63580 30.027 <2e-16 ***
x1:trt -0.54947 0.03199 -17.175 <2e-16 ***
...more spurious results.
As to the specific models:
Repeated measures ANOVA (with "Weight" as outcome, ["Group", "Time"] as within-factors and adjusting for "subject").
This is one option that could work.
ANCOVA (with "Percent reduction in weight" as outcome, "Group" as between-factor and "baseline weight" as a covariate)
Besides the mathematical coupling problem, this would not control for the repeated measures.
Linear mixed effects method with "Weight" as outcome, [group, time, group*time] as fixed effects and [subject] as random effect. Again, can we use "Percent reduction in weight" here?
This would be my preferred option, but again, not with percent reduction. This should be equivalent to repeated measures ANOVA. For example with your data:
lmer(wt ~ group*time + age + gender + (1 | Subject), data = mydata)
lme(wt ~ group*time + age + gender, random = ~ 1 | Subject, data = mydata)
You may want to add random slopes by placing one or more of the fixed effects that vary within subjects (only time in this case) to the left of the |, if justified by the theory and study design, and supported by the data. Personally I always start from a model with only random intercepts.
Linear model with interaction: "Percent reduction in weight" ~ "Group" * "Baseline weight"
This should be avoided due to the mathematical coupling problem. Even if baseline were removed as a regressor, this would then just be an ANOVA model, and while the repeated measures are handled by the percent-change variable, the residuals may not be close to normal, so inference may be affected.
tl;dr your analysis removes missing data. You lose a bit of power, but the analysis remains unbiased when model assumptions are met. If an assumption is violated, there's not much a missing data method can do for you.
lmer in R will remove missing observations, i.e. a complete case analysis is performed. As with any complete case analysis in a likelihood-based procedure, depending on the nature of the missingness the resulting analysis is not biased, only slightly inefficient.
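For instance, using the sleepstudy data shipped with lme4 (just as an illustration), rows with a missing response are silently dropped:

```r
library(lme4)
d <- sleepstudy                      # 180 rows
d$Reaction[c(5, 50)] <- NA           # introduce two missing outcomes
m <- lmer(Reaction ~ Days + (1 | Subject), data = d)
nobs(m)                              # 178: the two incomplete rows were excluded
```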
I'm defining a cluster to be the repeated measures within a subject. Imbalance in cluster size would matter if you used methods for independent data: the resulting analysis would be anti-conservative, meaning your 95% CIs would be too narrow to accurately describe the uncertainty. This is because those later observations have a larger residual correlation due to the subject-level uncertainty that is only correctly measured with a mixed model. The mixed model uses a random intercept; this imposes an exchangeable correlation structure, meaning repeated measures within a subject are positively correlated. The impact of this positive correlation is that repeated measures within subjects are relatively downweighted. This will adjust your power downwards for detecting trends over time. Consequently inference for the time-by-group interaction will be appropriately conservative.
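The implied exchangeable correlation can be read off the fitted variance components as the intraclass correlation (ICC); a sketch using the lme4 example data:

```r
library(lme4)
m  <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
vc <- as.data.frame(VarCorr(m))
# ICC = between-subject variance / (between-subject + residual variance);
# under a random intercept this is the correlation between any two
# measurements on the same subject
icc <- vc$vcov[vc$grp == "Subject"] / sum(vc$vcov)
icc
```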
If the power is too low to perform a final analysis, some power may be recovered with integrated likelihood methods, i.e. expectation maximization (EM) or multiple imputation (MI). Accounting for missingness with EM is more tractable than with MI, but both are justified only on the basis that they would improve the efficiency of the analysis. They are complicated to do, require many assumptions, and would raise the eyebrows of a few reviewers.
To underscore all of this, bias creeps in only when we consider a rather grave type of violation. A model assumption of acute interest is informative loss-to-follow-up: subjects may drop out early when Group doesn't have the desired effect. This may cause subjects contributing more measurements over time to "exaggerate" their specific effects as if they were population-level effects. The problem is that there really is no way to handle this.
One may argue for pessimistic types of imputation such as last observation carried forward (LOCF) or worst observation carried forward (WOCF), each presuming there was at least one negative observation prior to a subject's withdrawal from the study.
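For completeness, LOCF is simple to implement in base R (shown here only for illustration; as noted above, it is a pessimistic choice rather than a recommended primary analysis):

```r
# Carry the last non-missing value forward within a subject's vector
locf <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

locf(c(80, 78, NA, NA, 75))   # 80 78 78 78 75
```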
You have participants nested within doctors, so the random structure of the model should reflect this by estimating random intercepts for doctor and participants within doctors, provided that you have sufficient numbers of both (10 is a typical rule of thumb). I would start with the model:
glmer(outcome ~ intervention*time + (1 | Doctor/ID), data = mydata, family = binomial(link = "logit"))
where intervention*time is shorthand for intervention + time + intervention:time. I don't know what you mean by "I want time to be within ID". From your description, time is simply a pre/post indicator variable. You could allow for the effect of time to differ among participants (and/or doctors) by adding a random coefficient for time:
glmer(outcome ~ intervention*time + (time | Doctor/ID), data = mydata, family = binomial(link = "logit"))
In this formulation, the model will estimate time random slopes for both doctors and participants. If you wanted time random slopes for only participants you would use:
glmer(outcome ~ intervention + time + intervention:time + (1 | Doctor) + (time | Doctor:ID), data = mydata, family = binomial(link = "logit"))
where this uses the fact that (1 | Doctor/ID) is just shorthand notation for (1 | Doctor) + (1 | Doctor:ID).