Mixed Effects Logistic Regression – How to Build a Model in R

Tags: glmm, lme4-nlme, logistic, mixed model, r

I am new to data analysis and am now working on a mixed effects logistic regression model. Currently, I have the following data frame (model_data):

Road Id  Vehicle_id  entry_time           exit_time            last_veh_time  vehicle_type  status
1        1           2017-01-31 00:00:00  2017-01-31 00:00:00  300            4 wheel       0

Likewise, I have this information (nearly a million rows) for all the vehicles entering and exiting a place. status is either $1$ or $0$ for each row. last_veh_time is the time elapsed since the previous vehicle passed the same place.

I want to know the probability of a vehicle getting status $0$ based on entry_time and last_veh_time. To do this, I created the model:

model = glmer(status ~last_veh_time + (1 |link ), 
             data=model_data, REML=FALSE)

Is this the right approach? Any help is appreciated.

Best Answer

There is too little information on your research question and data-structure to really allow good advice, but I'll try.

In this scenario "status" is your "dependent variable", since you suspect that it depends on "entry_time" and "last_veh_time" (your "independent" variables). There is an excellent explanation on terminology here.

Given the minimal information above, I cannot see why you would want to perform a mixed model analysis rather than a standard logistic regression. You only need a mixed effects model when there are repeated measures on a single experimental unit (e.g. if the same vehicle ID shows up in several rows) or if there is some other grouping/clustering factor of no direct interest that makes some vehicles more similar to each other (i.e. they cluster together). If this is not the case and each vehicle appears in only one row, I think no mixed model is needed.
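To see which case applies, one could check whether any vehicle ID occurs in more than one row and, if not, fit an ordinary logistic regression. The following is only a minimal sketch on simulated data: the column names follow the question, while the simulated values and the object names sim_data, fit and new_obs are made up for illustration. Note that glm() with a 0/1 outcome models the probability of status 1, so the probability of status 0 is one minus the prediction.

```r
set.seed(1)
n <- 500

# Simulated stand-in for model_data (assumption: one row per vehicle)
sim_data <- data.frame(
  vehicle_id    = seq_len(n),
  last_veh_time = runif(n, min = 0, max = 600)
)

# Arbitrary choice for the simulation: longer gaps make status 1 more likely
sim_data$status <- rbinom(n, size = 1,
                          prob = plogis(-1 + 0.005 * sim_data$last_veh_time))

# Does any vehicle appear in more than one row?
has_repeats <- any(duplicated(sim_data$vehicle_id))

# No repeats here, so an ordinary logistic regression is sufficient
fit <- glm(status ~ last_veh_time, data = sim_data, family = binomial)

# Predicted probability of status 0 for a vehicle with last_veh_time = 300
new_obs   <- data.frame(last_veh_time = 300)
p_status1 <- predict(fit, newdata = new_obs, type = "response")
p_status0 <- 1 - p_status1
```

If has_repeats were TRUE, a random intercept per vehicle would be worth considering instead.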

If your vehicle IDs show up repeatedly, your mixed logistic regression might look something like this:

# family = binomial: binomial distribution for the 0/1 outcome, with the default
# logit link relating the linear predictor to the probability of status 1
glmer(status ~ entry_time + last_veh_time + vehicle_type + (1 | vehicle_id),
      data = model_data,
      family = binomial)

PS: Currently "entry_time" is continuous. Depending on your research question, you might want to consider whether it makes sense to split it into separate variables such as calendar date and time of day, or even year, month, day and time of day.
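As a rough illustration of such a split, the sketch below derives calendar and time-of-day variables from a timestamp using base R only. The three timestamps, the UTC time zone and the name time_parts are assumptions made for the example; the real entry_time column may need a different time zone or parsing format.

```r
# Example timestamps (made up), parsed as date-times in UTC
entry_time <- as.POSIXct(
  c("2017-01-31 00:00:00", "2017-01-31 08:15:00", "2017-02-01 17:30:00"),
  tz = "UTC"
)

time_parts <- data.frame(
  entry_time = entry_time,
  date  = as.Date(entry_time, tz = "UTC"),          # calendar date
  year  = as.integer(format(entry_time, "%Y")),
  month = as.integer(format(entry_time, "%m")),
  day   = as.integer(format(entry_time, "%d")),
  # time of day as decimal hours, e.g. 08:15 becomes 8.25
  hour  = as.integer(format(entry_time, "%H")) +
          as.integer(format(entry_time, "%M")) / 60
)
```

Any of these columns could then replace entry_time in the model formula, depending on whether daily, seasonal or long-term patterns are of interest.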

Related Solutions

Solved – How to build a linear mixed-effects model in R

It's entirely up to you as to whether you include both factors in the same model or not. But why not try it and see if you get a significantly better fit with both in than with just one in?
REML works with unbalanced and incomplete designs too. I'd go with REML to reduce the bias in the variance estimates and eliminate the bias in the covariance parameters.
x <- as.factor(z) turns z into a factor. You can of course do DF$x <- as.factor(DF$x).
anova(m1, m3) will test for the significance of the terms left out of the larger model. The models have to be nested for this to work.

Edit: The comments have made me realize my answer was way too terse, so I'm adding some sample code.

The code does not fit the full model that you are fitting; it just illustrates the syntax and what happens with ANOVA:

# Construct sample data; E(y) is a function only of x1
x1 <- c("A","A","A","B","B","C","D","D","D","D")
x2 <- c("A","B","C","A","B","C","A","B","C","A")
# The first argument of rnorm() is the sample size, so the means must go in `mean`
y <- rnorm(10, mean = c(0,0,0,1,1,2,3,3,3,3))  # Values for E(y) match w/ x1

# Construct data frame
df <- data.frame(list(y=y, x1=x1, x2=x2))

# Convert x1, x2 to factors
df$x1 <- as.factor(df$x1)
df$x2 <- as.factor(df$x2)

# Run regressions and perform ANOVA to evaluate effect of factor x2
m1 <- lm(y~x1, data=df)
m2 <- lm(y~x1+x2, data=df)

> anova(m1,m2)
Analysis of Variance Table

Model 1: y ~ x1
Model 2: y ~ x1 + x2
  Res.Df     RSS Df Sum of Sq     F Pr(>F)
1      6 12.9004                          
2      4  5.3241  2    7.5763 2.846 0.1703

The "Pr(>F)" column gives the p-value of the F-test of whether factor x2 is significant. Here it is 0.1703, so adding x2 does not significantly improve the fit at the usual 5% level.
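If the p-value is needed in later code rather than read off the printed table, it can be pulled out of the object that anova() returns. This is a small self-contained sketch using the same design as the example; the seed and the names tab and p_x2 are arbitrary choices for illustration, and because the data are random the numbers will differ from the table shown.

```r
set.seed(42)

# Same factor layout as in the example; E(y) depends only on x1
x1 <- factor(c("A","A","A","B","B","C","D","D","D","D"))
x2 <- factor(c("A","B","C","A","B","C","A","B","C","A"))
y  <- rnorm(10, mean = c(0,0,0,1,1,2,3,3,3,3))

m1 <- lm(y ~ x1)        # smaller model
m2 <- lm(y ~ x1 + x2)   # larger model; m1 is nested in it

tab  <- anova(m1, m2)   # F-test for the terms added in m2
p_x2 <- tab[2, "Pr(>F)"]  # row 2 holds the comparison of m2 against m1
```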

Solved – Casewise diagnostics and testing assumptions for a mixed effect logistic regression in R

Check out the influence.ME package. It's designed for quantifying group-level rather than observation-level influence, but it works for the latter too (specify obs=TRUE in influence()). It can be pretty slow because it re-estimates the model for each case (e.g., 15 seconds on my laptop to compute observation-level influences for a relatively small LMM fitted to the 144-row Penicillin data set).
For VIF (which I'm not wild about), I'm not sure: this question has been asked and so far not answered. On the other hand, a slightly more general version was asked and answered here (pointing to this GitHub repository).
In general, diagnostics on binary models are difficult: the conceptual problem here is not specific to mixed models but applies to binary GLMs in general. See this question and this question.
I would consider checking out this set of examples for GLMM diagnostics in R.
