Mixed Effects Logistic Regression – How to Build a Model in R

glmm, lme4-nlme, logistic, mixed model, r

I am new to data analysis and am now working on a mixed effects logistic regression model. Currently, I have the following data frame (model_data):

Road Id  Vehicle_id  entry_time           exit_time            last_veh_time  vehicle_type  status
1        1           2017-01-31 00:00:00  2017-01-31 00:00:00  300            4 wheel       0

I have the same information (nearly a million rows) for all the vehicles entering and exiting the same place. status is either $1$ or $0$ for each row. last_veh_time is the time elapsed since the previous vehicle passed the same place.

I want to know the probability of getting status $0$ for a vehicle based on entry_time and last_veh_time. To do this, I created the model:

model = glmer(status ~ last_veh_time + (1 | link),
              data = model_data, REML = FALSE)

Is this the right approach? Any help is appreciated.

Best Answer

There is too little information on your research question and data structure to allow really good advice, but I'll try.

In this scenario, "status" is your "dependent variable", since you suspect that it depends on "entry_time" and "last_veh_time" (your "independent" variables). There is an excellent explanation of the terminology here.

Given the minimal information above, I cannot see why you would want to perform a mixed model analysis rather than a standard logistic regression. You only need a mixed effects model when there are repeated measures on a single experimental unit (e.g. if the same vehicle ID shows up in several rows) or when there is some other grouping/clustering factor of no direct interest that makes some vehicles more similar to each other (i.e. they cluster together). If neither is the case and each vehicle appears in only one row, I think no mixed model is needed.
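As a rough sketch of that check, the lines below count how often each vehicle ID occurs and, when there are no repeats, fit a standard logistic regression with base R's glm(). The toy data frame is simulated purely for illustration (its values and the "2 wheel" level are assumptions, not taken from the question); only the column names follow the question. Since glm() models the probability of status = 1, the probability of status 0 is taken as the complement.

```r
# Hypothetical toy data standing in for model_data (simulated, not real values).
set.seed(1)
n <- 200
model_data <- data.frame(
  vehicle_id    = seq_len(n),                 # one row per vehicle in this toy set
  last_veh_time = round(runif(n, 0, 600)),
  vehicle_type  = factor(sample(c("2 wheel", "4 wheel"), n, replace = TRUE)),
  status        = rbinom(n, 1, 0.4)
)

# Does any vehicle appear in more than one row? If not, there is no
# repeated-measures structure for a random intercept to capture.
rows_per_vehicle <- table(model_data$vehicle_id)
has_repeats <- any(rows_per_vehicle > 1)

# Standard logistic regression; glm() models P(status = 1).
fit <- glm(status ~ last_veh_time + vehicle_type,
           data = model_data, family = binomial)

# Probability of status 0 is the complement of the fitted P(status = 1).
p_status0 <- 1 - fitted(fit)
head(p_status0)
```

If has_repeats turns out TRUE on the real data, that points back towards a random intercept for the vehicle, or for whatever other grouping factor applies.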

If your vehicle IDs are showing up repeatedly, your mixed logistic regression might look something like this:

library(lme4)  # provides glmer()
glmer(status ~ entry_time + last_veh_time + vehicle_type + (1 | vehicle_id),
    data = model_data,
    # binomial family for the 0/1 outcome; its default logit link relates
    # the linear predictor to the probability of status = 1
    family = binomial)

PS: Currently "entry_time" is continuous. Depending on your research question, you might want to consider whether it makes sense to convert it into sub-variables such as calendar date and time of day, or even year, month, day, and time of day.
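One possible way to derive such sub-variables with base R is sketched below. The timestamps are made up for illustration, and the UTC time zone is an assumption; the question does not say which time zone the data use.

```r
# Hypothetical entry times in the same format as the example row.
entry_time <- as.POSIXct(
  c("2017-01-31 00:00:00", "2017-01-31 08:15:30", "2017-02-04 17:45:00"),
  tz = "UTC"  # assumed time zone
)

# Split the timestamp into components that could enter the model separately.
time_parts <- data.frame(
  date    = as.Date(entry_time, tz = "UTC"),
  month   = as.integer(format(entry_time, "%m")),
  weekday = as.integer(format(entry_time, "%u")),  # 1 = Monday ... 7 = Sunday
  hour    = as.integer(format(entry_time, "%H"))
)
time_parts
```

Whether hour or weekday should enter the model as a number or as a factor depends on the research question. A factor allows a non-monotonic pattern over the day, at the cost of more parameters.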
