Solved – Mixed-effect logistic regression in R, Error: no random effect specified

binomial distribution, logistic, mixed model, r, regression

In a study, per-field disease incidence was measured by sampling 35 plants in each field and computing the proportion of infected plants (number infected / total sampled). As a result, each sampling location has a value between 0 and 1. I have about 100 locations in total, each sampled in the same way.

I also have a column indicating the grower's knowledge, which splits the fields into two groups: fields whose growers have the knowledge and fields whose growers do not. Each group has its own set of disease incidences.

The data structure is as follows:

infnumber = c(7,17,26,12,....etc)
totalplants = c(35,35,35,35,....)
incidence = c(0.19, 0.50, 0.75, 0.34, ....) 
knowledge = c("yes","no","yes","yes",....) 

I would like to use a mixed-effects logistic regression model in R with the following structure:

$Y_i$ is the number of infected plants in field $i$. The model is:

$$Y_i \sim Binomial(35,\pi_i)$$

$$\mathrm{logit}(\pi_i)=\log(\frac {\pi_i}{1-\pi_i})=\beta_0 + \beta_1 X_i +\gamma_i$$
where $X_i = 1$ if the grower of the $i$-th field has the knowledge and $X_i = 0$ otherwise. $\gamma_i$ is a random intercept for field $i$ to account for possible over-dispersion. $\beta_1$ is the log odds ratio of knowledge vs. no knowledge. The mean proportion of infected plants is $\frac{\exp(\beta_0)}{1 + \exp(\beta_0)}$ for a grower without knowledge, and $\frac{\exp(\beta_0+\beta_1)}{1+\exp(\beta_0+\beta_1)}$ for a grower with knowledge.
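
As a quick numeric check of these formulas, here is a minimal sketch in base R. The coefficient values are hypothetical and chosen only for illustration, and the proportions it prints apply to a field whose random intercept $\gamma_i$ is zero.

```r
# Hypothetical coefficient values, for illustration only (not estimates)
beta0 <- -0.5  # log-odds of infection for a grower without knowledge
beta1 <- -0.8  # log odds ratio, knowledge vs. no knowledge

# plogis() is the inverse logit: exp(x) / (1 + exp(x))
p_no  <- plogis(beta0)          # proportion infected, no knowledge
p_yes <- plogis(beta0 + beta1)  # proportion infected, knowledge

# exp(beta1) is the odds ratio between the two groups
odds_ratio <- exp(beta1)

print(c(p_no = p_no, p_yes = p_yes, odds_ratio = odds_ratio))
```

A negative $\beta_1$ gives an odds ratio below 1, i.e. lower odds of infection in fields whose growers have the knowledge.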

Which of the following is the proper way to fit it? Or is there another solution?

# Option 1: random intercept for each field
m <- glmer(incidence ~ knowledge + (1 | field), data = Data,
           weights = totalplants, family = binomial(logit))
# Option 2: random intercept for each knowledge group
m <- glmer(incidence ~ knowledge + (1 | knowledge), data = Data,
           weights = totalplants, family = binomial(logit))

This post is a follow-up to my previous question.

Best Answer

Since you want to include a random intercept for the field grouping variable, according to the mathematical specification of your model, the former syntax that includes the (1 | field) term in the formula argument of glmer() is correct.
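
To make this concrete, here is a minimal sketch that fits the model on simulated data. It assumes the lme4 package is installed. The simulated values, the `field` identifier column, and the object names `m1` and `m2` are assumptions for illustration and are not taken from the question's data. It also shows an equivalent way to write the response, as a two-column matrix of infected and non-infected counts instead of a proportion with weights.

```r
library(lme4)

set.seed(1)
n_fields <- 100

# Simulated stand-in for the question's data (values are made up)
Data <- data.frame(
  field       = factor(seq_len(n_fields)),  # one row per field
  knowledge   = factor(sample(c("no", "yes"), n_fields, replace = TRUE)),
  totalplants = 35
)

# Field-level random intercepts plus a fixed effect of knowledge
gamma <- rnorm(n_fields, sd = 0.7)
eta   <- -0.5 + (-0.8) * (Data$knowledge == "yes") + gamma
Data$infnumber <- rbinom(n_fields, size = Data$totalplants, prob = plogis(eta))
Data$incidence <- Data$infnumber / Data$totalplants

# Proportion response with the number of trials passed as weights
m1 <- glmer(incidence ~ knowledge + (1 | field), data = Data,
            weights = totalplants, family = binomial(link = "logit"))

# Same model with the response given as (successes, failures)
m2 <- glmer(cbind(infnumber, totalplants - infnumber) ~ knowledge + (1 | field),
            data = Data, family = binomial(link = "logit"))

print(fixef(m1))
print(fixef(m2))
```

Both calls describe the same binomial likelihood, so the fixed-effect estimates should agree. By contrast, (1 | knowledge) would use knowledge as a grouping factor with only two levels while it is already in the model as a fixed effect, which does not match the $\gamma_i$ term in the specification.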