Solved – Cluster size in generalized estimating equations (GEE)

generalized-estimating-equations, r

I'm using the gee() function from the gee package in R. The problem I'm having is that the 'Maximum cluster size' reported in the gee() output disagrees with what I believe it should be, given my data.

Here's a small example, where I have six observations from each of ten patients, each of whom belongs to one of three groups:

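# Patient IDs cycle 1, 2, ..., 10 six times (60 rows in total)
ID <- rep(1:10, 6)
myData <- data.frame(ID)
myData$ID <- factor(myData$ID)
# Group labels are assigned by row position (rows 1-20, 21-40, 41-60)
myData$Group <- c(rep('A', 20), rep('B', 20), rep('C', 20))
# Simulated response values
myData$value <- rnorm(60, mean = 10, sd = 1)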

Calling summary(myData$ID) shows that I do, in fact, have six observations of each ID, and calling class(myData$ID) shows that ID is, in fact, a factor. Therefore, I would expect that the maximum cluster size would be six. However, when I call the following…

library(gee)
gee(value ~ Group, id = ID, data = myData)

…the Maximum cluster size that prints out is 1.

Am I misunderstanding what maximum cluster size means? Incorrectly formatting my data? Thanks in advance.

Best Answer

The gee function is very unfriendly in this respect. It assumes a new cluster starts whenever the value of the id variable changes from one row to the next. Quoting from the help file:

The length of id should be the same as the number of observations. Data are assumed to be sorted so that observations on a cluster are contiguous rows for all entities in the formula.

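In your data the IDs cycle through 1 to 10, so no two adjacent rows share an ID and every row is treated as its own cluster, which is why the maximum cluster size is reported as 1. Sort your data by ID so that each patient's rows are contiguous, and your problem should go away.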
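
As a rough sketch of that fix, the code below rebuilds the data from the question, sorts it with order(), and refits. The rle() check is only my own illustration of the "new cluster whenever id changes" rule, not what gee does internally. The name sortedData and the set.seed() call are my additions, and the refit only runs if the gee package is installed.

```r
# Rebuild the example data from the question. IDs cycle 1..10, so no two
# adjacent rows share an ID.
set.seed(1)  # assumption: added only so the sketch is reproducible
myData <- data.frame(ID = factor(rep(1:10, 6)))
myData$Group <- c(rep('A', 20), rep('B', 20), rep('C', 20))
myData$value <- rnorm(60, mean = 10, sd = 1)

# Longest run of identical consecutive IDs. This mimics the contiguity
# rule from the help file; it is an illustration, not gee's own code.
max(rle(as.character(myData$ID))$lengths)      # 1 before sorting

# Sort so that each patient's rows are contiguous.
sortedData <- myData[order(myData$ID), ]
max(rle(as.character(sortedData$ID))$lengths)  # 6 after sorting

# Refit on the sorted data (skipped if the gee package is not installed).
if (requireNamespace("gee", quietly = TRUE)) {
  fit <- gee::gee(value ~ Group, id = ID, data = sortedData)
}
```

Sorting only reorders rows, so the values and group labels attached to each ID are unchanged; only the way rows are grouped into clusters differs.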