I'm using the gee() function from the gee package in R. The problem I'm having is that the 'Maximum cluster size' that I get from the output of the gee() function seems to disagree with what I believe it should be given my data.
Here's a small example, where I have six observations from each of ten patients, who are members of one of three groups:
ID <- rep(1:10, 6)
myData <- data.frame(ID)
myData$ID <- factor(myData$ID)
myData$Group <- c(rep('A', 20), rep('B', 20), rep('C', 20))
myData$value <- rnorm(60, mean = 10, sd = 1)
Calling summary(myData$ID) shows that I do, in fact, have six observations of each ID, and calling class(myData$ID) shows that ID is, in fact, a factor. Therefore, I would expect the maximum cluster size to be six. However, when I call the following…
gee(value ~ Group, id = ID, data = myData)
…the Maximum cluster size that prints out is 1.
Am I misunderstanding what maximum cluster size means? Incorrectly formatting my data? Thanks in advance.
Best Answer
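The gee function is very unfriendly in this respect. It assumes a new cluster starts whenever the value of the id variable changes from one row to the next. The help file says as much: data are assumed to be sorted so that observations on a cluster are contiguous rows. In your example the IDs cycle through 1 to 10 six times, so every row looks like the start of a new cluster and the maximum cluster size comes out as 1. Just sort your data by ID, and your problem should go away.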
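As a rough sketch of the fix, the snippet below rebuilds the data from the question, sorts it by ID with order(), and compares the longest run of identical consecutive IDs before and after sorting. The run-length check via rle() is only my stand-in for how clusters get delimited when a new cluster starts at every change of id; it is not taken from the gee source. The seed, the helper run_lengths, and the name sortedData are my own additions for illustration.

```r
set.seed(1)  # assumption: seed added only so the example is reproducible

# Same data as in the question
ID <- rep(1:10, 6)
myData <- data.frame(ID = factor(ID))
myData$Group <- c(rep('A', 20), rep('B', 20), rep('C', 20))
myData$value <- rnorm(60, mean = 10, sd = 1)

# Hypothetical helper: lengths of runs of identical consecutive IDs,
# i.e. the cluster sizes you get if each change of id starts a new cluster
run_lengths <- function(id) rle(as.character(id))$lengths

max_before <- max(run_lengths(myData$ID))    # 1: the ID changes on every row

# Sort so that all rows belonging to one ID are contiguous
sortedData <- myData[order(myData$ID), ]
max_after <- max(run_lengths(sortedData$ID)) # 6: six contiguous rows per ID

# Refit on the sorted data (skipped if the gee package is not installed)
if (requireNamespace("gee", quietly = TRUE)) {
  library(gee)
  fit <- gee(value ~ Group, id = ID, data = sortedData)
}
```

With the sorted data frame, the Maximum cluster size reported by gee() should match the six observations per ID that summary(myData$ID) shows.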