Solved – QIC or QICu for variable selection in a GEE-GLM

generalized-estimating-equations, model-selection

I'm fitting a binomial GEE-GLM to predict the presence or absence of sperm whales as a function of environmental variables. My predictors are latitude, longitude, depth, and distance to land, plus year as a factor. I'm also including the interaction of year with each environmental variable, so that the model is:
presence ~ lat * year + long * year + depth * year + distance * year

Additionally, I'm trying different forms (linear or splined) for each of the continuous variables. To select both the variables and their functional forms, I have been using Pan's (2001) QICu, which is what I've seen used in most papers.

However, this method retains every interaction that I put in the model. When I tried using the regular QIC to select the best model, some interactions and some main effects were dropped. The two final models have very similar predictive performance (measured by the percentage of presences and absences correctly predicted). However, when I plot the best model as indicated by QICu, the confidence intervals are huge! This is not the case when I plot results for the model selected through QIC.
This made me think that QICu was behaving strangely with my model.

When I read up on this criterion, I found the statement that "QICu approximates QIC when the GEE is correctly specified."
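
For reference, the two criteria can be written as follows. This is a sketch from the definitions in Pan (2001); the notation is chosen here for illustration and is not taken from the post above.

$$
\mathrm{QIC}(R) = -2\,Q\big(\hat\beta(R);\, I\big) + 2\,\mathrm{trace}\big(\hat\Omega_I \hat V_R\big)
$$

$$
\mathrm{QIC}_u(R) = -2\,Q\big(\hat\beta(R);\, I\big) + 2p
$$

Here $Q(\cdot\,; I)$ is the quasi-likelihood computed under the independence working correlation and evaluated at the estimates $\hat\beta(R)$ obtained with working correlation $R$, $\hat\Omega_I$ is the model-based information matrix under independence, $\hat V_R$ is the robust (sandwich) covariance estimate, and $p$ is the number of regression parameters. When the mean and variance are correctly specified, $\mathrm{trace}(\hat\Omega_I \hat V_R) \approx p$, which is the sense in which QICu approximates QIC. The two criteria differ only in the penalty term, so they can rank models differently when that trace is far from $p$.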

Specifically, I would like to know if it is valid to use the QIC (instead of the QICu) for variable selection? And is there a way of knowing whether my GEE is correctly specified?

Please let me know if there is any way in which I can clarify the question.

Best Answer

Long story short, it is not recommended to use QICu for model selection. With estimation of the correlation structure, however, QICu could be used. Still, it is recommended to use QIC in any case!

Please look at this article: https://doi.org/10.1080/03610910701539617

(page 994, line 2): "Furthermore, QICu cannot be used to select the best fitting correlation structure."
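
To make the difference between the two criteria concrete, here is a minimal Python sketch. It assumes the usual definitions from Pan (2001): both criteria share the term -2Q (the binomial quasi-likelihood under independence), while QIC adds 2 * trace(Omega_I V_R) and QICu adds 2p. The function names, the fitted probabilities, and the matrices are all made up for illustration. In practice your GEE software derives these quantities from the fitted model.

```python
import math


def binomial_quasi_loglik(y, mu):
    """Quasi-likelihood under independence for 0/1 responses y and fitted means mu."""
    return sum(yi * math.log(mi) + (1 - yi) * math.log(1 - mi)
               for yi, mi in zip(y, mu))


def trace_of_product(a, b):
    """trace(A B) for two square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def qic_and_qicu(y, mu, omega_indep, v_robust):
    """Return (QIC, QICu); p is taken as the dimension of omega_indep."""
    q = binomial_quasi_loglik(y, mu)
    p = len(omega_indep)
    qic = -2.0 * q + 2.0 * trace_of_product(omega_indep, v_robust)
    qicu = -2.0 * q + 2.0 * p
    return qic, qicu


# Hypothetical toy inputs (not from any real fit).
y = [1, 0, 1, 0]
mu = [0.8, 0.3, 0.6, 0.2]
omega = [[4.0, 1.0], [1.0, 3.0]]                  # model-based information, p = 2
v_model = [[3 / 11, -1 / 11], [-1 / 11, 4 / 11]]  # exact inverse of omega

# Case 1: robust covariance equals the model-based one, so trace = p.
print(qic_and_qicu(y, mu, omega, v_model))

# Case 2: robust covariance twice as large, so trace = 2p and QIC's penalty grows.
v_inflated = [[2 * x for x in row] for row in v_model]
print(qic_and_qicu(y, mu, omega, v_inflated))
```

In the second case QICu is unchanged while QIC increases, because the 2p penalty ignores both the robust covariance and the working correlation. That is one way to see why the two criteria can pick different models.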

Please read the whole thing if you can; I found it helpful.

I did not understand your second question. What do you mean by "correctly specified"? Are you referring to the model, the working correlation structure, or GEE versus mixed models?

Best.
