Use GEE when you're interested in uncovering the population average effect of a covariate rather than the individual specific effect. These two coincide in linear models but not in non-linear models (e.g. logistic regression). To see this, take, for example, the random effects logistic model for the $j$'th observation of the $i$'th subject, $Y_{ij}$:
$$ \log \left( \frac{p_{ij}}{1-p_{ij}} \right)
= \mu + \eta_{i} $$
where $\eta_{i} \sim N(0,\sigma^{2})$ is a random effect for subject $i$ and $p_{ij} = P(Y_{ij} = 1|\eta_{i})$.
If you used a random effects model on these data, then you would get an estimate of $\mu$ that accounts for the fact that a mean zero normally distributed perturbation was applied to each individual, making it individual specific.
If you used GEE on these data, you would estimate the population average log odds. In this case that would be
$$ \nu = \log \left( \frac{ E_{\eta} \left( \frac{1}{1 + e^{-\mu-\eta_{i}}} \right)}{
1-E_{\eta} \left( \frac{1}{1 + e^{-\mu-\eta_{i}}} \right)} \right) $$
In general, $\nu \neq \mu$. For example, if $\mu = 1$ and $\sigma^{2} = 1$, then $\nu \approx 0.83$. Although the random effects have mean zero on the transformed (or linked) scale, their effect is not mean zero on the original scale of the data. Try simulating some
data from a mixed effects logistic regression model and comparing the population level average with the inverse-logit of the intercept and you will see that they are not equal, as in this example. This difference in the interpretation of the coefficients is the fundamental difference between GEE and random effects models.
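That simulation can be sketched in a few lines of Python (a minimal Monte Carlo check, assuming numpy is available; $\mu = 1$ and $\sigma^{2} = 1$ match the example above):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0

# Subject-specific random intercepts eta_i ~ N(0, sigma^2)
eta = rng.normal(0.0, sigma, size=1_000_000)

# Subject-specific success probabilities p_i = expit(mu + eta_i)
p = 1.0 / (1.0 + np.exp(-(mu + eta)))

# Population-average probability, mapped back to the log-odds scale
p_bar = p.mean()
nu = np.log(p_bar / (1.0 - p_bar))

print(f"conditional log-odds mu = {mu}")
print(f"marginal log-odds   nu = {nu:.3f}")  # about 0.83, not 1
```

The marginal log odds $\nu$ comes out near 0.83 even though every subject's conditional intercept is centred at $\mu = 1$, which is exactly the gap described above.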
Edit: In general, a mixed effects model with no predictors can be written as
$$ \psi \big( E(Y_{ij}|\eta_{i}) \big) = \mu + \eta_{i} $$
where $\psi$ is a link function. Whenever
$$ \psi \Big( E_{\eta} \big( E(Y_{ij}|\eta_{i}) \big) \Big) \neq E_{\eta} \Big( \psi \big( E(Y_{ij}|\eta_{i}) \big) \Big) = \mu $$
there will be a difference between the population average coefficients (GEE) and the individual specific coefficients (random effects models). That is, averaging and the link transformation do not commute: applying the link to the marginal mean is not the same as averaging on the linked scale. Note that in the linear model (that is, $\psi(x) = x$) the equality does hold, so the two are equivalent.
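This non-commutativity can be checked numerically. A minimal sketch (assuming numpy, with $\mu = 1$, $\sigma^{2} = 1$ as before) comparing the identity and logit links:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0
eta = rng.normal(0.0, sigma, size=1_000_000)

def gap(psi, psi_inv):
    # psi(E_eta[psi_inv(mu + eta)]) - mu: zero iff averaging on the
    # original scale commutes with the link transformation.
    return psi(np.mean(psi_inv(mu + eta))) - mu

# Identity link: the gap is zero (up to Monte Carlo error).
identity_gap = gap(lambda x: x, lambda x: x)

# Logit link: averaging on the probability scale attenuates the coefficient.
expit = lambda x: 1.0 / (1.0 + np.exp(-x))
logit = lambda p: np.log(p / (1.0 - p))
logit_gap = gap(logit, expit)

print(f"identity link gap: {identity_gap:.4f}")  # ~ 0
print(f"logit link gap:    {logit_gap:.4f}")     # ~ -0.17
```

On the identity scale the two targets agree, so GEE and the mixed model estimate the same $\mu$; on the logit scale the marginal coefficient is shrunk toward zero.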
Edit 2: It is also worth noting that the "robust" sandwich-type standard errors produced by a GEE model yield valid asymptotic confidence intervals (i.e. nominal 95% intervals actually cover 95% of the time asymptotically) even if the working correlation structure specified in the model is incorrect.
Edit 3: If your interest is in understanding the association structure in the data, the GEE estimates of associations are notoriously inefficient (and sometimes inconsistent). I've seen a reference for this but can't place it right now.
Best Answer
I think there could be some confusion caused by those links. I believe the statement about "not for nonlinear models" actually refers to generalised linear mixed models (GLMMs), for example when the response is binary or a count, or generally whenever a non-identity link function is used; and not to nonlinear mixed models, such as those that can be fitted with nlme, like the logistic growth model $f(x)={\frac {L}{1+e^{-k(x-x_{0})}}}$, where we would no longer have a linear predictor. GLMMs still have a linear predictor, but much of the literature on GLMMs describes them as nonlinear models because of the link function, not the functional form of the model itself. This inevitably leads to some confusion. So, usually the debate about GEE vs. mixed models is actually about GEE vs. GLMM.
GLMMs typically produce estimates that are conditional on the random effects, whereas GEEs average over the random effects to produce marginal estimates. The fundamental difference between the two lies in this interpretation of the (fixed) effects: GEEs produce population-averaged effects, while GLMMs produce subject-specific effects.
So there is indeed an argument for using GEE rather than GLMM when the marginal (population averaged) interpretation is wanted. GEEs are also useful when the correlation structure may be misspecified, as the sandwich standard errors remain robust. On the other hand, GEE is known to require larger sample sizes and is not robust to data missing at random, whereas GLMMs generally are. Finally, the GLMMAdaptive package in R can produce marginal as well as conditional estimates.