Solved – R-squared equivalent for Generalized Estimating Equations (GEE) using an ordinal logistic regression model

Tags: effect-size, generalized-estimating-equations, ordered-logit, r-squared

Is there a measure that shows how much of the variance in the data is explained by a GEE fitted with an ordinal logistic regression model?

Best Answer

Five years late, but yes (kind of). Zheng proposed two $R^2$ analogues for GEE in 2000 (citation at bottom of answer).

Option 1

For your ordinal logistic model, assume that there is an underlying continuous latent variable that, when thresholds are applied, results in your observed ordinal $Y$. (Also assume that your software allows you to access that latent variable.)

Run a second GEE predicting that latent variable with the same predictors used in the ordinal model. From there, you can use Zheng's marginal $R^2$:

$R_{marginal}^2 = 1 - \frac{\sum_{c=1}^C \sum_{i=1}^N (Y_{ic} - \widehat{Y}_{ic})^2} {\sum_{c=1}^C \sum_{i=1}^N (Y_{ic} - \bar Y)^2} $

where $Y$ is your latent variable. The numerator is the sum of squared differences between $Y$ and the fitted values from this second GEE, taken over every cluster ($c = 1, \dots, C$) and every observation within a cluster ($i = 1, \dots, N$); the denominator is the sum of squared differences between $Y$ and its marginal mean $\bar Y$.
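As a rough illustration of the arithmetic only, here is a minimal pure-Python sketch of the marginal $R^2$. The function name `marginal_r2` and the nested-list layout (one inner list per cluster) are assumptions made for this example; the toy numbers are invented and are not from Zheng's paper or from any GEE software.

```python
def marginal_r2(y, fitted):
    """Marginal R^2: 1 - SS_residual / SS_total, pooled over all clusters.

    y, fitted: lists of clusters, each cluster a list of numbers
    (hypothetical layout chosen for this sketch).
    """
    # Flatten the cluster structure: the double sum runs over every observation.
    flat_y = [v for cluster in y for v in cluster]
    flat_fit = [v for cluster in fitted for v in cluster]
    if len(flat_y) != len(flat_fit):
        raise ValueError("y and fitted must contain the same number of observations")

    y_bar = sum(flat_y) / len(flat_y)  # marginal mean of Y
    ss_res = sum((a - b) ** 2 for a, b in zip(flat_y, flat_fit))
    ss_tot = sum((a - y_bar) ** 2 for a in flat_y)
    return 1 - ss_res / ss_tot


# Toy data: two clusters with two observations each.
y = [[1.0, 2.0], [3.0, 4.0]]
good_fit = [[1.5, 1.5], [3.5, 3.5]]
bad_fit = [[4.0, 3.0], [2.0, 1.0]]

print(round(marginal_r2(y, good_fit), 3))  # 0.8
print(round(marginal_r2(y, bad_fit), 3))   # -3.0
```

The second call shows that fitted values that do worse than the marginal mean push the statistic below zero.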

Option 2

Ignore the ordered nature of your outcome variable, so that your model becomes a multinomial logistic model, and use Zheng's $H_{marginal}$, a measure of "proportional reduction in entropy due to the model". $H_{marginal}$ is defined as

$H_{marginal} = 1 - \frac{\sum_{c=1}^C \sum_{i=1}^N \sum_{k=1}^K \hat \pi_{cik} \log(\hat \pi_{cik}) } { NC \sum_{k=1}^K \hat \alpha_k \log(\hat \alpha_k) } $

where $\pi_{cik} = P(Y_{ic} = k \mid X)$ is the "model-based probability that a categorical response [for observation $i$ in cluster $c$] equals $k$", $\alpha_k = P(Y = k)$ is "the marginal probability of response $k$", $NC$ is the total number of observations (written $nT$ in Zheng's notation), and hats (^) indicate estimates.
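To make the entropy ratio concrete, here is a minimal pure-Python sketch. The function name `marginal_entropy_h` and the nested-list layout (clusters, then observations, then the $K$ category probabilities) are assumptions for this example. It also assumes you already have the fitted category probabilities and some estimate of the marginal probabilities (for instance the observed category proportions); the numbers below are invented.

```python
import math


def marginal_entropy_h(probs, alpha):
    """Proportional reduction in entropy: 1 - (model entropy) / (null entropy).

    probs: list of clusters; each cluster is a list of observations; each
           observation is a list of K fitted category probabilities.
    alpha: list of K estimated marginal category probabilities.
    (Hypothetical layout chosen for this sketch.)
    """
    def plogp(p):
        # Convention: 0 * log(0) is treated as 0.
        return p * math.log(p) if p > 0 else 0.0

    observations = [obs for cluster in probs for obs in cluster]
    # Numerator: sum of pi * log(pi) over clusters, observations and categories.
    model_term = sum(plogp(p) for obs in observations for p in obs)
    # Denominator: total number of observations times sum of alpha * log(alpha).
    null_term = len(observations) * sum(plogp(a) for a in alpha)
    return 1 - model_term / null_term


# Toy data: two clusters, one observation each, two response categories.
probs = [[[0.9, 0.1]], [[0.1, 0.9]]]
alpha = [0.5, 0.5]
print(round(marginal_entropy_h(probs, alpha), 3))
```

If every fitted probability equals the corresponding marginal probability, the numerator and denominator coincide and the statistic is 0; sharper fitted probabilities move it toward 1.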



Note that both $R_{marginal}^2$ and $H_{marginal}$ can take a negative value when there is greater uncertainty in prediction under the fitted model than under the null model.


Zheng, B. (2000). Summarizing the goodness of fit of generalized linear models for longitudinal data. Statistics in Medicine, 19(10), 1265-1275. doi: 10.1002/(SICI)1097-0258(20000530)19:10<1265::AID-SIM486>3.0.CO;2-U
