Solved – Is conditional logit a special case of GLM? And what are its particularities

Tags: clogit, generalized-linear-model, logistic, r, regression

Background: For a project, I am fitting a conditional logit model in which I have 5 control cases for every realized case. To do that I use the clogit() function from the survival package. I wanted to graph interactions with the effects package by John Fox et al., but it turns out that this package can't handle clogit objects (the output of clogit()).
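For concreteness, a minimal sketch of this setup (the data frame matched and its variables case, set, x1, x2 are hypothetical):

    library(survival)

    ## Hypothetical 1:5 matched data: `case` is 1 for the realized case and
    ## 0 for its five controls, `set` identifies the matched group, and the
    ## x1:x2 interaction is what we want to plot.
    fit <- clogit(case ~ x1 * x2 + strata(set), data = matched)

    ## effects::effect() does not accept clogit objects, so this fails:
    ## library(effects)
    ## effect("x1:x2", fit)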

Since I seemed to remember that conditional logit is a special case of the GLM, I thought the clever/lazy way to get my interaction plots would be to refit the model as a fixed-effects GLM and then use effect().
The documentation of clogit seemed to confirm my intuition:

It turns out that the log-likelihood for a conditional logistic regression model = loglik from a Cox model with a particular data structure. […]
When a well tested Cox model routine is available many packages use this ‘trick’ rather than writing a new software routine from scratch, and this is what the clogit routine does.

In detail, a stratified Cox model with each case/control group assigned to its own stratum, time set to a constant, status of 1=case 0=control, and using the exact partial likelihood has the same likelihood formula as a conditional logistic regression. The clogit routine creates the necessary dummy variable of times (all 1) and the strata, then calls coxph.
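In code, the 'trick' the documentation describes can be checked directly (a sketch, reusing the hypothetical matched data from above):

    library(survival)

    ## clogit() builds a constant "survival time" plus the strata and then
    ## calls coxph() with the exact partial likelihood, so these two fits
    ## agree up to numerical noise:
    fit1 <- clogit(case ~ x1 * x2 + strata(set), data = matched)
    fit2 <- coxph(Surv(rep(1, nrow(matched)), case) ~ x1 * x2 + strata(set),
                  data = matched, method = "exact")
    all.equal(coef(fit1), coef(fit2))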

Based on this description, it seems that I should be able to reproduce the stratification achieved through strata() by using a random intercept for each case/control group, i.e. (1|group) in glmer() from lme4 (lmer() itself only fits linear mixed models, so a binary outcome needs glmer() with family = binomial). However, when I try, the results of clogit() and glmer() differ. Part of the problem is probably that I have the wrong likelihood function, and I don't really know how to specify it in glmer(); but more importantly, I am wondering what else I am missing.
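For reference, the random-intercept attempt corresponds to something like the following sketch (again with the hypothetical matched data, where the grouping variable is set); as the answer below explains, this fits a genuinely different model rather than a reparameterization of clogit():

    library(lme4)

    ## Random intercept per matched set: a GLMM. The set-level effect is
    ## integrated out numerically instead of being conditioned away, so
    ## this is not the same model as clogit().
    fit3 <- glmer(case ~ x1 * x2 + (1 | set), data = matched,
                  family = binomial)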

I wonder whether I am completely wrong, or somewhat on the right track but missing some pieces. What I would like is to understand the differences, in terms of how the model is fitted, between a conditional logit and a regular one (I understand that might be quite a long answer, so a book reference would be a great start). My usual references for regression (Gelman and Hill, 2007; Mills, 2011) are somewhat silent on the subject.

Best Answer

Your reference says that clogit is a special form of Cox regression, not of the GLMM, so you are probably mixing things up.

The conditional logit log-likelihood is (reverse-engineering the LaTeX code from the Stata manual): conditional on $\sum_{j=1}^{n_i} y_{ij} = k_{1i}$,
$$ {\rm Pr}\Bigl[(y_{i1},\ldots,y_{in_i})\Big|\sum_{j=1}^{n_i} y_{ij} = k_{1i}\Bigr] = \frac{\exp\bigl(\sum_{j=1}^{n_i} y_{ij} x_{ij}'\beta\bigr)}{\sum_{{\bf d}_i\in S_i}\exp\bigl(\sum_{j=1}^{n_i} d_{ij} x_{ij}'\beta\bigr)} $$
where $S_i$ is the set of all possible combinations of $n_i$ binary outcomes with $k_{1i}$ ones and the remaining zeroes, so each index vector ${\bf d}_i$ has 0/1 components $d_{ij}$ with $\sum_{j=1}^{n_i} d_{ij} = k_{1i}$. That's a pretty weird likelihood to me. Denoting the denominator by $f_i(n_i,k_{1i})$, the conditional log-likelihood is
$$ \ln L = \sum_{i=1}^n \biggl[ \sum_{j=1}^{n_i} y_{ij} x_{ij}'\beta - \ln f_i(n_i, k_{1i}) \biggr] $$
This likelihood can be computed exactly, although the computational time goes up steeply, as $p^2 \sum_{i=1}^n n_i \min(k_{1i}, n_i - k_{1i})$, where $p={\rm dim}\, \beta = {\rm dim}\, x_{ij}$. This is the likelihood that is identical to the stratified Cox partial likelihood, which I won't try to demonstrate here.
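To make the formula concrete, here is a brute-force sketch of one stratum's contribution, enumerating the set $S_i$ with combn() (the function name and data layout are made up; this is only feasible for small $n_i$, which is why real routines reuse the recursive Cox 'exact' computation instead):

    ## Contribution of one stratum to the conditional log-likelihood:
    ## y is the 0/1 outcome vector of its n_i subjects, X the n_i x p
    ## covariate matrix, beta the coefficient vector.
    clogit_loglik_stratum <- function(y, X, beta) {
      eta <- as.vector(X %*% beta)   # x_ij' beta for each subject
      k1  <- sum(y)                  # k_1i: number of cases in the stratum
      ## Enumerate S_i: all placements of k1 ones among the n_i subjects
      combos <- combn(length(y), k1)
      denom  <- sum(apply(combos, 2, function(idx) exp(sum(eta[idx]))))
      sum(y * eta) - log(denom)      # summing this over strata gives ln L
    }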

The mixed model likelihood (again, adapted from the Stata manuals) is based on integrating out the random effects:
$$ {\rm Pr}(y_{i1}, \ldots, y_{in_i} | x_{i1}, \ldots, x_{in_i}) = \int_{-\infty}^{+\infty} \frac{\exp(-\nu_i^2/2\sigma_\nu^2)}{\sigma_\nu \sqrt{2\pi}} \prod_{j=1}^{n_i} F(y_{ij}, x_{ij}'\beta + \nu_i) \, {\rm d}\nu_i $$
where $ F(y,z) = \bigl\{ 1+\exp\bigl[ (-1)^y z \bigr] \bigr\}^{-1} $ is a witty way to write down the logistic contribution for the outcome $y=0,1$. This likelihood cannot be computed exactly, and in practice it is approximated numerically using a set of Gauss–Hermite quadrature points with abscissas $a_m$ and weights $w_m$ that mimic the standard normal density on a grid, producing (in the simplest version)
$$ \ln L \approx \sum_{i=1}^n \ln\Biggl[ \frac{1}{\sqrt{\pi}} \sum_{m=1}^M w_m \prod_{j=1}^{n_i} F\bigl(y_{ij}, x_{ij}'\beta + \sqrt{2}\, \sigma_\nu a_m\bigr) \Biggr] $$
(The Gaussian density terms are absorbed into the quadrature weights; but since the quadrature is designed for the physicists' erf() function rather than the statisticians' $\Phi()$ function, it works with $\exp(-z^2)$ rather than $\exp(-z^2/2)$, hence the change of variables that produces the $\sqrt{2}$ factors and the leading $1/\sqrt{\pi}$.) Computational time for $\ln L$ itself is proportional to $M \sum_{i=1}^n n_i$, but since you need the second-order derivatives for Newton–Raphson, feel free to multiply by $p^2$. Smarter computational schemes, a.k.a. adaptive Gaussian quadrature, try to find better location and scale parameters for each group's quadrature to make the approximation more accurate.
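For completeness, the quadrature step itself can be sketched in a few lines, using statmod::gauss.quad() for the $\exp(-x^2)$ abscissas and weights (function and variable names are placeholders, and this is the plain, non-adaptive version):

    library(statmod)

    ## Gauss-Hermite approximation of one group's log-likelihood: y and X
    ## as before, sigma is the random-intercept SD, M the number of points.
    gh_loglik_group <- function(y, X, beta, sigma, M = 12) {
      gh  <- gauss.quad(M, kind = "hermite")  # abscissas a_m, weights w_m
      eta <- as.vector(X %*% beta)
      vals <- sapply(gh$nodes, function(a) {
        z <- eta + sqrt(2) * sigma * a        # shifted linear predictor
        prod(plogis((2 * y - 1) * z))         # F(y, z) from the formula
      })
      log(sum(gh$weights * vals) / sqrt(pi))  # 1/sqrt(pi) from change of vars
    }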

In fact, the latter Stata manual describes the differences between the GLMM (a.k.a. random-effects xtlogit, in econometric slang) and the conditional logit (a.k.a. fixed-effects xtlogit), and might be worth a more serious read.