Background: For a project, I am fitting a conditional logit model where I have 5 control cases for every realized case. To do that I use the clogit()
function in the package survival
. I wanted to graph interactions with the effects
package by John Fox et al. It turns out that this package can't handle clogit
objects (output of clogit()
).
As I believed I remembered that conditional logit were a special case of GLM, I thought the clever/lazy way to get my interaction plots would be to refit the model using a fixed effects glm and then use effect()
.
The documentation of clogit
seemed to confirm my intuition:
It turns out that the logliklihood for a conditional logistic regresson model = loglik from a Cox model with a particular data structure. […]
When a well tested Cox model routine is available many packages use this ‘trick’ rather than writing a new software routine from scratch, and this is what the clogit routine does.In detail, a stratified Cox model with each case/control group assigned to its own stratum, time set to a constant, status of 1=case 0=control, and using the exact partial likelihood has the same likelihood formula as a conditional logistic regression. The clogit routine creates the necessary dummy variable of times (all 1) and the strata, then calls coxph.
Based on this description, it seems that I should be able to reproduce the stratification achieved through strata()
by using a random intercept for each case/control group with 1|group
in lmer()
. However, when I try, the results of clogit
and lmer
differ. One thing is that I probably have the wrong likelihood function. I don't really know how to specify this in lmer
but more important, I am wondering what else I am missing.
I wonder whether I am completely wrong or somewhat on the right track but missing some pieces? What I would like is to understand what are the difference in terms of how the model is fitted between a conditional logit and a regular one (I understand that might be quite a long answer, so a book reference would be a great start). The my usual references for regression (Gelman and Hill, 2007; Mills 2011) are somewhat silent on the subject.
Best Answer
Your reference says that
clogit
is a special form of Cox regression, not the GLMM. So you are probably mixing things up.The conditional logit log-likelihood is (reverse engineering the LaTeX code from the Stata manual): conditional on $\sum_{j=1}^{n_i} y_{ij} = k_{1i}$, $$ {\rm Pr}\Bigl[(y_{i1},\ldots,y_{i{n_i}})|\sum_{j=1}^{n_i} y_{ij} = k_{1i}\Bigr] = \frac{\exp(\sum_{j=1}^{n_i} y_{ij} x_{ij}'\beta)}{\sum_{{\bf d}_i\in S_i}\exp(\sum_{j=1}^{n_i} y_{ij} x_{ij}'\beta)} $$ where $S_i$ is a set of all possible combinations of $n_i$ binary outcomes, with $k_{1i}$ ones and remaining zeroes, so the summation index-vector has components $d_{ij}$ that are 0/1 with $\sum_{i=1}^{n_i} d_{ij} = k_{1i}$. That's a pretty weird likelihood to me. Denoting the denominator as $f_i(n_i,k_{1i})$, the conditional log-likelihood is $$ \ln L = \sum_{i=1}^n \biggl[ \sum_{j=1}^{n_i} y_{ij} x_{ij}'\beta - \ln f_i(n_i, k_{1i}) \biggr] $$ This likelihood can be computed exactly, although the computational time goes up steeply as $p^2 \sum_{i=1}^n n_i \min(k_{1i}, n_i - k_{1i})$ where $p={\rm dim}\, \beta = {\rm dim}\, x_{ij}$. This is the likelihood that should be identical to the stratified Cox regression, which I won't try to entertain here.
The mixed model likelihood (again, adopting from Stata manuals) is based on integrating out the random effects: $$ {\rm Pr}(y_{i1}, \ldots, y_{1{n_i}} |x_{i1}, \ldots, x_{i{n_i}})=\int_{-\infty}^{+\infty} \frac{\exp(-\nu_i^2/2\sigma_\nu^2)}{\sigma_\nu \sqrt{2\pi}} \prod_{i=1}^{n_i}F(y_{ij}, x_{ij}'\beta + \nu_i) $$ where $ F(y,z) = \Bigl\{ 1+\exp\bigl[ (-1)^y z \bigr] \Bigr\}^{-1} $ is a witty way to write down the logistic contribution for the outcome $y=0,1$. This likelihood cannot be computed exactly, and in practice is approximated numerically using a set of Gaussian quadrature points with abscissas $a_m$ and weights $w_m$ resembling the density of the standard normal density on a grid, producing (in the simplest version) $$ \ln L \approx \sum_{i=1}^n \ln\biggl[ \sqrt{2} \sum_{m=1}^M w_m \frac{1}{\sigma_\nu \sqrt{2\pi}} \prod_{i=1}^{n_i}F(y_{ij}, x_{ij}'\beta + \sqrt{2} \sigma_\nu a_m) \biggr] $$ (The $\exp(\nu_i^2)$-like terms disappear due to the full quadrature formula, but since it is designed for the physicist'
erf()
function rather than statisticians' $\Phi()$ function, it works with $\exp(-z^2)$ rather than $\exp(-z^2/2)$; hence the weird $\sqrt{2}$ in a couple of places.) Computational time for $\ln L$ itself is proportional to $nM$, but since you need to take the second order derivatives for Newton-Raphson, feel free to multiply by $p^2$. Smarter computational schemes aka adaptive Gaussian quadratures try to find a better location and scale parameters for the quadrature to make the approximation more accurate.In fact, that latter Stata manual describes the differences between the GLMM (aka random effect
xtlogit
, in econometric slang) and conditional logit (aka fixed effectxtlogit
), and might be worth a more serious reading.