clogit in R: original variable or demeaned

Tags: clogit, fixed-effects-model, logistic, r, splines

Conditional logistic regression is a fixed-effects model. If you're modeling the dependent variable $y$, a GLM fixed-effects model doesn't actually model $y$ itself; instead, it models $y - \text{mean}(y)$ within each group. I think this is not the case for conditional logistic regression, and that its coefficients can be interpreted in the space of $y$. Is that correct?

My particular situation:
I am running a conditional logit with clogit, from the survival package in R. Should the returned coefficients be interpreted in the space of $y$, or in the space of something like $y - \text{mean}(y)$?

Normally the difference isn't very relevant; one would interpret the coefficient roughly the same way in either case. However, in my case one of the independent variables is fitted as a spline. Specifically, it is a restricted cubic spline, computed with rcspline.eval from the Hmisc package. clogit produces a coefficient for each knot of the spline, and to interpret the overall effect of the variable one needs to reconstruct the spline from the coefficients (using rcspline.restate). I want to know whether I should be looking at the shape of this spline over the range of $y$ (which in my case is 0-100) or over the range of something like $y - \text{mean}(y)$ (in this case, $\text{mean}(y)$ is the same for all groups: 50). If the space is shifted, this would be particularly awkward for a spline, because presumably the knots would also have to be shifted somehow.

Best Answer

As nicely explained in this document:

The exponentiated conditional logistic regression coefficients have the same odds-ratio interpretation as ordinary logistic estimates.

Conditional logistic regression differs from ordinary logistic regression in that the data are divided into groups and, within each group, the observed probability of positive outcome is either predetermined due to the data construction (such as matched case–control) or in part determined because of unobserved differences across the groups. Thus, the likelihood of the data depends on the conditional probabilities—the probability of the observed pattern of positive and negative responses within group conditional on that number of positive outcomes being observed. Terms that have a constant within-group effect on the unconditional probabilities—such as intercepts and variables that do not vary—cancel in the formation of these conditional probabilities and so remain unestimated.

In other words, the difference between conditional logit and regular logit regressions comes from how you estimate the probability of a positive outcome for a given observation, not from how you interpret odds ratios (i.e. the exponentiated coefficients). In the conditional logit model, the estimated probability of observing $y_i = 1$ for a given observation is conditional on the number of 1s observed in that observation's group.
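To tie this back to the original question, here is a minimal sketch in R that checks whether shifting or group-demeaning a covariate changes the clogit estimate. The data are simulated and every variable name (`d`, `x_shifted`, `x_demeaned`, `alpha`) is made up for illustration. Anything that is constant within a stratum cancels out of the conditional likelihood, so the three slopes should agree up to numerical tolerance.

```r
library(survival)  # provides clogit() and strata()

set.seed(1)
n_groups <- 200                                   # hypothetical number of strata
d <- data.frame(group = rep(seq_len(n_groups), each = 4))
d$x <- runif(nrow(d), 0, 100)                     # covariate on a 0-100 scale
alpha <- rnorm(n_groups, sd = 2)                  # unobserved group intercepts (assumed)
d$y <- rbinom(nrow(d), 1, plogis(alpha[d$group] + 0.03 * (d$x - 50)))

d$x_shifted  <- d$x - 50                          # shift by a global constant
d$x_demeaned <- d$x - ave(d$x, d$group)           # subtract each group's own mean

# Fit the same model on the three versions of the covariate
b_raw      <- unname(coef(clogit(y ~ x + strata(group), data = d)))
b_shifted  <- unname(coef(clogit(y ~ x_shifted + strata(group), data = d)))
b_demeaned <- unname(coef(clogit(y ~ x_demeaned + strata(group), data = d)))

print(c(raw = b_raw, shifted = b_shifted, demeaned = b_demeaned))
```

clogit does not transform the covariates itself; the cancellation happens inside the likelihood. For a linear term this means the coefficient reads the same on the original scale, and spline knots can stay on the variable's original scale as well.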

If you are interested in the equations that show how this conditional probability works, a simple starting point is the Stata reference manual for clogit.
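For orientation, here is a brief sketch of that conditional probability. The notation is assumed here rather than taken from the manual: $\alpha_g$ is the intercept of group $g$, $x_{gi}$ is the covariate vector of observation $i$ in that group, $n_g$ is the group size, and $k_g$ is the number of positive outcomes in the group. The unconditional model is

$$\Pr(y_{gi} = 1) = \frac{\exp(\alpha_g + x_{gi}^\top \beta)}{1 + \exp(\alpha_g + x_{gi}^\top \beta)},$$

and conditioning on the group total gives

$$\Pr\Big(y_{g1}, \dots, y_{g n_g} \,\Big|\, \sum_{i} y_{gi} = k_g\Big) = \frac{\exp\big(\sum_{i} y_{gi}\, x_{gi}^\top \beta\big)}{\sum_{d \in S_g} \exp\big(\sum_{i} d_i\, x_{gi}^\top \beta\big)}, \qquad S_g = \Big\{ d \in \{0,1\}^{n_g} : \sum_{i} d_i = k_g \Big\}.$$

The factor $\exp(\alpha_g k_g)$ appears in the numerator and in every term of the denominator, so it cancels. This is why intercepts and variables that do not vary within a group remain unestimated, while $\beta$ still multiplies the covariates on their original scale.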
