Logistic Regression – Is Mundlak Fixed Effects Procedure Applicable for Logistic Regression with Dummies?

Tags: categorical-data, fixed-effects-model, logistic, stata

I have a dataset with 8000 clusters and 4 million observations. Unfortunately my statistical software, Stata, runs rather slowly when I use its panel-data command for logistic regression, xtlogit, even on a 10% subsample.

However, the non-panel logit command returns results much sooner. I may therefore be able to benefit from running logit on modified data that accounts for fixed effects.

I believe this is called the "Mundlak fixed effects procedure" (Mundlak, Y. (1978). On the Pooling of Time Series and Cross Section Data. Econometrica, 46(1), 69-85).

I found an intuitive explanation of this procedure in a paper by Antonakis, J., Bendahan, S., Jacquart, P., & Lalive, R. (2010). On making causal claims: A review and recommendations. The Leadership Quarterly, 21(6), 1086-1120. I quote:

One way to get around the problem of omitted fixed effects and to still
include Level 2 variables is to include the cluster means of all Level
1 covariates in the estimated model (Mundlak, 1978). The cluster means
can be included as regressors or subtracted (i.e., cluster-mean
centering) from the Level 1 covariate. The cluster means are invariant
within cluster (and vary between clusters) and allow for consistent
estimation of Level 1 parameters just as if fixed-effects had been
included (see Rabe-Hesketh & Skrondal, 2008).

Therefore cluster-mean centering seems ideal and practical for solving my computational problem. However, these papers seem to be geared towards linear regression (OLS).

Is this method of cluster-mean centering also applicable for "replicating" fixed effects binary logistic regression?

A more technical question that should result in the same answer would be: is xtlogit depvar indepvars, fe with dataset A equal to logit depvar indepvars with dataset B when dataset B is the cluster-mean centered version of dataset A?

An added difficulty I found in this cluster-mean centering is how to cope with dummies. Because dummies are either 0 or 1, are they identical in random and fixed effects regression? Should they not be "centered"?

Best Answer

First differencing and within transformations such as demeaning do not work in models like logit, because in nonlinear models these tricks do not remove the unobserved fixed effects. Even if you had a smaller data set in which it was feasible to include N-1 individual dummies and estimate the fixed effects directly, the estimates would be biased unless the time dimension of your data is large (the incidental parameters problem). Eliminating the fixed effects in panel logit therefore relies on neither differencing nor demeaning; it is possible only because of the logit functional form. If you are interested in the details, have a look at these notes by Söderbom: PDF page 30 explains why demeaning or first differencing does not help in logit/probit, and page 42 introduces the panel logit estimator.
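
To make the role of the functional form concrete, here is a sketch of the standard two-period case. The notation is my own assumption rather than taken from the notes: \Lambda is the logistic CDF, \alpha_i the cluster effect, \Lambda_{it} is shorthand for \Lambda(\alpha_i + x_{it}'\beta), and the two outcomes are assumed independent given x_i and \alpha_i. The first line shows why demeaning works in the linear model; the rest shows that in logit the effect cancels only after conditioning on the cluster's number of successes.

```latex
% Linear model: alpha_i is additive, so subtracting cluster means removes it.
\begin{align*}
y_{it} - \bar{y}_i &= (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)
\end{align*}

% Logit with T = 2: alpha_i sits inside \Lambda(z) = e^{z}/(1+e^{z}),
% so it cancels only in the probability conditional on y_{i1} + y_{i2} = 1.
\begin{align*}
P(y_{it}=1 \mid x_{it}, \alpha_i) &= \Lambda(\alpha_i + x_{it}'\beta), \qquad t = 1, 2 \\[4pt]
P(y_{i1}=0,\, y_{i2}=1 \mid y_{i1}+y_{i2}=1,\, x_i,\, \alpha_i)
  &= \frac{[1-\Lambda_{i1}]\,\Lambda_{i2}}
          {[1-\Lambda_{i1}]\,\Lambda_{i2} + \Lambda_{i1}\,[1-\Lambda_{i2}]} \\[4pt]
  &= \frac{e^{\alpha_i + x_{i2}'\beta}}
          {e^{\alpha_i + x_{i2}'\beta} + e^{\alpha_i + x_{i1}'\beta}} \\[4pt]
  &= \Lambda\big((x_{i2}-x_{i1})'\beta\big)
\end{align*}
```

The last line no longer contains \alpha_i. Clusters whose outcome never changes drop out of this conditional likelihood, so only the "switchers" contribute to the estimate of \beta.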

Another problem is that xtlogit, and fixed effects panel logit models in general, do not estimate the fixed effects themselves, yet these are needed to calculate marginal effects. Without them the coefficients are awkward to interpret, which can be disappointing after the model has run for hours.
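
The dependence on the fixed effects can be seen from the marginal effect of a continuous covariate in a logit model. The symbols below are my own assumed notation (\Lambda for the logistic CDF, \alpha_i for the cluster effect, x_{it,k} for the k-th covariate):

```latex
\begin{align*}
\frac{\partial P(y_{it}=1 \mid x_{it}, \alpha_i)}{\partial x_{it,k}}
  &= \beta_k \,\Lambda(\alpha_i + x_{it}'\beta)\,\big[1 - \Lambda(\alpha_i + x_{it}'\beta)\big]
\end{align*}
```

Because \alpha_i appears on the right-hand side, an estimate of \beta_k alone pins down the sign of the effect but not its size on the probability scale.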

With such a large data set, and given the conceptual difficulties of FE panel logit mentioned above, I would stick with the linear probability model (LPM). I hope this answer does not disappoint you, but there are good reasons for the advice: the LPM is much faster; its coefficients can be interpreted directly (this matters in particular if you have interaction effects in your model, because the interpretation of their coefficients changes in nonlinear models); the fixed effects are easily controlled for; and you can adjust the standard errors for autocorrelation and clustering without estimation times increasing beyond reason. I hope this helps.
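
As a rough illustration of how cheaply the fixed effects are handled in the LPM, here is a minimal pure-Python sketch on simulated data. Everything in it is an assumption made for the example: the data-generating process, the function names (simulate, pooled_slope, within_lpm), and the single covariate. It demeans x and y within each cluster, runs OLS on the deviations, and computes a cluster-robust standard error without small-sample corrections.

```python
import random
from collections import defaultdict

def simulate(n_clusters=5000, t=4, beta=0.2, seed=0):
    """Simulate a binary panel whose P(y=1) is linear in x plus a cluster effect."""
    rng = random.Random(seed)
    rows = []  # each row: (cluster id, x, y)
    for g in range(n_clusters):
        alpha = rng.uniform(-0.15, 0.15)            # unobserved cluster effect
        for _ in range(t):
            x = 2 * alpha + rng.uniform(-0.5, 0.5)  # x is correlated with alpha
            p = 0.5 + beta * x + alpha              # in [0.19, 0.81] for the default beta
            rows.append((g, x, 1.0 if rng.random() < p else 0.0))
    return rows

def pooled_slope(rows):
    """OLS slope of y on x with an intercept, ignoring the clusters."""
    n = len(rows)
    mx = sum(x for _, x, _ in rows) / n
    my = sum(y for _, _, y in rows) / n
    sxy = sum((x - mx) * (y - my) for _, x, y in rows)
    sxx = sum((x - mx) ** 2 for _, x, _ in rows)
    return sxy / sxx

def within_lpm(rows):
    """Fixed-effects LPM: demean x and y within cluster, then OLS on the deviations."""
    groups = defaultdict(list)
    for g, x, y in rows:
        groups[g].append((x, y))
    demeaned = []
    for obs in groups.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        demeaned.append([(x - mx, y - my) for x, y in obs])
    sxx = sum(dx * dx for d in demeaned for dx, _ in d)
    sxy = sum(dx * dy for d in demeaned for dx, dy in d)
    b = sxy / sxx
    # Cluster-robust sandwich variance, without small-sample corrections.
    meat = sum(sum(dx * (dy - b * dx) for dx, dy in d) ** 2 for d in demeaned)
    return b, meat ** 0.5 / sxx

rows = simulate()
b_pooled = pooled_slope(rows)
b_fe, se_fe = within_lpm(rows)
print(f"pooled slope: {b_pooled:.3f}")
print(f"within slope: {b_fe:.3f} (cluster-robust se {se_fe:.3f})")
```

In this simulated setup the pooled slope picks up the correlation between x and the cluster effect, while the within slope should land near the true value of 0.2. In Stata the corresponding fixed-effects estimator is xtreg depvar indepvars, fe vce(cluster clustervar) after xtset, where clustervar stands in for your cluster identifier.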