Consider the following static panel model with two-way fixed effects:
$y_{it} = \beta z_{it} + \delta_i + \delta_t + \epsilon_{it}$
I am interested in the estimate of $\beta$. In my sample, $N=600,000$, $T=100$ and $I=6,000$, so the model contains $I+T=6,100$ dummy variables to be estimated. $y_{it}$ is a rare outcome with only around 4,000 non-zero values. I ran models with different transformations of the outcome: a 0-1 dummy, a simple count (from 0 to 20), $\log(1+\text{value})$, and the values scaled by population. (I am aware none of these is perfect.)
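Because the outcome is rare, it may be worth checking how much of the sample a fixed-effects Poisson or logit model can actually use. An individual (or period) whose outcome is zero in every observation has no variation: its fixed effect diverges to minus infinity and its observations carry no information about $\beta$, so such groups are effectively dropped (for logit, the same holds for groups whose outcome is always one). With around 4,000 non-zero values spread over 6,000 individuals, at least 2,000 individuals must be all-zero. A minimal sketch to count such groups; the function name `all_zero_groups` is hypothetical:

```python
def all_zero_groups(ids, periods, y):
    """Return the individuals and periods whose outcome is zero in every observation.

    Such groups contain no variation, so in Poisson and logit models with these
    fixed effects they provide no information about beta (their fixed effect
    diverges to minus infinity) and are normally dropped before estimation.
    """
    ind_any, per_any = {}, {}
    for i, t, v in zip(ids, periods, y):
        # Track whether each individual / period has at least one non-zero outcome.
        ind_any[i] = ind_any.get(i, False) or v != 0
        per_any[t] = per_any.get(t, False) or v != 0
    zero_ind = sorted(i for i, nz in ind_any.items() if not nz)
    zero_per = sorted(t for t, nz in per_any.items() if not nz)
    return zero_ind, zero_per


# Toy panel: 3 individuals x 2 periods with a single non-zero outcome.
ids = [1, 1, 2, 2, 3, 3]
periods = [1, 2, 1, 2, 1, 2]
y = [0, 0, 0, 3, 0, 0]
print(all_zero_groups(ids, periods, y))  # ([1, 3], [1])
```

On the real data, `len(zero_ind)` gives the number of individuals a fixed-effects Poisson model would discard.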
Given the large number of dummy variables, I have so far estimated the model by OLS. $\hat{\beta}$ is always negative and highly statistically significant. I was asked to also estimate the model with estimators for count data (Poisson, negative binomial) and with logit. For logit, Fernandez-Val and Weidner (2016) propose an estimator that allows for two-way fixed effects, but in my application it does not converge.
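Since $N = I \times T$ here ($600,000 = 6,000 \times 100$), the panel appears to be balanced, and in a balanced panel the two-way fixed-effects OLS estimate of $\beta$ does not require estimating any dummies: by the Frisch-Waugh-Lovell theorem it equals the slope from regressing the double-demeaned outcome $\ddot{y}_{it} = y_{it} - \bar{y}_{i\cdot} - \bar{y}_{\cdot t} + \bar{y}_{\cdot\cdot}$ on the double-demeaned regressor $\ddot{z}_{it}$ (defined the same way). A minimal sketch, assuming a balanced panel stored as one row per individual and one column per period; `twoway_within_beta` is a hypothetical name:

```python
import random


def twoway_within_beta(y, z):
    """OLS slope of y on z with individual and time fixed effects.

    y, z: balanced panels as lists of rows (I individuals x T periods).
    The two-way within transformation used here is exact only for balanced panels.
    """
    I, T = len(y), len(y[0])

    def demean(a):
        row = [sum(r) / T for r in a]                                  # individual means
        col = [sum(a[i][t] for i in range(I)) / I for t in range(T)]  # period means
        grand = sum(row) / I                                           # overall mean
        return [[a[i][t] - row[i] - col[t] + grand for t in range(T)]
                for i in range(I)]

    yd, zd = demean(y), demean(z)
    num = sum(yd[i][t] * zd[i][t] for i in range(I) for t in range(T))
    den = sum(zd[i][t] ** 2 for i in range(I) for t in range(T))
    return num / den


# Usage: simulate y_it = beta * z_it + delta_i + delta_t + eps_it with beta = -0.5,
# where z is correlated with the individual effects.
random.seed(1)
I, T, beta = 200, 20, -0.5
d_i = [random.gauss(0, 1) for _ in range(I)]
d_t = [random.gauss(0, 1) for _ in range(T)]
z = [[random.gauss(0, 1) + d_i[i] for t in range(T)] for i in range(I)]
y = [[beta * z[i][t] + d_i[i] + d_t[t] + random.gauss(0, 1) for t in range(T)]
     for i in range(I)]
print(round(twoway_within_beta(y, z), 2))
```

For an unbalanced panel this closed form no longer applies and an iterative demeaning scheme is needed instead.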
It is often said (including on blogs) that nonlinear models are not well suited to panel data. I have not seen this discussed for settings with many fixed effects in several dimensions. In my case, where the number of cross-sectional units is much larger than $T$, how should I think about the incidental parameter bias? What other references could tell me which approaches make sense and which do not?
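One way to think about orders of magnitude, assuming the regularity conditions of Fernandez-Val and Weidner (2016) hold (their $N$, the number of individuals, is $I$ here): for a nonlinear model with individual and time effects, the fixed-effects MLE of $\beta$ has a leading bias with one term driven by the individual effects and one by the time effects, while its standard error shrinks with the total number of observations:

$$
\hat\beta - \beta_0 \approx \frac{B}{T} + \frac{D}{I},
\qquad
\operatorname{se}(\hat\beta) = O\!\left(\frac{1}{\sqrt{IT}}\right).
$$

With $T = 100$ and $I = 6,000$: $1/T = 0.01$, $1/I \approx 0.00017$ and $1/\sqrt{IT} = 1/\sqrt{600,000} \approx 0.0013$. The individual-effect term dominates, and its order relative to the standard error is $\sqrt{IT}/T = \sqrt{I/T} = \sqrt{60} \approx 7.7$, so unless the constant $B$ happens to be small, the bias need not be negligible relative to sampling noise; this is what motivates the bias corrections proposed in that paper. These are orders of magnitude only: $B$ and $D$ depend on the model and the data, and with a rare outcome far fewer observations are informative, so the effective sample is smaller than $IT$.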
Thanks a lot
Reference:
Fernandez-Val, Ivan, and Martin Weidner (2016). Individual and time effects in nonlinear panel models with large N, T. Journal of Econometrics, 192(1), 291-312.
Best Answer
Here are some resources for estimating high-dimensional fixed-effects models, primarily in R. Unfortunately, I do not know how well they handle "rare events".
For the incidental parameters problem, Berge writes that
However, using the R-package bife for binomial logit, Stammann, Heiss and McFadden
Hope you find some of these resources useful.
Linear models
Simen Gaure has developed an R-package called lfe, which in principle supports any number of fixed-effect dimensions. Among its benefits are flexibility and speed.
In Stata, the packages reg2hdfe and reg3hdfe were developed by Guimaraes and Portugal (2010). As the names indicate, they support fixed effects in only two or three dimensions, respectively.
The iterative procedure used to fit these models is described in detail in Gaure (2013) and also appears in Guimaraes and Portugal (2010).
See also this blog post by Thiemo Fetzer (2014).
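The basic idea behind such iterative procedures can be sketched with the method of alternating projections: repeatedly subtract individual means, then period means (and so on for further fixed-effect dimensions) until nothing changes; by the Frisch-Waugh-Lovell theorem, the OLS slope on the demeaned data then equals the coefficient from the full dummy-variable regression. Unlike the two-way closed form, this also works for unbalanced panels and any number of fixed-effect dimensions. A simplified sketch with a single regressor and no acceleration; `demean_fe` and `fe_ols_beta` are hypothetical names, not the API of lfe or reg2hdfe:

```python
import random
from collections import defaultdict


def demean_fe(values, groups, tol=1e-10, max_iter=10_000):
    """Partial out several fixed effects by alternating projections.

    values: list of floats; groups: list of factor lists (e.g. [ids, periods]),
    each the same length as values. Each sweep subtracts the group means of every
    factor in turn; stop once the largest mean removed in a sweep is below tol.
    """
    v = list(values)
    for _ in range(max_iter):
        largest = 0.0
        for g in groups:
            sums, counts = defaultdict(float), defaultdict(int)
            for gi, vi in zip(g, v):
                sums[gi] += vi
                counts[gi] += 1
            for k, gi in enumerate(g):
                m = sums[gi] / counts[gi]
                v[k] -= m
                largest = max(largest, abs(m))
        if largest < tol:
            break
    return v


def fe_ols_beta(y, z, groups):
    """Slope of y on z after partialling out all fixed effects (Frisch-Waugh-Lovell)."""
    yd, zd = demean_fe(y, groups), demean_fe(z, groups)
    return sum(a * b for a, b in zip(yd, zd)) / sum(b * b for b in zd)


# Usage: an unbalanced panel (about 70% of individual-period cells kept), beta = -0.5.
random.seed(3)
d_i = {i: random.gauss(0, 1) for i in range(100)}
d_t = {t: random.gauss(0, 1) for t in range(10)}
ids, periods, y, z = [], [], [], []
for i in range(100):
    for t in range(10):
        if random.random() < 0.7:
            zi = random.gauss(0, 1) + d_t[t]
            ids.append(i)
            periods.append(t)
            z.append(zi)
            y.append(-0.5 * zi + d_i[i] + d_t[t] + random.gauss(0, 1))
print(round(fe_ols_beta(y, z, [ids, periods]), 2))
```

Production implementations add acceleration and convergence safeguards; this version only illustrates why no dummy matrix has to be formed.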
Non-linear models
The R-package bife
There is an application example on Cross Validated in the question bife-example.
The R-package fixest (extends and replaces FENmlm)
The R-package FENmlm
The R-package alpaca
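Poisson pseudo-maximum likelihood (PPML) with high-dimensional fixed effects can likewise be fit without building the dummy matrix. The following is a simplified sketch in the spirit of the approach Correia, Guimarães and Zylkin (2019) describe for PPMLHDFE: iteratively reweighted least squares (IRLS), where in each step the fixed effects are partialled out of the working response and the regressor by weighted alternating projections. Assumptions: a single regressor, groups with an all-zero outcome already removed, no separation checks and no acceleration; `weighted_demean` and `ppml_fe` are hypothetical names, not the code of any package listed above.

```python
import math
import random
from collections import defaultdict


def weighted_demean(v, w, groups, tol=1e-10, max_iter=10_000):
    """Weighted alternating projections: subtract weighted group means in turn."""
    v = list(v)
    for _ in range(max_iter):
        largest = 0.0
        for g in groups:
            num, den = defaultdict(float), defaultdict(float)
            for gi, vi, wi in zip(g, v, w):
                num[gi] += wi * vi
                den[gi] += wi
            for k, gi in enumerate(g):
                m = num[gi] / den[gi]
                v[k] -= m
                largest = max(largest, abs(m))
        if largest < tol:
            break
    return v


def ppml_fe(y, z, groups, tol=1e-10, max_iter=200):
    """Poisson pseudo-ML slope on z with high-dimensional fixed effects via IRLS."""
    ybar = sum(y) / len(y)
    mu = [(yi + ybar) / 2 for yi in y]          # a common choice of starting values
    eta = [math.log(m) for m in mu]
    beta, old_dev = 0.0, math.inf
    for _ in range(max_iter):
        w = mu                                   # IRLS weights for Poisson with log link
        zw = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]   # working response
        zt = weighted_demean(zw, w, groups)
        xt = weighted_demean(z, w, groups)
        beta = (sum(wi * a * b for wi, a, b in zip(w, zt, xt))
                / sum(wi * b * b for wi, b in zip(w, xt)))
        # By FWL, zt - beta * xt is the residual of the full weighted regression,
        # so the updated linear predictor is the working response minus that residual.
        eta = [zwi - (a - beta * b) for zwi, a, b in zip(zw, zt, xt)]
        mu = [math.exp(e) for e in eta]
        dev = 2 * sum((yi * math.log(yi / m) if yi > 0 else 0.0) - (yi - m)
                      for yi, m in zip(y, mu))
        if abs(dev - old_dev) / (0.1 + abs(dev)) < tol:
            break
        old_dev = dev
    return beta


# Usage: noise-free pseudo-outcomes y = exp(beta*z + delta_i + delta_t) on an
# unbalanced panel, so PPML should reproduce beta = -0.5 up to numerical tolerance.
random.seed(4)
d_i = [random.gauss(0, 0.5) for _ in range(30)]
d_t = [random.gauss(0, 0.5) for _ in range(8)]
ids, periods, z = [], [], []
for i in range(30):
    for t in range(8):
        if random.random() < 0.8:
            ids.append(i)
            periods.append(t)
            z.append(random.gauss(0, 1) + d_i[i])
y = [math.exp(-0.5 * zv + d_i[i] + d_t[t]) for zv, i, t in zip(z, ids, periods)]
print(ppml_fe(y, z, [ids, periods]))
```

Because PPML only requires the conditional mean to be correctly specified, the outcome need not be an integer count, which is why the noise-free pseudo-outcomes above are valid input.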
Questions on Cross Validated
feglm (alpaca-package example)
Stata
Correia, Sergio, Paulo Guimarães, and Tom Zylkin (2019) PPMLHDFE: Fast Poisson Estimation with High-Dimensional Fixed Effects (+)
R packages
Julian Hinz (2020) R_glmhdfe (github)
Laurent Berge (2020) fixest: Fast Fixed-Effects Estimations (CRAN)
Laurent Berge (2019) FENmlm: Fixed Effects Nonlinear Maximum Likelihood Models (CRAN)
Stammann, Amrei (2020) alpaca: Fit GLM's with High-Dimensional k-Way Fixed Effects (CRAN)
Stammann, Amrei (2020) bife: Binary Choice Models with Fixed Effects (CRAN)
Articles (+), Working Papers (++), or Notes (+++)
Stammann, Heiss, and McFadden (2016) Estimating Fixed Effects Logit Models with Large Panel Data (++)
Stammann, Amrei (2018) Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-way Fixed Effects (++)
Julian Hinz, Alexander Hudlet, and Joschka Wanner (2019) Separating the Wheat from the Chaff: Fast Estimation of GLMs with High-Dimensional Fixed Effects (+++)