Panel model with multi-dimensional fixed effects and rare outcomes: OLS vs. nonlinear estimators

Consider the following static panel model with two-way fixed effects:

$y_{it} = \beta z_{it} + \delta_i + \delta_t + \epsilon_{it}$

I am interested in the estimate of $\beta$. In my sample, $N = 600{,}000$, $T = 100$ and $I = 6{,}000$, so the model contains 6,100 dummy variables to be estimated. $y_{it}$ is a rare outcome with only around 4,000 non-zero values. I ran models with different transformations of the outcome: a 0-1 dummy, a simple count (from 0 to 20), log(1 + count), and the values scaled by population. (I am aware none of these are perfect.)
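Since $N = 600{,}000 = I \times T$, the panel appears to be balanced. If it is, OLS with both sets of dummies does not require building the 6,100 columns at all. By the Frisch-Waugh-Lovell theorem, $\hat\beta$ is the slope from regressing the double-demeaned $\tilde y_{it} = y_{it} - \bar y_{i\cdot} - \bar y_{\cdot t} + \bar y$ on the identically demeaned $\tilde z_{it}$.

The Python sketch below uses simulated data; all names and parameter values are hypothetical. It checks the shortcut against an explicit dummy regression on a small panel.

```python
import numpy as np

def beta_two_way_within(y, z):
    """OLS slope on z with individual and time dummies, balanced (I, T) panel.

    In a balanced panel, double demeaning sweeps out both sets of dummies exactly.
    """
    def demean(a):
        return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()
    yt, zt = demean(y), demean(z)
    return float((zt * yt).sum() / (zt * zt).sum())

def beta_dummy_ols(y, z):
    """Same slope from an explicit dummy regression (only feasible for small I)."""
    I, T = y.shape
    ids = np.repeat(np.arange(I), T)         # individual index of each element of y.ravel()
    ts = np.tile(np.arange(T), I)            # period index of each element
    D_i = np.eye(I)[ids]                     # I individual dummies
    D_t = np.eye(T)[ts][:, 1:]               # T-1 period dummies (one dropped for collinearity)
    X = np.column_stack([z.ravel(), D_i, D_t])
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return float(coef[0])

# Hypothetical simulated panel; z is correlated with both delta_i and delta_t.
rng = np.random.default_rng(0)
I, T, beta = 300, 10, -0.5
delta_i = rng.normal(size=(I, 1))
delta_t = rng.normal(size=(1, T))
z = 0.8 * delta_i + 0.3 * delta_t + rng.normal(size=(I, T))
y = beta * z + delta_i + delta_t + rng.normal(size=(I, T))

b_within = beta_two_way_within(y, z)
b_dummy = beta_dummy_ols(y, z)
print(b_within, b_dummy)   # equal up to floating-point rounding
```

With an unbalanced panel the one-pass double demeaning is no longer exact, and an iterative version is needed.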

Given the large number of dummy variables, I have so far estimated the models via OLS. $\hat{\beta}$ is always negative and highly statistically significant. I was asked to also estimate the model with count-data estimators (Poisson, negative binomial) and with logit. For logit, Fernandez-Val and Weidner (2016) propose an estimator that allows for two-way fixed effects, but in my application it does not converge.

It is often said (and written on blogs) that nonlinear models are not well suited to panel data, but I have not seen a discussion of settings with many multi-dimensional fixed effects. In this case, where the number of cross-sections is much larger than $T$, how should I think about incidental parameter bias? Which other references discuss what makes sense here and what doesn't?

Thanks a lot

Reference:

Fernandez-Val, Ivan and Weidner, Martin, (2016), Individual and time effects in nonlinear panel models with large N, T, Journal of Econometrics, 192, issue 1, p. 291-312.

Best Answer

Here are some resources for estimating high-dimensional fixed effects models, primarily in R. Unfortunately, I do not know how well they handle rare events.

On the incidental parameters problem, Berge writes:

In particular, the Logit fixed-effect estimator, contrary to the three other likelihoods (Poisson, Negative Binomial, Gaussian), is known to suffer from the incidental parameters problem (Neyman and Scott, 1948; Lancaster, 2000). This problem leads to biased estimators (i.e., estimators that deviate from their “true” value) for short panels.

However, for the binomial logit, the R package bife by Stammann, Heiss and McFadden combines the pseudo-demeaning algorithm with a bias correction proposed by Hahn and Newey (2004).
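The classic Neyman and Scott (1948) example makes the incidental parameters problem concrete.

- Setup: each individual $i$ has its own mean $\mu_i$ and $T$ observations with a common variance $\sigma^2$.
- The bias: with $T$ fixed and the number of individuals growing, the joint MLE of $\sigma^2$ converges to $\sigma^2 (T-1)/T$ rather than $\sigma^2$. The bias is of order $1/T$ and does not shrink as more individuals are added.
- The correction: here a simple multiplicative factor $T/(T-1)$ removes it. The analytical corrections for logit (Hahn and Newey, 2004; Fernandez-Val and Weidner, 2016) follow the same logic but have no closed form.
- The linear model: with strictly exogenous regressors, the within estimator of the slope $\beta$ is unaffected; only the variance estimate is.

A small Python simulation with made-up parameter values:

```python
import numpy as np

# Hypothetical Neyman-Scott setting: n individuals with T observations each,
# individual-specific means mu_i (the incidental parameters), common variance sigma2.
rng = np.random.default_rng(1)
n, T, sigma2 = 20_000, 2, 1.0
mu = rng.normal(scale=5.0, size=(n, 1))
x = mu + rng.normal(scale=np.sqrt(sigma2), size=(n, T))

# Joint MLE: each mu_i is estimated by the individual mean,
# sigma2 by the average squared deviation from it.
resid = x - x.mean(axis=1, keepdims=True)
sigma2_mle = (resid ** 2).sum() / (n * T)        # tends to sigma2 * (T - 1) / T = 0.5
sigma2_corrected = sigma2_mle * T / (T - 1)      # bias-corrected, tends to sigma2 = 1.0

print(f"MLE: {sigma2_mle:.3f}  corrected: {sigma2_corrected:.3f}")
```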

Hope you find some of these resources useful.

Linear models

Simen Gaure has developed an R package called lfe, which in principle supports any number of fixed-effect dimensions. Its main benefits are flexibility and speed.

In Stata there are packages called reg2hdfe and reg3hdfe, developed by Guimaraes and Portugal (2010). As the names indicate, these support only two or three dimensions of fixed effects.

The iterative procedure used to fit these models is described in detail in Gaure (2013), and also appears in Guimaraes and Portugal (2010).
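The core of such an iterative procedure (the method of alternating projections) can be sketched in a few lines:

1. Subtract the individual means.
2. Subtract the period means.
3. Repeat until the means being removed are negligible.
4. Run OLS on the demeaned data.

This is a simplified Python illustration on a simulated unbalanced panel, not the code of lfe or reg2hdfe; the packages use more refined versions. The function names are hypothetical.

```python
import numpy as np

def demean_two_way(v, g1, g2, tol=1e-10, max_iter=10_000):
    """Sweep out two sets of fixed effects by alternating projections.

    g1 and g2 hold integer codes 0..G-1 (every code present). Group means of
    g1 and g2 are subtracted in turn until both are below tol.
    """
    v = np.asarray(v, dtype=float).copy()
    G1, G2 = g1.max() + 1, g2.max() + 1
    c1 = np.bincount(g1, minlength=G1)
    c2 = np.bincount(g2, minlength=G2)
    for _ in range(max_iter):
        m1 = np.bincount(g1, weights=v, minlength=G1) / c1
        v -= m1[g1]
        m2 = np.bincount(g2, weights=v, minlength=G2) / c2
        v -= m2[g2]
        if max(np.abs(m1).max(), np.abs(m2).max()) < tol:
            break
    return v

def fe_ols(y, X, g1, g2):
    """Slopes of y on X with two-way fixed effects (Frisch-Waugh-Lovell)."""
    yt = demean_two_way(y, g1, g2)
    Xt = np.column_stack([demean_two_way(X[:, j], g1, g2) for j in range(X.shape[1])])
    coef, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
    return coef

# Hypothetical unbalanced panel: 200 individuals x 20 periods, ~30% of cells missing.
rng = np.random.default_rng(0)
I, T = 200, 20
ii, tt = np.meshgrid(np.arange(I), np.arange(T), indexing="ij")
keep = rng.random(I * T) < 0.7
_, g1 = np.unique(ii.ravel()[keep], return_inverse=True)
_, g2 = np.unique(tt.ravel()[keep], return_inverse=True)
n = g1.size
a, d = rng.normal(size=g1.max() + 1), rng.normal(size=g2.max() + 1)
x = 0.5 * a[g1] + 0.5 * d[g2] + rng.normal(size=n)
y = -0.3 * x + a[g1] + d[g2] + rng.normal(size=n)

b_ap = fe_ols(y, x[:, None], g1, g2)[0]
# Check against explicit dummies (one period dummy dropped for collinearity).
D = np.column_stack([x, np.eye(g1.max() + 1)[g1], np.eye(g2.max() + 1)[g2][:, 1:]])
b_dummy = np.linalg.lstsq(D, y, rcond=None)[0][0]
print(b_ap, b_dummy)
```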

See also this blog post by Thiemo Fetzer (2014).

Non-linear models

The R-package bife

can be used to fit fixed effects binary choice models (logit and probit) based on an unconditional maximum likelihood approach. It is tailored for the fast estimation of binary choice models with potentially many individual fixed effects. The routine is based on a special pseudo demeaning algorithm derived by Stammann, Heiss, and McFadden (2016). The estimates obtained are identical to the ones of glm, but the computation time of bife is much lower.
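To give a feel for how demeaning carries over to binary choice models, here is a Python sketch of a fixed-effects logit fitted by iteratively reweighted least squares (IRLS). Each weighted least-squares step sweeps out the individual effects by weighted demeaning. It is in the spirit of the pseudo-demeaning idea but is not bife's algorithm; names and simulated data are hypothetical.

Note the first step: individuals whose outcome is always 0 (or always 1) are dropped, because their fixed effect has no finite estimate. With a rare outcome, this can remove a large share of the sample.

```python
import numpy as np

def logit_fe(y, X, g, tol=1e-10, max_iter=100):
    """Logit with one set of fixed effects (codes g) via IRLS + weighted demeaning."""
    # Individuals with all-0 or all-1 outcomes have no finite fixed effect: drop them.
    s = np.bincount(g, weights=y)
    c = np.bincount(g)
    keep = (s[g] > 0) & (s[g] < c[g])
    y, X = y[keep], X[keep]
    _, g = np.unique(g[keep], return_inverse=True)   # recode kept groups as 0..G-1
    G = g.max() + 1

    def wmean(v, w):
        """Weighted group means of v, broadcast back to the observations."""
        return (np.bincount(g, weights=w * v, minlength=G)
                / np.bincount(g, weights=w, minlength=G))[g]

    p = (y + 0.5) / 2                                # glm-style starting values
    eta = np.log(p / (1 - p))
    for _ in range(max_iter):
        p = 1 / (1 + np.exp(-eta))
        w = p * (1 - p)                              # IRLS weights for the logit link
        zw = eta + (y - p) / w                       # working response
        zt = zw - wmean(zw, w)                       # weighted within transformation
        Xt = X - np.column_stack([wmean(X[:, j], w) for j in range(X.shape[1])])
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(Xt * sw[:, None], zt * sw, rcond=None)
        eta_new = X @ beta + wmean(zw - X @ beta, w)   # fixed effects recovered given beta
        done = np.max(np.abs(eta_new - eta)) < tol
        eta = eta_new
        if done:
            break
    return beta, y, X, g, 1 / (1 + np.exp(-eta))

# Hypothetical simulated panel: 500 individuals, T = 10, true beta = 1.
rng = np.random.default_rng(2)
I, T = 500, 10
alpha = rng.normal(-1.0, 1.0, size=I)
g = np.repeat(np.arange(I), T)
x = 0.5 * alpha[g] + rng.normal(size=I * T)
y = (rng.random(I * T) < 1 / (1 + np.exp(-(x + alpha[g])))).astype(float)

beta, yk, Xk, gk, p = logit_fe(y, x[:, None], g)
print(beta, "individuals kept:", gk.max() + 1, "of", I)
```

With T = 10, the uncorrected estimate will typically be somewhat larger in absolute value than the true $\beta$. This is the incidental parameters bias that the corrections in bife and alpaca target.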

There is an application example here on Cross Validated, in the question bife-example.

The R-package fixest (extends and replaces FENmlm)

provides a family of functions to perform estimations with multiple fixed-effects. The two main functions are feols for linear models and feglm for generalized linear models. In addition, the function femlm performs direct maximum likelihood estimation, and feNmlm extends the latter to allow the inclusion of non-linear in parameters right-hand-sides. Each of these functions supports any number of fixed-effects and is implemented with full-fledged multi-threading in C++. Functions feols and feglm further support variables with varying slopes.

This package is currently (Nov. 2019) the fastest software available to perform fixed-effects estimations (see the project’s homepage for a benchmarking). (Berge (2020), Fast Fixed-Effects Estimation: Short introduction)

The R-package FENmlm

The package FENmlm estimates maximum likelihood (ML) models with fixed-effects. The function femlm is the workhorse of the package: it performs efficient ML estimations with any number of fixed-effects and also allows for non-linear in parameters right-hand-sides. Four likelihood models are supported: Poisson, Negative Binomial, Gaussian (equivalent to OLS) and Logit.

The standard-errors of the estimates can be very easily clustered (up to four-way). (Berge (2019), Efficient Maximum Likelihood Estimation with Multiple Fixed-Effects)
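For reference, here is a minimal Python sketch of one-way cluster-robust ("sandwich") standard errors for OLS, as one would use when clustering by individual.

- Small-sample factor: the sketch uses $G/(G-1)\cdot(n-1)/(n-k)$. This is the factor used by Stata's `regress` with `vce(cluster)`.
- Comparing with packages: they differ in how they count absorbed fixed effects in $k$, so exact agreement with any of them should not be expected.
- Assumptions: names and data are hypothetical.

```python
import numpy as np

def ols_cluster_se(X, y, cl):
    """OLS with one-way cluster-robust standard errors (CR1-type small-sample factor)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    u = y - X @ beta
    _, cl = np.unique(cl, return_inverse=True)
    G = cl.max() + 1
    # Cluster-level scores: row g is the sum over observations in cluster g of x_i * u_i.
    S = np.column_stack([np.bincount(cl, weights=X[:, j] * u, minlength=G) for j in range(k)])
    adj = G / (G - 1) * (n - 1) / (n - k)
    V = adj * XtX_inv @ (S.T @ S) @ XtX_inv
    return beta, np.sqrt(np.diag(V))

# Hypothetical data: 100 clusters sharing a cluster-level shock in the error.
rng = np.random.default_rng(5)
n, G = 2000, 100
cl = rng.integers(0, G, size=n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -0.5]) + rng.normal(size=G)[cl] + rng.normal(size=n)
beta, se_cluster = ols_cluster_se(X, y, cl)
print(beta, se_cluster)
```

With each observation as its own cluster, the formula reduces to the heteroskedasticity-robust HC1 estimator.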

The R-package alpaca

Provides a routine to concentrate out factors with many levels during the optimization of the log-likelihood function of the corresponding generalized linear model (glm). The package is based on the algorithm proposed by Stammann (2018) and is restricted to glm's that are based on maximum likelihood estimation and non-linear. It also offers an efficient algorithm to recover estimates of the fixed effects in a post-estimation routine and includes robust and multi-way clustered standard errors. Further the package provides analytical bias corrections for binary choice models (logit and probit) derived by Fernandez-Val and Weidner (2016) and Hinz, Stammann, and Wanner (2019). (alpaca on CRAN)
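For Poisson with one set of fixed effects, concentrating out the factor even has a closed form. Given $\beta$, the first-order condition for $\alpha_i$ gives $\hat\alpha_i(\beta) = \log \sum_t y_{it} - \log \sum_t \exp(x_{it}'\beta)$. Plugging this back in leaves a likelihood in $\beta$ alone, which can be maximised by Newton's method. That concentrated likelihood coincides with the conditional (multinomial) likelihood, which is why this estimator does not suffer from the incidental parameters problem.

As with logit, individuals with $\sum_t y_{it} = 0$ are dropped: for them $\hat\alpha_i \to -\infty$, and they carry no information about $\beta$. This is a Python sketch, not alpaca's or FENmlm's code; names and data are hypothetical.

```python
import numpy as np

def poisson_fe(y, X, g, tol=1e-10, max_iter=50):
    """Poisson with one set of fixed effects, concentrated out in closed form."""
    Y = np.bincount(g, weights=y)
    keep = Y[g] > 0                          # all-zero individuals carry no information on beta
    y, X = y[keep], X[keep]
    _, g = np.unique(g[keep], return_inverse=True)
    G, k = g.max() + 1, X.shape[1]
    Y = np.bincount(g, weights=y, minlength=G)

    beta = np.zeros(k)
    for _ in range(max_iter):
        e = np.exp(X @ beta)
        share = e / np.bincount(g, weights=e, minlength=G)[g]   # within-individual shares
        mu = Y[g] * share                    # fitted means with alpha_i(beta) plugged in
        grad = X.T @ (y - mu)                # score of the concentrated log-likelihood
        xbar = np.column_stack([np.bincount(g, weights=share * X[:, j], minlength=G)
                                for j in range(k)])
        Xc = X - xbar[g]
        H = (Xc * mu[:, None]).T @ Xc        # minus the Hessian
        step = np.linalg.solve(H, grad)      # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    alpha = np.log(Y) - np.log(np.bincount(g, weights=np.exp(X @ beta), minlength=G))
    return beta, alpha, y, X, g

# Hypothetical simulated panel with a fairly rare count outcome; true beta = 0.5.
rng = np.random.default_rng(3)
I, T = 400, 8
a = rng.normal(-2.0, 1.0, size=I)
g = np.repeat(np.arange(I), T)
x = 0.5 * a[g] + rng.normal(size=I * T)
y = rng.poisson(np.exp(0.5 * x + a[g])).astype(float)

beta, alpha, yk, Xk, gk = poisson_fe(y, x[:, None], g)
print(beta, "individuals kept:", len(alpha), "of", I)
```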

Questions on Cross Validated

feglm (alpaca-package example)

Stata

Correia, Sergio, Paulo Guimarães, and Tom Zylkin (2019) PPMLHDFE: Fast Poisson Estimation with High-Dimensional Fixed Effects (+)

ppmlhdfe, a new Stata command for estimation of (pseudo) Poisson regression models with multiple high-dimensional fixed effects (HDFE). Estimation is implemented using a modified version of the iteratively reweighted least-squares (IRLS) algorithm that allows for fast estimation in the presence of HDFE.
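The idea of such a modified IRLS can be sketched as follows. At each iteration, the weighted least-squares step partials out the fixed effects by weighted alternating demeaning. The fitted linear predictor is then recovered as the working response minus the Frisch-Waugh-Lovell residual, so the fixed effects never have to be stored.

This Python sketch with two sets of fixed effects is a simplified illustration under those assumptions, not ppmlhdfe's implementation. Among other things, ppmlhdfe detects separation far more thoroughly than the simple all-zero screening here. Names and data are hypothetical.

```python
import numpy as np

def wdemean(v, w, g1, g2, tol=1e-10, max_iter=10_000):
    """Weighted alternating projections: sweep out both sets of fixed effects."""
    G1, G2 = g1.max() + 1, g2.max() + 1
    s1 = np.bincount(g1, weights=w, minlength=G1)
    s2 = np.bincount(g2, weights=w, minlength=G2)
    v = v.copy()
    for _ in range(max_iter):
        m1 = np.bincount(g1, weights=w * v, minlength=G1) / s1
        v -= m1[g1]
        m2 = np.bincount(g2, weights=w * v, minlength=G2) / s2
        v -= m2[g2]
        if max(np.abs(m1).max(), np.abs(m2).max()) < tol:
            break
    return v

def ppml_two_way(y, X, g1, g2, tol=1e-8, max_iter=100):
    """Poisson (PPML) with two sets of fixed effects via IRLS + weighted demeaning."""
    # Drop individuals/periods with all-zero outcomes; repeat, since one drop can create another.
    while True:
        keep = (np.bincount(g1, weights=y)[g1] > 0) & (np.bincount(g2, weights=y)[g2] > 0)
        if keep.all():
            break
        y, X, g1, g2 = y[keep], X[keep], g1[keep], g2[keep]
    _, g1 = np.unique(g1, return_inverse=True)
    _, g2 = np.unique(g2, return_inverse=True)

    mu = (y + y.mean()) / 2                      # strictly positive starting values
    eta = np.log(mu)
    for _ in range(max_iter):
        zw = eta + (y - mu) / mu                 # working response (log link, weights = mu)
        zt = wdemean(zw, mu, g1, g2)
        Xt = np.column_stack([wdemean(X[:, j], mu, g1, g2) for j in range(X.shape[1])])
        sw = np.sqrt(mu)
        beta, *_ = np.linalg.lstsq(Xt * sw[:, None], zt * sw, rcond=None)
        eta_new = zw - (zt - Xt @ beta)          # fitted values = working response - FWL residual
        done = np.max(np.abs(eta_new - eta)) < tol
        eta, mu = eta_new, np.exp(eta_new)
        if done:
            break
    return beta, y, X, g1, g2, mu

# Hypothetical simulated panel: 300 individuals x 10 periods, true beta = 0.5.
rng = np.random.default_rng(4)
I, T = 300, 10
g1 = np.repeat(np.arange(I), T)
g2 = np.tile(np.arange(T), I)
a, d = rng.normal(-1.0, 1.0, size=I), rng.normal(0.0, 0.5, size=T)
x = 0.5 * a[g1] + rng.normal(size=I * T)
y = rng.poisson(np.exp(0.5 * x + a[g1] + d[g2])).astype(float)

beta, yk, Xk, g1k, g2k, mu = ppml_two_way(y, x[:, None], g1, g2)
print(beta)
```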

R packages

Julian Hinz (2020) R_glmhdfe (github)

Laurent Berge (2020) fixest: Fast Fixed-Effects Estimations (CRAN)

Laurent Berge (2019) FENmlm: Fixed Effects Nonlinear Maximum Likelihood Models (CRAN)

Stammann, Amrei (2020) alpaca: Fit GLM's with High-Dimensional k-Way Fixed Effects (CRAN)

Stammann, Amrei (2020) bife: Binary Choice Models with Fixed Effects (CRAN)

Articles (+), Working Papers (++) or Notes (+++)

Stammann, Heiss, and McFadden (2016) Estimating Fixed Effects Logit Models with Large Panel Data (++)

Stammann, Amrei (2018) Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-way Fixed Effects (++)

Julian Hinz, Alexander Hudlet, and Joschka Wanner (2019) Separating the Wheat from the Chaff: Fast Estimation of GLMs with High-Dimensional Fixed Effects (+++)