Solved – R/Stata package for zero-truncated negative binomial GEE

count-datapanel datarstatatruncation

this is my first post. I'm truly grateful for this community.

I am trying to analyze longitudinal count data that is zero-truncated (probability that response variable = 0 is 0), and the mean != variance, so a negative binomial distribution was chosen over a poisson.

Functions/commands I've ruled out:

R

  • gee() function in R does not account for zero-truncation nor the negative binomial distribution (not even with the MASS package loaded)
  • glm.nb() in R doesn't allow for different correlation structures
  • vglm() from the VGAM package can make use of the posnegbinomial family, but it has the same problem as Stata's ztnb command (see below) in that I can't refit the models using a non-independent correlation structure.

Stata

  • If the data wasn't longitudinal, I could just use the Stata packages ztnb to run my analysis, BUT that command assumes that my observations are independent.

I've also ruled out GLMM for various methodological/philosophical reasons.

For now, I've settled on Stata's xtgee command (yes, I know that xtnbreg also does the same thing) that takes into account both the nonindependent correlation structures and the neg binomial family, but not the zero-truncation. The added benefit of using xtgee is that I can also calculate qic values (using the qic command) to determine the best fitting correlation structures for my response variables.

If there is a package/command in R or Stata that can take 1) nbinomial family, 2) GEE and 3) zero-truncation into account, I'd be dying to know.

I'd greatly appreciate any ideas you may have. Thank you.

-Casey

Best Answer

For R two options spring to mind, both of which I am only vaguely familiar with at best.

The first is the pscl package, which can fit zero truncated inflated and hurdle models in a very nice, flexible manner. The pscl package suggests the use of the sandwich package which provides "Model-robust standard error estimators for cross-sectional, time series and longitudinal data". So you could fit your count model and then use the sandwich package to estimate an appropriate covariance matrix for the residuals taking into account the longitudinal nature of the data.

The second option might be to look the geepack package which looks like it can do what you want but only for a negative binomial model with known theta, as it will fit any type of GLM that R's glm() function can (so use the family function from MASS).

A third option has raised it's head: gamlss and it's add-on package gamlss.tr. The latter includes a function gen.trun() that can turn any of the distributions supported by gamlss() into a truncated distribution in a flexible way - you can specify left truncated at 0 negative binomial distribution for example. gamlss() itself includes support for random effects which should take care of the longitudinal nature of the data. It isn't immediately clear however if you have to use at least one smooth function of a covariate in the model or can just model everything as linear functions like in a GLM.