Selecting Link Function for Negative Binomial GLM

Tags: count-data, generalized-linear-model, link-function, negative-binomial-distribution, r

I'm trying to model insect abundance data with a variety of vegetation- and site-related covariates. Because the counts are overdispersed, I've decided to use the negative binomial distribution. At first I was under the misapprehension that the negative binomial was itself the link function, but when modeling with glm.nb I'm prompted to select a link function, and the options are limited to log, sqrt, and identity.

I can't find a good explanation of 1) why those three are the only possibilities with glm.nb, or 2) how to decide which is most appropriate for my analysis. Using AICctab in R shows that the log link gives the best fit, though sqrt is almost indistinguishable. The plots for the identity link, however, look too good to be true (all points fall within the error bars, each treatment group is distinct, etc.). As far as I know, neither of these is a scientifically informed way to make the decision.

Other reading (e.g., this response) gives me the impression that I should match the link function to the response distribution and what I know of its properties. But neither log nor sqrt seems to match what I know about my distribution (it can't be negative and only yields integers). Still, the log function must match the negative binomial somehow, since it's the default link function for glm.nb.

Best Answer

First, you need a better understanding of what link functions are. Then, maybe look at what others are doing in your field, for instance this paper.
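As a rough illustration of what a link function does: it connects the mean of the response to the linear predictor, and it is not applied to the observed counts themselves, so the data remain integers whichever link is chosen. The sketch below uses base R's make.link() to get the inverse of each of the three links and applies it to a few linear-predictor values; the values in eta are made up for illustration.

```r
# The three links accepted by glm.nb, built with stats::make.link()
links <- lapply(c(log = "log", sqrt = "sqrt", identity = "identity"), make.link)

# Hypothetical linear-predictor values (assumed, not from any real model)
eta <- c(-2, 0, 2)

# The inverse link maps the linear predictor to the fitted mean
means <- sapply(links, function(l) l$linkinv(eta))
rownames(means) <- paste0("eta=", eta)
print(means)
```

With the log link the implied mean is exp(eta), which is positive for every eta. With the identity link a negative eta gives a negative mean, which is impossible for counts, and with the sqrt link the mean is eta^2, so eta = -2 and eta = 2 give the same mean.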

Then, you have count data, and for such data the most natural link function is the log link: it keeps the fitted mean positive for any value of the linear predictor, and it makes covariate effects multiplicative on the expected count. See for example "Goodness of fit and which model to choose linear regression or Poisson". So, unless you have very strong reasons otherwise, you should start out with the log link function.
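A minimal sketch of starting with the log link, on simulated data rather than the insect data from the question: the variable names (abundance, veg_cover), the coefficients, and the dispersion value are all assumptions chosen for illustration.

```r
library(MASS)  # provides glm.nb

set.seed(42)
n <- 200
veg_cover <- runif(n)                        # hypothetical covariate in [0, 1]
mu <- exp(0.5 + 1.2 * veg_cover)             # assumed true mean, linear on the log scale
abundance <- rnbinom(n, size = 2, mu = mu)   # overdispersed counts
sim <- data.frame(abundance, veg_cover)

# Negative binomial GLM with the default log link, stated explicitly
fit_log <- glm.nb(abundance ~ veg_cover, data = sim, link = log)

# exp(coef) gives multiplicative effects on the expected count
rate_ratio <- exp(coef(fit_log))
print(rate_ratio)

# Predicted means stay positive, even for covariate values far outside the data
new_sites <- data.frame(veg_cover = c(-3, 0, 3))
pred <- predict(fit_log, newdata = new_sites, type = "response")
print(pred)
```

Here exp() of the veg_cover coefficient reads as the factor by which the expected count changes per unit increase in the covariate, which is usually the easiest scale on which to interpret abundance data.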