Solved – The identity link function does not respect the domain of the Gamma family

Tags: gamma distribution, generalized linear model, python, statsmodels

I am using a gamma generalized linear model (GLM) with an identity link. The independent variable is the compensation of a particular group.

Python's statsmodels summary is giving me a warning about the identity link function ("DomainWarning: The identity link function does not respect the domain of the Gamma family.") that I don't understand and would love some help with. Background: Only basic formal education in statistics and virtually no experience with GLMs beyond logistic regression.

Here's the relevant Python code:

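import statsmodels.api as sm

# Gamma GLM with an identity link. The link is passed as an instance;
# older statsmodels releases spell the class `identity` instead of `Identity`.
family = sm.families.Gamma(link=sm.families.links.Identity())
model = sm.GLM(target, reducedFeatures, family=family)
results = model.fit()
print(results.summary())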

Here's the output:
[Image: statsmodels GLM summary output, not reproduced here]

My question is this: In what way does an identity link not respect the domain of the Gamma family? The domain of the gamma family is 0 to infinity? I was also under the impression that the identity link wasn't doing much of anything i.e. it's keeping the independent variables as is and not transforming them/their relationship with the dependent variable. It sounds like a respectful link function 😉

Please correct me

Best Answer

The Gamma GLM model is:

$$ y \mid X \sim \text{Gamma} (\mu = f(X\beta), \phi) $$

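where $\mu$ is the mean of the response, $\phi$ is a dispersion parameter (the coefficient estimates do not depend on it, so it is usually estimated separately after fitting), $X\beta$ is the linear predictor, $\beta$ are the coefficients learned by the model, and $f$ maps the linear predictor to the mean. Strictly speaking, the link function is the inverse of $f$ (it maps $\mu$ to $X\beta$), but for the identity link the two coincide.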

Note that, while $X\beta$ is allowed to take any real value, $f(X\beta)$ models the expectation of a Gamma distribution, which must be a positive real number. This is what statsmodels is telling you: the identity function is not guaranteed to map $X\beta$ to a positive real number, so it does not always produce a valid mean parameter.
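As a rough illustration of that point, the sketch below applies two choices of $f$ to a few made-up linear-predictor values. The log link (where the mean is $\exp(X\beta)$) is not part of the question; it is included here only as a contrast, and the function names and numbers are hypothetical.

```python
import math

def identity_mean(eta):
    """Identity link: the mean is the linear predictor itself."""
    return eta

def log_link_mean(eta):
    """Log link (shown for contrast): the mean is exp(eta), always > 0."""
    return math.exp(eta)

# Hypothetical linear-predictor values, including negative ones.
etas = [-2.0, -0.5, 0.0, 1.5]

identity_means = [identity_mean(e) for e in etas]
log_means = [log_link_mean(e) for e in etas]

# A Gamma mean must be strictly positive.
identity_valid = [m > 0 for m in identity_means]
log_valid = [m > 0 for m in log_means]

print(identity_valid)  # [False, False, False, True]
print(log_valid)       # [True, True, True, True]
```

With the identity link, any non-positive linear predictor passes straight through as a non-positive mean, which is the case the warning is about.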

Cool. Thank you! All of my independent variables are positive, real numbers so I'm good to go, right?

Not necessarily. One of your estimated coefficients could be negative (your intercept is very negative), and then $X\beta$ can be negative even when every predictor is positive.

Would you mind going into a bit more detail on what you mean? Why would the sign of the intercept have any impact on the coefficients? That doesn't make sense to me.

It has an effect on the mean parameter of your conditional Gamma distribution. Remember, the structural equation for the model is:

$$ \mu = f(X \beta) $$

and $\mu$ must be positive. Suppose that a data point with all predictor variables equal to zero is valid (I don't know whether this is the case for your data, as I lack context for your features). Then the prediction for this data point would be:

$$ \mu(x) = f \left( (1, 0, 0, \cdots, 0) \cdot \beta \right) = f(\text{Intercept}) $$

If you are using the identity link function this means that

$$ \mu(x) = \text{Intercept} $$

which is an invalid value of $\mu$ when the intercept is negative.

Again, contextual constraints in your data may keep you out of this situation, but it is mathematically possible.
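To make this concrete, here is a minimal sketch with made-up coefficients (a very negative intercept and two positive slopes) and three hypothetical data points. None of these numbers come from the question's model; the helper names are also just for illustration.

```python
def identity_link_means(intercept, slopes, rows):
    """Return intercept + slopes . x for each row (identity link: mu = X beta)."""
    return [intercept + sum(b * x for b, x in zip(slopes, row)) for row in rows]

def invalid_rows(means):
    """Indices of rows whose predicted mean is not strictly positive."""
    return [i for i, mu in enumerate(means) if mu <= 0]

# Made-up coefficients: a very negative intercept with positive slopes.
intercept = -50.0
slopes = [2.0, 3.0]

rows = [
    [0.0, 0.0],    # all predictors zero: mu equals the intercept
    [1.0, 2.0],    # positive predictors, but too small to offset the intercept
    [20.0, 10.0],  # positive predictors large enough to give mu > 0
]

means = identity_link_means(intercept, slopes, rows)
print(means)                # [-50.0, -42.0, 20.0]
print(invalid_rows(means))  # [0, 1]
```

Running the same kind of check on the fitted means for your own data (and for any points you plan to predict on) shows whether the situation actually arises in practice.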