Solved – Nonlinear vs. generalized linear model: How to refer to logistic, Poisson, etc. regression

generalized-linear-model, link-function, logistic, nonlinear, poisson-regression

I have a question about semantics that I would like fellow statisticians' opinions on.

We know models such as logistic, Poisson, etc. fall under the umbrella of generalized linear models. The model includes nonlinear functions of the parameters, which may in turn be modeled using the linear model framework by using the appropriate link function.

I'm wondering if you consider (teach?) situations such as logistic regression as a:

  1. Nonlinear model, given the form of the parameters
  2. Linear model, since the link transforms us to the linear model framework
  3. Simultaneously (1) and (2): It "starts" as a nonlinear model, but may be worked with in such a way that allows us to think of it as a linear model

Wish I could set up an actual poll…

Best Answer

This is a great question.

> We know models such as logistic, Poisson, etc. fall under the umbrella of generalized linear models.

Well, yes and no. Given the context of the question, we must be quite careful to specify what we're talking about -- and "logistic" and "Poisson" alone are insufficient to describe what is intended.

(i) "Poisson" is a distribution. As a description of a conditional distribution, it's not linear (and hence not a GLM) unless you also specify a linear (in parameters) model for the conditional mean on some link scale; merely saying "Poisson" is not sufficient. When people say "Poisson regression", they nearly always intend a model that is linear in parameters, and is therefore a GLM. But "Poisson" alone could be any number of things*.

(ii) "Logistic", on the other hand, refers to the description of a mean (that the mean is logistic in the predictors). It's not a GLM unless you combine it with a conditional distribution that's in the exponential family. When people say "logistic regression", they almost always mean a binomial model with logit link. That model does have a mean that's logistic in the predictors, it is linear in parameters, and its conditional distribution is in the exponential family, so it is a GLM.

> The model includes nonlinear functions of the parameters,

Well, again, yes and no.

The "linear" in "generalized linear model" says that the parameters enter the model linearly. Specifically, what's meant is that on the scale of the linear predictor $\eta=g(\mu)$, the model is of the form $\eta=X\beta$.
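To make the point concrete, here is a minimal numerical sketch for the binomial/logit case. The design row and coefficient vectors are made-up illustrative values, and the helper names (`logit`, `expit`, `linear_predictor`) are just labels chosen for this sketch. It checks that the linear predictor is additive in the parameters, that the mean is not, and that applying the link to the mean recovers $\eta$.

```python
import math

def logit(p):
    """Logit link g(mu) = log(mu / (1 - mu))."""
    return math.log(p / (1.0 - p))

def expit(eta):
    """Inverse logit: mu = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_predictor(x_row, beta):
    """eta = x' beta for a single design row."""
    return sum(xj * bj for xj, bj in zip(x_row, beta))

# Hypothetical design row (intercept, one predictor) and two coefficient vectors.
x_row = [1.0, 2.0]
beta_a = [-1.0, 0.5]
beta_b = [0.3, -0.2]
beta_sum = [a + b for a, b in zip(beta_a, beta_b)]

eta_a = linear_predictor(x_row, beta_a)
eta_b = linear_predictor(x_row, beta_b)
eta_sum = linear_predictor(x_row, beta_sum)

# On the link scale the model is linear in beta: eta(a + b) = eta(a) + eta(b).
link_scale_additive = math.isclose(eta_sum, eta_a + eta_b)

# On the mean scale it is not: mu(a + b) != mu(a) + mu(b).
mu_sum = expit(eta_sum)
mean_scale_additive = math.isclose(mu_sum, expit(eta_a) + expit(eta_b))

# Applying the link to the mean recovers the linear predictor.
round_trip = math.isclose(logit(mu_sum), eta_sum)

print(link_scale_additive, mean_scale_additive, round_trip)
```

The first and third checks hold and the second fails, which is the sense in which the model is "linear": linear in the parameters on the scale of $\eta$, even though the mean is a nonlinear function of both the predictors and the parameters.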

> which may in turn be modeled using the linear model framework by using the appropriate link function.

Correct.

> I'm wondering if you consider (teach?) situations such as logistic regression as a:

(I am changing the order of your question here)

> Linear model, since the link transforms us to the linear model framework

It's conventional to call a GLM "linear", for precisely this reason. Indeed, it's pretty clear that this is the convention, because it's right there in the name.

> Nonlinear model, given the form of the parameters

We must be very careful here, because "nonlinear" generally refers to a model that is nonlinear in parameters. Contrast nonlinear regression with generalized linear models.

So if you want to use the term "nonlinear" to describe a GLM, it's important to specify carefully what you mean: generally, that the mean is nonlinearly related to the predictors.

Indeed, if you do use "nonlinear" to refer to GLMs, you will get into difficulty not just with convention (and so be likely to be misunderstood), but also when trying to talk about generalized nonlinear models. It's a bit hard to explain the distinction if you have already characterized GLMs as "nonlinear models"!

* Consider a Poisson nonlinear regression model, one where there is no $g(\mu)$ for which the parameters enter linearly, so we still have:

$$ Y\sim \text{Poisson}(\mu_x)$$

but for example, where $x$ is age, $Y$ at a given $x$ is observed deaths, and $\mu_x$ is a model for population annual mortality at age $x$:

$$\mu_x = \alpha + \exp(\beta x)\,.$$

(Normally we'd have an offset here for the population at age $x$ which would shift the $\alpha$ term, but we can posit a situation where we observe a constant exposure. Note that both Poisson and binomial models are used for modelling mortality.)

Here the first term represents a constant death rate due to (say) accidents (or other effects not much related to age), while the second term represents a death rate that increases with age. Such a model may perhaps sometimes be feasible over short ranges of later-adult-but-not-senescent ages; it's essentially Makeham's law (there presented as a hazard function, but for which an annualized rate would be a reasonable approximation).

That's a generalized nonlinear model.
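As an illustration of what fitting such a model can look like, here is a minimal sketch that maximizes the Poisson likelihood for $\mu_x = \alpha + \exp(\beta x)$ directly, using Fisher scoring with step halving. Everything specific in it is an assumption made for illustration: the $x$ values, the generating values $\alpha = 2$ and $\beta = 0.5$, the starting point, and the function names. The "data" are noise-free expected counts rather than real observations, so the estimates should land back on the generating values. In practice one would more likely hand the log-likelihood to a general-purpose optimizer.

```python
import math

def mean(x, alpha, beta):
    """Mean from the text: mu_x = alpha + exp(beta * x)."""
    return alpha + math.exp(beta * x)

def log_lik(xs, ys, alpha, beta):
    """Poisson log-likelihood, dropping the constant -sum(log y!)."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu = mean(x, alpha, beta)
        if mu <= 0.0:
            return -math.inf  # outside the parameter space
        total += y * math.log(mu) - mu
    return total

def fisher_scoring(xs, ys, alpha, beta, max_iter=100, tol=1e-10):
    """Maximize the likelihood over (alpha, beta) by Fisher scoring."""
    for _ in range(max_iter):
        g_a = g_b = i_aa = i_ab = i_bb = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(beta * x)
            mu = alpha + e
            d_b = x * e              # d mu / d beta (d mu / d alpha is 1)
            score = y / mu - 1.0     # d loglik / d mu for one observation
            g_a += score
            g_b += score * d_b
            i_aa += 1.0 / mu         # expected information entries
            i_ab += d_b / mu
            i_bb += d_b * d_b / mu
        det = i_aa * i_bb - i_ab * i_ab
        step_a = (i_bb * g_a - i_ab * g_b) / det
        step_b = (i_aa * g_b - i_ab * g_a) / det
        # Halve the step until the log-likelihood does not decrease.
        current = log_lik(xs, ys, alpha, beta)
        scale = 1.0
        while scale > 1e-8 and log_lik(
            xs, ys, alpha + scale * step_a, beta + scale * step_b
        ) < current:
            scale /= 2.0
        alpha += scale * step_a
        beta += scale * step_b
        if abs(scale * step_a) + abs(scale * step_b) < tol:
            break
    return alpha, beta

# Hypothetical inputs: noise-free "counts" generated at alpha = 2, beta = 0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [mean(x, 2.0, 0.5) for x in xs]

alpha_hat, beta_hat = fisher_scoring(xs, ys, alpha=1.0, beta=0.3)
print(alpha_hat, beta_hat)
```

Note that the derivative of $\mu_x$ with respect to $\beta$ depends on $\beta$ itself, and no transformation $g(\mu_x)$ removes that dependence for both parameters at once. That is why the fit cannot be set up as a GLM with a fixed design matrix, even though the response distribution is still Poisson.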