Generalized Linear Model – How to Visually Ascertain the GLM Link

generalized linear model, link-function

On p. 125 of Agresti's Categorical Data Analysis, a plot of the dependent variable (a count) against an independent variable (a categorized version of the continuous width variable) suggests that the relationship between count and width is linear.

The text then says that a Poisson GLM with the identity link may be appropriate. I think you could also use a normal linear model, but the point of using the Poisson is its assumption that the variance increases with the mean.

What I started to wonder is this: if you can look at a graph and decide that the identity link is appropriate, what would a graph that suggests the log link look like? Is this visual approach often used, and is it used for the binomial? It would seem hard to do with multiple independent variables, but I guess you could make a plot for each variable.

Best Answer

Let $h$ be the inverse of the link function, so that the expected value $\mu$ satisfies $$\mu = h(\beta_0 + \beta_1 x).$$ Equivalently, the link $g = h^{-1}$ satisfies $g(\mu) = \beta_0 + \beta_1 x$.

If $h$ is the identity so that $h(\eta) = \eta$, then the plot of $\mu$ against $x$ will be linear, since $\mu=\beta_0 + \beta_1 x$.

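If $h$ is the exponential function so that $h(\eta) = \exp \eta$, which corresponds to the log link because $\log \mu = \beta_0 + \beta_1 x$, then the plot of $\mu$ against $x$ will appear exponential, since $\mu = \exp (\beta_0 + \beta_1 x)$. For $\beta_1 > 0$ the curve rises ever more steeply as $x$ increases; for $\beta_1 < 0$ it decays toward zero.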

This list of possible link functions and their appearances goes on and on. Unfortunately, we do not have access to $\mu$ to make these plots! All we have are noisy realizations via the outcome data. This is why Agresti recommends showing a smoothed plot of the outcome against $x$. The smoothed outcome is a nonparametric estimate of $\mu$ that does not rely on knowing the link function in advance.
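To make the smoothing idea concrete, here is a minimal sketch using only the Python standard library. Everything in it is an illustrative assumption rather than Agresti's data or procedure: the coefficients, the sample size, the use of equal-width bin means as the smoother, and the helper names `poisson_draw`, `binned_means`, `r_squared` and `straightness`. It simulates Poisson counts under an identity link and under a log link, smooths each by bin averages, and then checks on which scale the smoothed means look straight.

```python
import math
import random

def poisson_draw(rng, mean):
    """Draw one Poisson variate with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def binned_means(x, y, n_bins=8):
    """Crude smoother: average x and y within equal-width bins of x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    acc = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for xi, yi in zip(x, y):
        b = min(int((xi - lo) / width), n_bins - 1)
        acc[b][0] += xi
        acc[b][1] += yi
        acc[b][2] += 1
    return [(sx / n, sy / n) for sx, sy, n in acc if n > 0]

def r_squared(u, v):
    """Squared Pearson correlation, i.e. R^2 of a straight-line fit of v on u."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    suu = sum((a - mu_u) ** 2 for a in u)
    svv = sum((b - mu_v) ** 2 for b in v)
    return suv ** 2 / (suu * svv)

def straightness(x, y):
    """R^2 of the binned mean vs x, and of the log of the binned mean vs x."""
    pts = binned_means(x, y)
    xs = [p[0] for p in pts]
    ms = [p[1] for p in pts]
    return r_squared(xs, ms), r_squared(xs, [math.log(m) for m in ms])

rng = random.Random(0)  # fixed seed so the sketch is reproducible
x = [rng.uniform(0.0, 3.0) for _ in range(4000)]

# Assumed (made-up) coefficients for the two data-generating models.
y_identity = [poisson_draw(rng, 1.0 + 2.0 * xi) for xi in x]        # mu = 1 + 2x
y_log = [poisson_draw(rng, math.exp(0.5 + 0.8 * xi)) for xi in x]   # log mu = 0.5 + 0.8x

id_raw, id_log = straightness(x, y_identity)
lg_raw, lg_log = straightness(x, y_log)
print(f"identity-link data: R2 raw = {id_raw:.3f}, R2 log = {id_log:.3f}")
print(f"log-link data:      R2 raw = {lg_raw:.3f}, R2 log = {lg_log:.3f}")
```

In this simulation the smoothed means of the identity-link data are straighter on the raw scale, while those of the log-link data are straighter after taking logs. In practice one would plot the binned or smoothed means both ways and judge by eye; the R^2 values here only stand in for that visual judgment.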

This approach works for any distribution, including the binomial, although I personally would not be able to distinguish between a logistic and a probit curve by eye.
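As a rough numerical illustration of why the two are hard to tell apart, the sketch below compares the logistic curve with a rescaled normal CDF on a grid. The rescaling constant 1.702 is an assumption brought in for this example (a commonly quoted value for matching the two curves), not something from the answer above, and the function names are hypothetical.

```python
import math

def logistic_cdf(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Inverse of the probit link, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

SCALE = 1.702  # assumed rescaling constant, not from the original answer

# Largest vertical gap between the two curves on a fine grid over [-6, 6].
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(logistic_cdf(z) - normal_cdf(z / SCALE)) for z in grid)
print(f"largest gap between the two curves: {max_gap:.4f}")
```

On this grid the gap stays below 0.01 on the probability scale, which is far smaller than the noise in a typical smoothed plot of binary outcomes.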

This answer is for univariable models (i.e. with one explanatory variable $x$) only. With multivariable models, higher dimensional visualizations or slices would be useful.
