Data Visualization – Interpreting y-axis Values in Plot(gam)

data-visualization, generalized-additive-model, mgcv

I have used mgcv to fit a simple gam with a negative binomial distribution. My model looks at how animal abundance changes with distance to a feature, and looks like this: gam(Count~s(Distance), data = data, family=nb, method = "ML"), where Count is a variable representing counts of animals, and Distance is a continuous variable describing distance to the feature of interest.
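
For reproducibility, here is a minimal sketch in the same spirit, using simulated data (the values below are invented purely for illustration, not my real data):

library(mgcv)

set.seed(1)
n        <- 200
Distance <- runif(n, 0, 10)                  # distance to the feature
mu       <- exp(1.5 - 0.3 * Distance)        # true mean count, declining with distance
Count    <- rnbinom(n, mu = mu, size = 2)    # overdispersed (negative binomial) counts
data     <- data.frame(Count, Distance)

model <- gam(Count ~ s(Distance), data = data, family = nb, method = "ML")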

I am following Noam Ross' very helpful tutorial on visualizing gams (https://noamross.github.io/gams-in-r-course/chapter2), but I'm confused by the plot results from my model.

When I use the plot() method for the fitted gam to look at my model, after using the shift argument to shift the y-axis by the model's intercept value, I am confused by the scale of the y-axis (output plot below). The code I used to generate this plot is plot(model, residuals = TRUE, pch = 1, cex = 1, seWithMean = TRUE, shift = coef(model)[1]).

Since my response variable is a count variable and negative counts are impossible, why am I seeing negative values for my data points on the y-axis (after applying the shift argument)? Am I misunderstanding what the y-axis is showing here – i.e. is it not on the scale of the response?

[Output plot omitted: the shifted partial-effect plot of s(Distance), with negative values visible on the y-axis]

Best Answer

The model is a generalization of the generalized linear model – it's not a true GLM, as we have an extra parameter that defines the extra dispersion that the NB has over the Poisson – and the parameters of the model are estimated on the scale of a link function, in this case the log scale:

$$y_i \sim \mathcal{NB}(\mu_i, \boldsymbol{\theta})$$

where

$$g(\mu_i) = \beta_1 + f_1(\mathtt{Distance}_i)$$

where $g()$ is the link function, which in the case of the NB is typically $\log()$. So we have

$$\log(\mu_i) = \beta_1 + f_1(\mathtt{Distance}_i)$$

and

$$\mu_i = \exp(\beta_1 + f_1(\mathtt{Distance}_i))$$

where we've taken the inverse of the log function to get the expected value of the response $\mu_i$.
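
To make $g()$ and its inverse concrete, base R's make.link() gives the same log link pair that the NB family uses (a small illustration only):

lk <- make.link("log")
lk$linkfun(5)        # log(5): maps the mean onto the linear predictor (link) scale
lk$linkinv(log(5))   # 5: maps back onto the response scale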

When you just do plot(), you get the partial effect of $f_1$, and this is centred about 0 due to the sum-to-zero constraint applied to all smooths. When you use shift = coef(model)[1], you add on $\beta_1$, which gives the right-hand side of

$$\log(\mu_i) = \beta_1 + f_1(\mathtt{Distance}_i)$$

What you're missing is the bit on the left-hand side: these values are still on the log (link) scale, where negative values are allowed.
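
You can verify this with the model itself (here I'm assuming the simulated model sketched in the question, so the object names are illustrative): the smooth extracted with predict(..., type = "terms") is centred about zero, and adding the intercept reproduces the link-scale linear predictor that the shifted plot displays.

f1  <- predict(model, type = "terms")[, "s(Distance)"]
mean(f1)                                   # essentially 0: the sum-to-zero constraint
eta <- coef(model)[1] + f1                 # beta_1 + f_1(Distance), on the log scale
all.equal(as.numeric(eta), as.numeric(predict(model, type = "link")))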

The solution then is to apply the inverse of this link function to the values. This is done via the trans argument to plot.gam().

Hence, for such a simple GAM, you can get what you want via:

plot(model, residuals = TRUE, pch = 1, cex = 1, seWithMean = TRUE,
     shift = coef(model)[1],
     trans = exp)

where exp() is the exponential function, the inverse of the log link. For such a simple model, this yields the actual predicted values from the model, on the response scale, over a range of Distance values.
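
As a sanity check (again assuming the simulated model from the question), exponentiating the link-scale fit recovers exactly what predict() gives on the response scale, which is all that trans = exp is doing to the plotted curve:

newd <- data.frame(Distance = seq(0, 10, length.out = 100))
eta  <- predict(model, newdata = newd, type = "link")    # beta_1 + f_1(Distance)
all.equal(as.numeric(exp(eta)),
          as.numeric(predict(model, newdata = newd, type = "response")))

With more than one smooth in the model, shifting a single panel by the intercept only shows that smooth's contribution plus the intercept (the other smooths are held at their average, zero, effect); predict(..., type = "response") over a grid of covariate values is the more general route to fitted values on the response scale.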
