Generalized Additive Model – Difference Between ‘trans = plogis’ and ‘trans = exp’ on Y-axis in plot(gam)


I saw a comment by @gavin-simpson on the question "y-axis values in plot(gam)". I don't understand why trans = exp is used there instead of trans = plogis. How do you decide which one to use?

The code I am using

library(mgcv)
b <- gam(outcome ~ s(week, k = 4, fx = TRUE, by = food) + food, data = df1, family = betar(link="logit"), method = "REML")
summary(b)

My dependent variable is a proportion between 0 and 1. Week is a timeline measure, and food is a categorical variable.

When I do trans = plogis, I get the plot below

plot(b, pages = 1, trans = plogis, shift = coef(b)[1])

How shall I interpret this plot here with trans = plogis ?

When I do trans = exp, I get the plot below

plot(b, pages = 1, trans = exp, shift = coef(b)[1])

My question is why am I getting values larger than 1 when I do trans = exp ?

I am new to GAM and still learning, any guidance is appreciated.

Best Answer

The linked question was about negative binomial models, which by default use $\log$ as the link function, because they deal with counts, just like Poisson models. Beta models deal with quantities such as proportions and probabilities, which need a link function that respects the limits at 0 and 1, the default being $\text{logit}$. So for your betar model the inverse link is plogis. Applying exp to logit-scale values gives odds rather than proportions, which is why that plot goes above 1.

You can find the default link function for a family of models in its R help page, e.g. help(nb) or help(betar). Other link functions can be specified through the family's link argument, but that is a more advanced topic.
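As a minimal sketch of how a link name maps to its inverse, base R's make.link() returns both the link function and its inverse. The values in eta below are made up for illustration and are not taken from the model in the question.

```r
# Base R (stats): make.link() returns the link function and its inverse by name.
logit_link <- make.link("logit")
log_link   <- make.link("log")

# Hypothetical values on the link (linear predictor) scale.
eta <- c(-2, 0, 2)

# Inverse logit: agrees with plogis(eta), values strictly between 0 and 1.
inv_logit <- logit_link$linkinv(eta)

# Inverse log: agrees with exp(eta), any positive value.
inv_log <- log_link$linkinv(eta)

print(cbind(eta, inv_logit, inv_log))
```

For a fitted model, family(b)$linkinv should give the inverse link that matches the model, so it can be passed to trans instead of choosing between exp and plogis by hand.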

The inverse link function is generally easy to find. The name plogis for $\text{logit}^{-1}$ looks a little odd at first, but because the inverse logit rises strictly monotonically from 0 to 1 it is a valid CDF (that of the logistic distribution), and R implements it as one, using its p-prefix naming for distribution functions. Probit models pull the same trick in the other direction, using the inverse of the normal CDF as the link: https://en.wikipedia.org/wiki/Probit_model
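To see why trans = exp can produce values above 1 with a logit link, here is a small sketch on made-up link-scale values (eta is hypothetical, not output from the model in the question). On the logit scale, exp() returns the odds $\mu/(1-\mu)$, while plogis() returns the proportion $\mu$ itself.

```r
# Hypothetical values on the logit (linear predictor) scale.
eta <- c(-3, -1, 0, 1, 3)

# Inverse logit: back-transforms to the proportion scale, always in (0, 1).
prob <- plogis(eta)

# exp() of a logit gives the odds prob / (1 - prob), which is unbounded above.
odds <- exp(eta)

print(data.frame(eta, prob, odds, odds_from_prob = prob / (1 - prob)))
```

Whenever eta is positive the odds exceed 1, even though the corresponding proportion stays below 1.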

Related Solutions

Multivariate Normal GAM – Differences Between Sigma and Covariance Matrix

sig2 is the dispersion parameter $\phi$ in the GLM context. I believe it is set to 1 so that it has no effect in this kind of model, because R has been estimated.

Data Visualization – Interpreting y-axis Values in Plot(gam)

The model is a generalization of the generalized linear model (it's not a true GLM, as we have the extra parameter that defines the extra dispersion that the NB has over the Poisson), and the parameters of the model are estimated on the scale of a link function, in this case the log scale:

$$y_i \sim \mathcal{NB}(\mu_i, \boldsymbol{\theta})$$

where

$$g(\mu_i) = \beta_1 + f_1(\mathtt{Distance}_i)$$

where $g()$ is the link function, which in the case of the NB is typically $\log()$. So we have

$$\log(\mu_i) = \beta_1 + f_1(\mathtt{Distance}_i)$$

and

$$\mu_i = \exp(\beta_1 + f_1(\mathtt{Distance}_i))$$

where we've taken the inverse of the log function to get the expected value of the response $\mu_i$.

When you just do plot(), you get the partial effect of $f_1$, and this is centred about 0 due to the sum-to-zero constraint applied to all smooths. When you used shift, you added on $\beta_1$ which gives us the right hand side of

$$\log(\mu_i) = \beta_1 + f_1(\mathtt{Distance}_i)$$

What you're missing is the bit on the left hand side; these values are on the log scale, where negative values are allowed.

The solution then is to apply the inverse of this link function to the values. This is done via the trans argument to plot.gam().

Hence, for such a simple GAM, you can get what you want via:

plot(model, residuals = TRUE, pch=1, cex=1, seWithMean = TRUE,
     shift = coef(model)[1],
     trans = exp)

where exp is the exponential function, the inverse of the log function. In this case, this will then yield the actual predicted values from the model for a range of values over Distance on the response scale.
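As a rough check of this back-transformation, the sketch below fits a negative binomial GAM to simulated data and compares exp(intercept + smooth) with the response-scale predictions. The data, the seed, and the object name dat are assumptions made up for illustration; only Distance and model follow the names used above.

```r
library(mgcv)

# Simulated data (an assumption for illustration; not the original data set).
set.seed(1)
n <- 200
Distance <- runif(n, 0, 10)
y <- rnbinom(n, mu = exp(1 + sin(Distance / 2)), size = 2)
dat <- data.frame(y = y, Distance = Distance)

# Negative binomial GAM with the default log link.
model <- gam(y ~ s(Distance), family = nb(), data = dat, method = "REML")

# Centred partial effect f_1(Distance), on the link (log) scale.
f1 <- predict(model, type = "terms")[, "s(Distance)"]

# Add the intercept (what shift does), then apply the inverse link (what trans does).
mu_hat <- unname(exp(coef(model)[1] + f1))

# Response-scale predictions computed directly by predict() for comparison.
resp <- as.numeric(predict(model, type = "response"))

print(all.equal(mu_hat, resp))
```

With a single smooth and an intercept, the two vectors should agree up to numerical tolerance, which is what shift and trans reproduce on the plot.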
