Generalized Additive Models – How to Interpret Threshold Value Not Intersecting with Smooth in Ordered GAM

generalized-additive-modelinterpretationordinal-data

My model consists of an ordered factor with four levels as the response variable and a continuous predictor.

mod <- gam(lev ~ s(pred), family = ocat(R = 4))

The smooth of the predictor looks like this:

enter image description here

The estimated threshold values are:

> mod$family$getTheta(TRUE)
[1] -1.00000000  0.09669527  1.27291819

However, the last value (1.27) does not intersect with the smooth. How can that be interpreted?


Edit to include reproducible example:


(taken from https://github.com/eric-pedersen/mgcv-esa-workshop/blob/master/example-forest-health.Rmd)

library(mgcv)

forest <- read.table(url("https://raw.githubusercontent.com/eric-pedersen/mgcv-esa-workshop/master/data/forest-health/beech.raw"),
                     header = TRUE)

forest <- transform(forest, id = factor(formatC(id, width = 2, flag = "0")))

## Aggregate defoliation & convert categorical vars to factors
levs <- c("low","med","high")
forest <- transform(forest,
                    aggDefol = as.numeric(cut(defol, breaks = c(-1,10,45,101),
                                              labels = levs)),
                    watermoisture = factor(watermoisture),
                    alkali = factor(alkali),
                    humus = cut(humus, breaks = c(-0.5, 0.5, 1.5, 2.5, 3.5),
                                labels = 1:4),
                    type = factor(type),
                    fert = factor(fert))
forest <- droplevels(na.omit(forest))

ctrl <- gam.control(nthreads = 3)
forest.m1 <- gam(aggDefol ~ s(age),
                 data = forest, 
                 family = ocat(R = 3), 
                 method = "REML",
                 control = ctrl)
summary(forest.m1)

plot.gam(forest.m1, shift = coef(forest.m1)[1])

The plot has been shifted by the intercept of -1.72, theta values are -1 and 2.38, so the latter value is far above the range of the smooth.

enter image description here

Best Answer

It doesn't have to intersect with the smooth - you're not including the model constant term when trying to interpret the result.

There is an underlying latent variable stretching from $-\infty$ to $+\infty$ and the cut points $\theta_r$ ($r \in {1,\dots,R}$) mark the transitions from one category to the next where there are $R+1$ categories. The value of the linear predictor $\hat{\eta}_i$ for an observation, yields a value (point) on this latent variable. Together, the $\hat{\eta}_i$ and $\theta_r$ determine the estimated probability that the $i$th observation falls in each category.

The linear predictor in this model is of the form

$$ \eta_i = \alpha + f(x_i) $$

where $\alpha$ is the model constant term (the thing labelled (Intercept) in the output from summary() coef() etc.)

To reconcile your observation that the partial effect of the smooth term $f(x_i)$ never gets above +1, we might conclude that the constant term $\alpha$ is a positive value sufficient to produce values of the linear predictor that are greater than 1.27 to allow the model to predict the $R+1$th category.

The y axis here is not the latent variable or the value of the linear predictor; it's the partial effect of the smooth $f(\mathbf{x)}$, which has been subjected to an sum-to-zero identifiable constraint, which has the effect of centring the smooth about the model constant term. All you can really conclude from the plot of the partial effect of $f(\mathbf{x})$ is that as you increase the value of $x_i$, the probability that you'll fall into one of the higher categories increases, for $x_i \approx 50$, and between $50 \lessapprox x_i \lessapprox 100$ the effect of increasing $x_i$ is also to increase this probability but the effect on $Y$ for a unit change in $x_i$ is lessened and for $x_i \gtrapprox 100$ there is little effect of $x$ on the response. But this all needs to be recentred about the estimated value for $\alpha$ if you want to show the linear predictor (and that is only possible visually here as your model has a single smooth).

Notes on the reproducible example

The apparent contradiction (or confusion) in the Forest example from our course, is due to age not being a great predictor of aggDefol, and especially it cannot distinguish very well at all categories 2 and 3:

library("ggplot2")
library("gratia")
library("tibble")

ggplot(forest, aes(x = age, y = aggDefol)) + 
  geom_point()

enter image description here

If you look at the estimated probability of category conditional upon age, we also see the issue:

ds <- data_slice(forest.m1, age = evenly(age))
fv <- fitted_values(forest.m1, data = ds)

fv |>
  ggplot(aes(x = age, y = fitted, colour = category, group = category)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = category, colour = NULL),
              alpha = 0.2) +
  geom_line()

enter image description here

And this is reflected in the cut point for the 3rd category being outside the range of the values on the latent variable that are predictable given the smooth of age (the model)

p <- ds |>
  add_column(latent = predict(forest.m1, newdata = ds))

p |>
  ggplot(aes(x = age, y = latent)) +
  geom_line() +
  geom_hline(data = tibble(cuts = gratia:::theta(forest.m1)),
             aes(yintercept = cuts))

enter image description here

Related Question