I built a GAM with two categorical variables and two smooth terms, following this structure:
model <- ik ~ population_id_cat + s_status + s(n_locs, bs = "re") + s(animal_id, bs = "re")
The model summary for the parametric coefficients is:
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.90509 0.07798 24.432 < 2e-16 ***
population_id_catB -1.08790 0.13199 -8.242 2.27e-16 ***
population_id_catC 0.06718 0.08880 0.757 0.449368
population_id_catD -0.27599 0.07689 -3.589 0.000336 ***
population_id_catE -0.33914 0.13555 -2.502 0.012391 *
population_id_catF -0.71104 0.07586 -9.373 < 2e-16 ***
population_id_catG -0.39243 0.07276 -5.393 7.32e-08 ***
population_id_catH 0.31115 0.18431 1.688 0.091449 .
population_id_catI 0.06530 0.10203 0.640 0.522174
s_status_2a_m -0.16331 0.05263 -3.103 0.001930 **
s_status_2fam -0.36656 0.05200 -7.050 2.10e-12 ***
When I plot these coefficients with the mgcViz::pterm()
function, the values do not seem to match the std. error (but their significance does match with the p-value provided).
For example, the plot for the parametric coefficients mentioned above:
gviz_model <- getViz(model)
gviz_model_1 <- pterm(gviz_model, 1)
plot(gviz_model_1) + l_ciBar(colour = "blue") + l_fitPoints(colour = "red") + l_rug(alpha = 0.3)
Question 1:
What values are actually plotted in this plot? They seem to match the "Estimate" value from the summary, but they do not match the "Std. Error" (e.g., look at population H, according to the std error, the o value would be significant and therefore would be above 0).
If I plot the Estimate+-Std. Error
in a ggplot
(which is my goal, because I want to plot these parametric values for two models in the same plot), I get this instead:
gviz_model_pop <- termplot(gviz_model, se = TRUE, plot = FALSE)$population_id_cat
ggplot(gviz_model_pop , aes(x=x, y=y)) +
geom_errorbar(aes(ymin=y-se, ymax=y+se), width=.15, size=1, position=position_dodge(0.2)) +
geom_point(position=position_dodge(0.2), size=3.5, shape=16)
Similar trends, but the interval is definitely not the same (and influences the significance of each category).
Question 2:
If I want to plot these parametric coefficients for two models in the same plot, and I would need to use the output from pterm(), instead of extracting the estimates/std error as I did previously, how can I do this? Preferably with a ggplot output.
Question 3:
If I want to plot these parametric coefficients including the intercept (to make the plot more interpretable), as mentioned here in "Transformed standard errors (2)", how do I do it? Do I need separate calculations or is there a function for the parametric coefficients too? Preferably with a ggplot output.
Thank you very much in advance for any help!
Best Answer
Q1
The blue bars are confidence intervals formed (likely) as +/- 2 *
Std. err
Q2
I think this would be easier with
parametric_effects()
from my {gratia} package; whatpterm()
seems to be doing is wrapping up the model object with some extra information, which doesn't seem to be the information computed for the plot.Using the example from
?pterm
, we haveWhich gives
The confidence is easy to add:
I would also append a model column to indicate which model this is:
If you repeat that for the other model, to create
p2
say, you should be able to bind the two objects together withYou can also select which terms you want to extract the info for using the
terms
argument toparametric_effects()
.Q3
The best way to do this is to just predict from the model including only the constant term and parametric term of you choice. In the example from
?pterm
that I used above, there is a factor variablefac
which we'll use to illustrate the point.There's a bug in
data_slice()
currently;data_slice()
seems to be converting logical variables to numeric. We work around that by coaxing them back to logical. And we only need this to keeppredict.gam()
happy; none of the other variables butfac
will be used in the predictions we computed infv
.Having done the above
fv
is:Here, as
fac
only had three levels, we get the three values needed for the plot. Thefitted
column is now has the intercept added to each of the parametric effects, henceA1
, being the reference level has the same value as the intercept:while the other values in
fitted
are(Intercept)
plus the respective coefficient for the other levels. Thelower
andupper
columns contain the confidence interval as it would be plotted in the output from mgcViz. Note thatfitted_values()
returns, by default, predicted values on the response scale. If you have a non-Gaussian family, you can choose to keep the values on the link scale usingscale = "link"
in the call tofitted_values()