Solved – How to Interpret GAM P-Values

mgcv, p-value

My name is Hugh, and I'm a PhD student using generalised additive models to
do some exploratory analysis.

I'm not sure how to interpret the p-values that come from the mgcv package
and wanted to check my understanding (I'm using version 1.7-29, and have
consulted some of Simon Wood's documentation). I looked for other CV questions
first, but the most relevant ones seem to be about general regressions, not GAM
p-values in particular.

I know there are lots of different arguments to gam(), and the p-values are only
approximate. But I'm just starting simple to see if there is any "signal" whatsoever
for my covariates. E.g.:

Y ~ s(a, k = 3) + s(b, k = 3) + s(c, k = 3) + s(d, k = 3) + s(e, k = 3)

Approximate p-values of smooth terms:

s(a) = 0.000473
s(b) = 1.13e-05
s(c) = 0.000736
s(d) = 0.887579
s(e) = 0.234017

R² (adjusted) = 0.62    Deviance explained = 63.7%
GCV score = 411.17    Scale est. = 390.1    n = 120
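
For completeness, the fit itself is nothing fancy; here is a minimal sketch of the call I'm using (with dat standing in as a placeholder for my real data frame, Gaussian family by default):

library(mgcv)

fit <- gam(Y ~ s(a, k = 3) + s(b, k = 3) + s(c, k = 3) +
               s(d, k = 3) + s(e, k = 3),
           data = dat)   # dat: my data frame with n = 120 rows

summary(fit)   # "Approximate significance of smooth terms" gives the p-values above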

I cut the df columns, etc., due to formatting. I'm interpreting
the p-value for each covariate as a test of whether the corresponding
smooth function significantly reduces the model deviance, where p is
the probability of obtaining data at least as 'relatively implausible'
as that observed under the null hypothesis that the smooth function is zero.

This would mean that (e.g. with alpha = 0.05) the smooth functions for "d" and
"e" did not significantly reduce the deviance relative to a null model, whereas
those for the other terms did. Hence (d) and (e) do not add significant
information to the regression, and the deviance explained is down to (a), (b),
and (c)?
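
To make the question concrete, the kind of follow-up check I had in mind is a nested-model comparison along these lines; this is only a sketch, and I'm not sure the approximate tests are appropriate here:

# Refit without d and e and compare against the full model
fit_full    <- gam(Y ~ s(a, k = 3) + s(b, k = 3) + s(c, k = 3) +
                       s(d, k = 3) + s(e, k = 3), data = dat)
fit_reduced <- gam(Y ~ s(a, k = 3) + s(b, k = 3) + s(c, k = 3), data = dat)

anova(fit_reduced, fit_full, test = "F")   # approximate analysis-of-deviance comparison
AIC(fit_reduced, fit_full)                 # or compare by AIC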

Any advice would be greatly appreciated, and best of luck with your
research.

Best Answer

The paper describing how these p-values are constructed is here.

They are p-values associated with Wald-type tests of the null hypothesis that the whole function s(·) = 0. A low p-value is evidence against that null, i.e. against the spline coefficients that make up the function being jointly zero.

The complicated thing about them is that they involve a reduced-rank pseudoinverse. The typical Wald statistic is $\hat f^{\mathsf{T}} V_{\hat f}^{-1} \hat f$. You can see immediately that this reduces to a (squared) t-test in the univariate case (i.e., a single coefficient and its variance rather than vectors and matrices). That version has really low power for penalized splines, because the coefficients are shrunk by the penalty. The rank-$r$ pseudoinverse accounts for this. The paper is really quite dense, but once you get the general gist -- improving the power of the test by basing its rank on the effective degrees of freedom (EDF) rather than the full matrix rank -- it becomes possible to follow the formalism.
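
To make the contrast concrete, here is a rough sketch of the "typical" full-rank Wald statistic computed by hand for a single smooth from an mgcv fit (call it fit). This is not what summary.gam actually reports -- it uses the reduced-rank pseudoinverse with degrees of freedom tied to the EDF -- so the numbers will differ; it only illustrates the quadratic form above:

sm  <- fit$smooth[[1]]                    # e.g. the s(a) term
idx <- sm$first.para:sm$last.para         # positions of its basis coefficients
b   <- coef(fit)[idx]                     # estimated coefficients for s(a)
V   <- vcov(fit)[idx, idx]                # their (Bayesian) covariance matrix

wald <- drop(t(b) %*% solve(V) %*% b)     # quadratic form b' V^{-1} b
pchisq(wald, df = length(b), lower.tail = FALSE)   # naive chi-squared p-value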