Difference between summary.gam and gam.check p-values

Tags: generalized-additive-model, mgcv, p-value

I have been working with GAMs from the mgcv package for the past few months and I'm pretty happy with the results. But I recently bumped into something that I can't explain.

I'm developing a GAM based on about 324,000 observations of 4 covariates and I would like to test which interactions between the covariates are relevant. The summary of my GAM is:

> summary.gam(myGam)

Family: gaussian 
Link function: identity 

Formula:
Cm ~ ti(aoa, bs = c("bs"), k = 10) + ti(ct, bs = c("bs"), k = 10) + 
    ti(de, bs = c("bs"), k = 10) + ti(aoa, ct, bs = c("bs", "bs"), 
    k = 7, sp = c(10, 10)) + ti(aoa, de, bs = c("bs", "bs"), 
    k = 7, sp = c(100, 100)) + ti(ct, de, bs = c("bs", "bs"), 
    k = 7, sp = c(100, 100)) + ti(qsb, bs = ("bs"), k = 10) + 
    ti(qsb, ct, bs = c("bs", "bs"), k = 7, sp = c(10, 10)) + 
    ti(qsb, de, bs = c("bs", "bs"), k = 7, sp = c(10, 10))

Parametric coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1.691e-02  3.299e-05   512.6   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
               edf Ref.df        F p-value    
ti(aoa)     8.0882  8.428 205981.5  <2e-16 ***
ti(ct)      9.0000  9.000 196630.1  <2e-16 ***
ti(de)      8.1496  8.538 204375.2  <2e-16 ***
ti(aoa,ct) 11.3378 13.331   5572.5  <2e-16 ***
ti(aoa,de)  0.8469 33.000    111.6  <2e-16 ***
ti(ct,de)   2.8943 33.000    126.1  <2e-16 ***
ti(qsb)     8.9892  8.997   6280.7  <2e-16 ***
ti(qsb,ct)  9.3088 31.000    156.0  <2e-16 ***
ti(qsb,de)  7.7581 31.000    379.8  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.984   Deviance explained = 98.4%
GCV = 7.4397e-06  Scale est. = 7.4382e-06  n = 324424

So far, everything seems to be fine. I also called gam.check and got the following result:

> gam.check(myGam)

Method: GCV   Optimizer: magic
Smoothing parameter selection converged after 4 iterations.
The RMS GCV score gradient at convergence was 1.270451e-10 .
The Hessian was positive definite.
Model rank =  201 / 201 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

               k'    edf k-index p-value    
ti(aoa)     9.000  8.088    0.80  <2e-16 ***
ti(ct)      9.000  9.000    0.78  <2e-16 ***
ti(de)      9.000  8.150    0.91  <2e-16 ***
ti(aoa,ct) 36.000 11.338    0.49  <2e-16 ***
ti(aoa,de) 33.000  0.847    0.42  <2e-16 ***
ti(ct,de)  33.000  2.894    0.53  <2e-16 ***
ti(qsb)     9.000  8.989    1.00    0.40    
ti(qsb,ct) 31.000  9.309    0.87  <2e-16 ***
ti(qsb,de) 31.000  7.758    0.99    0.28    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I know that I have an issue with my knots and that my basis dimensions are probably too small. But what is really puzzling me is that the p-values are not the same for summary.gam and gam.check. I probably misunderstood something, but I expected both p-values to be the same. Does anybody have an idea why they are not?

Best Answer

The p values relate to two entirely different tests:

  1. in summary.gam the p values are for the null hypothesis of a zero effect of the indicated smooth. These values relate to the F statistic in the table produced by summary.gam,
  2. in gam.check the p values are for the null hypothesis that the basis dimension used is of sufficient size, i.e. these p values relate to the value labelled k-index in the table produced by gam.check. A low p value there (with k-index below 1) suggests that k may be too low; it says nothing about whether the effect of the smooth is zero.
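
To see the two tests side by side, here is a minimal sketch on simulated data. It does not use the model from the question: the data come from mgcv's gamSim, and the object names dat, m, smooth_tests and basis_tests are placeholders chosen for illustration. The basis-dimension table is obtained with k.check, which returns the same kind of table that gam.check prints.

```r
library(mgcv)

set.seed(1)  # k.check uses random reshuffling, so fix the seed

# Simulated example data (illustrative only, not the data from the question)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)
m <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

# Test 1: H0 "the smooth is zero"; p-values belong to the F column
smooth_tests <- summary(m)$s.table

# Test 2: H0 "the basis dimension k is adequate"; p-values belong to k-index
basis_tests <- k.check(m)

print(smooth_tests)
print(basis_tests)
```

Both tables have a column named p-value and one row per smooth, but the columns answer different questions, so there is no reason for the numbers to agree.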