I have been working with GAMs from the mgcv package for the past few months and I'm pretty happy with the results. But I recently bumped into something that I can't explain.
I'm developing a GAM based on about 324'000 observations of 4 covariates and I would like to test which interactions between the covariates are relevant. The summary of my GAM is:
> summary.gam(myGam)
Family: gaussian
Link function: identity
Formula:
Cm ~ ti(aoa, bs = c("bs"), k = 10) + ti(ct, bs = c("bs"), k = 10) +
ti(de, bs = c("bs"), k = 10) + ti(aoa, ct, bs = c("bs", "bs"),
k = 7, sp = c(10, 10)) + ti(aoa, de, bs = c("bs", "bs"),
k = 7, sp = c(100, 100)) + ti(ct, de, bs = c("bs", "bs"),
k = 7, sp = c(100, 100)) + ti(qsb, bs = ("bs"), k = 10) +
ti(qsb, ct, bs = c("bs", "bs"), k = 7, sp = c(10, 10)) +
ti(qsb, de, bs = c("bs", "bs"), k = 7, sp = c(10, 10))
Parametric coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.691e-02  3.299e-05   512.6   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
               edf Ref.df        F p-value
ti(aoa)     8.0882  8.428 205981.5  <2e-16 ***
ti(ct)      9.0000  9.000 196630.1  <2e-16 ***
ti(de)      8.1496  8.538 204375.2  <2e-16 ***
ti(aoa,ct) 11.3378 13.331   5572.5  <2e-16 ***
ti(aoa,de)  0.8469 33.000    111.6  <2e-16 ***
ti(ct,de)   2.8943 33.000    126.1  <2e-16 ***
ti(qsb)     8.9892  8.997   6280.7  <2e-16 ***
ti(qsb,ct)  9.3088 31.000    156.0  <2e-16 ***
ti(qsb,de)  7.7581 31.000    379.8  <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.984 Deviance explained = 98.4%
GCV = 7.4397e-06 Scale est. = 7.4382e-06 n = 324424
So far, everything seems to be fine. I also called gam.check and got the following result:
> gam.check(myGam)
Method: GCV Optimizer: magic
Smoothing parameter selection converged after 4 iterations.
The RMS GCV score gradient at convergence was 1.270451e-10 .
The Hessian was positive definite.
Model rank = 201 / 201
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
               k'    edf k-index p-value
ti(aoa)     9.000  8.088    0.80  <2e-16 ***
ti(ct)      9.000  9.000    0.78  <2e-16 ***
ti(de)      9.000  8.150    0.91  <2e-16 ***
ti(aoa,ct) 36.000 11.338    0.49  <2e-16 ***
ti(aoa,de) 33.000  0.847    0.42  <2e-16 ***
ti(ct,de)  33.000  2.894    0.53  <2e-16 ***
ti(qsb)     9.000  8.989    1.00    0.40
ti(qsb,ct) 31.000  9.309    0.87  <2e-16 ***
ti(qsb,de) 31.000  7.758    0.99    0.28
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I know that I have an issue with my knots and that my basis dimensions are probably too small. But what really puzzles me is that the p-values are not the same for summary.gam and gam.check. I probably misunderstood something, but I expected both sets of p-values to be the same. Does anybody have an idea why they are not?
Best Answer
The p values relate to two entirely different tests:
summary.gam: the p values are for the null hypothesis of a zero effect of the indicated spline. These values relate to the F statistic in the table produced by summary.gam.
gam.check: the p values are for the test of the null hypothesis that the basis dimension used is of sufficient size, i.e. these p values relate to the value labelled k-index in the table produced by gam.check.
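
To see the two tests side by side, here is a minimal sketch on simulated data. This is not the model from the question: the data come from mgcv's gamSim, and the names dat, m, s_tab, k_tab and both are made up for illustration. It assumes a version of mgcv that exports k.check, which returns the same basis dimension table that gam.check prints.

```r
library(mgcv)

set.seed(1)

# Simulated example data shipped with mgcv (not the data from the question)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

# A small additive model with four smooths
m <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

# Table from summary.gam: columns edf, Ref.df, F, p-value.
# H0 for each row: the smooth is a zero (flat) function.
s_tab <- summary(m)$s.table

# Table behind gam.check: columns k', edf, k-index, p-value.
# H0 for each row: the basis dimension k is adequate.
# These p-values are simulation based, so they vary slightly between runs.
k_tab <- k.check(m)

# Put the two sets of p-values next to each other
both <- data.frame(
  summary_p = s_tab[, "p-value"],
  k_index   = k_tab[, "k-index"],
  k_check_p = k_tab[, "p-value"]
)
print(both)
```

The two p-value columns answer different questions, so there is no reason for them to agree: a smooth can be strongly non-zero (tiny summary_p) while its basis is perfectly adequate (large k_check_p), and vice versa.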