If I understand correctly, you want the coefficients from the $gam component:
> coef(test$gam)
(Intercept) s(x1).1 s(x1).2 s(x1).3 s(x1).4 s(x1).5
21.8323526 9.2169405 15.7504889 -3.4709907 16.9314057 -19.4909343
s(x1).6 s(x1).7 s(x1).8 s(x1).9 s(x2).1 s(x2).2
1.1124505 -3.3807996 21.7637766 -23.5791595 3.2303904 -3.0366406
s(x2).3 s(x2).4 s(x2).5 s(x2).6 s(x2).7 s(x2).8
-2.0725621 -0.6642467 0.7347857 1.7232242 -0.5078875 -7.7776700
s(x2).9
-12.0056347
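For context, here is a minimal sketch of how such an object might be produced. The OP's actual data and model call are not shown, so this set-up is an assumption; only the pattern `test <- gamm(...)` followed by `coef(test$gam)` is taken from the answer above.

```r
## Assumed set-up: fit an additive model with mgcv::gamm() on simulated data,
## then extract the coefficients of the $gam component
library(mgcv)

set.seed(1)
x1 <- runif(100)
x2 <- runif(100)
y  <- sin(2 * pi * x1) + x2^2 + rnorm(100, sd = 0.2)

test <- gamm(y ~ s(x1) + s(x2))

## a named vector: (Intercept), s(x1).1 ... s(x1).9, s(x2).1 ... s(x2).9
coef(test$gam)
```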
Update 1: To get at the basis functions we can use predict(..., type = "lpmatrix") to obtain $Xp$, the linear predictor (design) matrix:
Xp <- predict(test$gam, type = "lpmatrix")
The fitted spline (e.g. for s(x1)) can then be recovered using:
plot(Xp[,2:10] %*% coef(test$gam)[2:10], type = "l")
You can plot this ($Xp$) and see that it is similar to um[[1]]$X:
layout(matrix(1:2, ncol = 2))
matplot(um[[1]]$X, type = "l")
matplot(Xp[,1:10], type = "l")
layout(1)
I pondered why these are not exactly the same. Is it because the original basis functions have been subject to the penalised regression during fitting?
Update 2: You can make them the same by absorbing the identifiability constraints into your basis functions in um:
um2 <- smoothCon(s(x1), data = data.frame(x1 = x1), knots = NULL,
                 absorb.cons = TRUE)
layout(matrix(1:2, ncol = 2))
matplot(um2[[1]]$X, type = "l", main = "smoothCon()")
matplot(Xp[,2:10], type = "l", main = "Xp matrix") ##!##
layout(1)
Note that I have not included the intercept column in the line marked ##!##.
You ought to be able to get $Xp$ directly from the function PredictMat(), which is documented on the same help page as smoothCon().
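As a hedged sketch of that suggestion (it assumes the objects x1 and Xp from the code above are still in the workspace, and uses absorb.cons = TRUE as in um2):

```r
## Sketch: rebuild the constrained smooth and evaluate it with PredictMat()
## (assumes x1 and Xp from the earlier code are in the workspace)
library(mgcv)

sm  <- smoothCon(s(x1), data = data.frame(x1 = x1), knots = NULL,
                 absorb.cons = TRUE)[[1]]
Xp2 <- PredictMat(sm, data.frame(x1 = x1))

## Xp2 should match the s(x1) columns of Xp, up to attributes
all.equal(Xp2, Xp[, 2:10], check.attributes = FALSE)
```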
You can answer the question for yourself with simple mathematics. If observed $y \ge 0$ and $\hat y$ denotes fitted $y$, then residual $e = y - \hat y$ must be $\ge -\hat y$. The line $e = - \hat y$ is thus a lower limit on your residuals. Despite your unconventional axis choice, it is clear that your data follow suit.
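A small self-contained sketch makes the bound visible (the data here are simulated, not the OP's; any non-negative response fitted by lm() will show the same pattern):

```r
## Simulate non-negative counts, fit a (mis-specified) linear model,
## and draw the lower bound e = -yhat on the residual plot
set.seed(42)
x   <- runif(200)
y   <- rpois(200, lambda = exp(1 + 2 * x))   # observed y >= 0
fit <- lm(y ~ x)

plot(fitted(fit), resid(fit), xlab = "fitted", ylab = "residual")
abline(h = 0, lty = 2)    # perfect-model reference line
abline(a = 0, b = -1)     # e = -yhat: no residual can fall below this line
```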
The underlying problem is presumably the use of a standard linear model on data not suited to it. One way forward is a log-linear or Poisson(-like) model: (fortuitously, and fortunately for the OP as a Stata user) there is a Stata-rich explanation in this blog posting. The posting should be of considerable interest to many users of statistics more generally, however.
P.S. A standard residual plot has residuals on the vertical axis and fitted or predicted values on the horizontal axis. The choice of axes here is not an arbitrary convention. A horizontal line indicating zero residuals is the natural reference line, as it indicates behaviour matching a perfect model. As emphasised often by J.W. Tukey and others, the best references are linear, and the best linear references are horizontal, in the sense of being easiest to think about. In Stata there is a built-in post-estimation command rvfplot for use after regress.
P.P.S. The graph flags a Stata user. Naturally use of Stata is quite secondary here to the main question.
Best Answer
Well done for looking at the diagnostic plots for your regression. In this case, they have revealed that your model is inappropriate, as @Glen_b says in the comments. Sometimes you can get away with modelling count data with a Gaussian "ordinary" regression, but in this case the violations of the standard assumptions are clearly too strong. There are too many actual values at zero where the model predicts negative values; this skews the whole fit and leaves a lot of structure in the residuals. You need to move to a Poisson GLM.
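In R that change is one line in glm() (the data below are simulated for illustration; the OP's variable names are assumptions):

```r
## Sketch: replace the Gaussian lm with a Poisson GLM on count data
set.seed(7)
x   <- runif(200)
y   <- rpois(200, lambda = exp(1 + 2 * x))   # simulated counts

fit_pois <- glm(y ~ x, family = poisson(link = "log"))
summary(fit_pois)
```

If the residual deviance is much larger than the residual degrees of freedom (overdispersion), `family = quasipoisson` or a negative binomial model via `MASS::glm.nb()` are common next steps.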
On the second part of your question, for future reference the identify() function is a good way to label a few points interactively in a base-graphics plot.
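For instance, a minimal sketch using a built-in dataset (the `cars` data stand in for the OP's; identify() waits for mouse clicks in an interactive session):

```r
## Sketch: click points on a residual plot to label them with their row names
fit <- lm(dist ~ speed, data = cars)
plot(fitted(fit), resid(fit), xlab = "fitted", ylab = "residual")
identify(fitted(fit), resid(fit), labels = rownames(cars))  # interactive
```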
Another good trick, when you suspect something about those points, is to create a dummy variable encoding your candidate explanation (e.g. 1 when the response is 0, and 0 otherwise) and map it to a colour aesthetic. ggplot2 is a great package to use for this sort of thing.
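A sketch of that trick on simulated data (the data frame `d` and its variables are assumptions, not the OP's data):

```r
## Sketch: flag zero responses with a dummy and map it to colour in ggplot2
library(ggplot2)

set.seed(7)
d   <- data.frame(x = runif(200))
d$y <- rpois(200, lambda = exp(1 + 2 * d$x))
fit <- lm(y ~ x, data = d)

## dummy for the candidate explanation: was the observed response zero?
d$zero <- factor(d$y == 0, levels = c(FALSE, TRUE),
                 labels = c("y > 0", "y = 0"))

ggplot(d, aes(x = fitted(fit), y = resid(fit), colour = zero)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = 2) +
  labs(x = "fitted", y = "residual")
```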