Can you extrapolate values of the dependent variable with a GAM?

extrapolation, generalized linear model, generalized-additive-model, method-comparison

I'm trying to find situations where GLMs are better than GAMs, and it occurred to me that GLMs can make predictions beyond the range of the data used to fit the model (i.e., extrapolations), while GAMs cannot:

Suppose we have a set of X and Y observations, with the X observations spread over the interval [x0, x1]. If we fit a GLM of Y on X, we obtain a mathematical relation between X and Y (in the simplest case, Y = b0*X + b1). Therefore, for every X_i of our choice we can obtain a modelled Y_i. We should have a good estimate if X_i is inside [x0, x1], but nothing stops us from also trying values outside this range (whether that estimate is "good" is another story).
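The simplest case above can be sketched in a few lines of Python. This is only an illustration with made-up data: the helper names `fit_line` and `predict` are hypothetical, and an ordinary least-squares line stands in for the GLM (identity link, Gaussian errors).

```python
def fit_line(xs, ys):
    """Least-squares fit of Y = b0*X + b1; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b0 = sxy / sxx               # slope
    b1 = mean_y - b0 * mean_x    # intercept
    return b0, b1


def predict(b0, b1, x):
    """Evaluate the fitted relation at any x, inside or outside the data range."""
    return b0 * x + b1


# Made-up observations with X spread over [x0, x1] = [0, 4]
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

b0, b1 = fit_line(xs, ys)
inside = predict(b0, b1, 2.5)    # X_i inside [x0, x1]
outside = predict(b0, b1, 10.0)  # X_i outside [x0, x1]: the formula still returns a value
print(b0, b1, inside, outside)
```

The fitted formula returns a number for any X_i; whether the value at X = 10 is trustworthy is a separate question that the code cannot answer.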

Now, GAMs are based on smooth functions estimated from the X-Y scatter, but they give no (simple) mathematical relation between X and Y. You get a Y estimate for each X observation you have and can make a nice plot. Surely you can interpolate between observations to obtain a Y estimate at any X of your choice, but since we only have X data inside the range [x0, x1], you cannot predict (or extrapolate) a Y value with a GAM for an X value lying outside [x0, x1]. With no mathematical relation linking X and Y, you cannot extrapolate!

So, if I understand correctly and the answer to my question is "no", I would say that the ability of a GLM to extrapolate is a very strong advantage over a GAM!

Best Answer

Four years have passed and I am now able to answer my own question. The answer is indeed no: you cannot really extrapolate with a GAM, except within a very limited range close to x0 or x1. If you are using splines, you would be extrapolating with a cubic polynomial, which is not very reliable, since the curve can quickly head towards -infinity or +infinity. There is an example about this in Simon Wood's book "Generalized Additive Models: An Introduction with R" (exercise 5 on page 400), which shows that the extrapolation capacity of a GAM is very limited. I therefore believe that GLMs should be the better choice for extrapolating data.
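The behaviour of a cubic piece outside the data range can be illustrated with a small Python sketch. It is not a fitted GAM and not the exercise from the book: it builds a single cubic segment (the kind of piece a cubic spline is made of) on the last interval of some hypothetical data, here [2, 3], and evaluates it further out. The helper name `hermite_segment` and the choice of sin(x) as the "true" curve are assumptions made purely for illustration.

```python
import math


def hermite_segment(a, b, ya, yb, da, db):
    """Cubic polynomial matching values (ya, yb) and slopes (da, db) at a and b."""
    h = b - a

    def p(x):
        t = (x - a) / h
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        return h00 * ya + h10 * h * da + h01 * yb + h11 * h * db

    return p


# Assumed "true" relation: y = sin(x), which always stays within [-1, 1].
a, b = 2.0, 3.0  # last interval of the (hypothetical) data range, so x1 = 3
last_piece = hermite_segment(a, b, math.sin(a), math.sin(b), math.cos(a), math.cos(b))

# Evaluate inside the interval, at the boundary, just beyond it, and far beyond it.
for x in (2.5, 3.0, 3.2, 5.0, 10.0):
    print(f"x = {x:4.1f}   cubic = {last_piece(x):9.4f}   true = {math.sin(x):7.4f}")
```

Just past x1 the cubic still tracks the true curve, but further out it leaves the interval [-1, 1] entirely, which is the limited extrapolation range described above.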