Solved – What are the theoretical reasons why extrapolation is “less reliable” than interpolation?

extrapolation, interpolation

Extrapolation is in general "unreliable". (See "What is wrong with extrapolation?")

But it is also commonly said that extrapolation is "less reliable" than interpolation.

But why should we generally assume that the model is "more reliable" between two known data points than to the right of the right-most data point (or the left of the left-most data point)?

From empirical examples, I can see that indeed, interpolation is often "more reliable" than extrapolation. But is there a more formal, theoretical justification for why this assertion is, in general, true?

Or is it just a purely empirical observation that interpolation tends to be "better"?

Best Answer

It is a theoretical result, at least for linear regression. If one computes the so-called "prediction error" (see this link, slide 11), one sees that it grows with the squared distance $(x - \bar{x})^2$ between the independent variable $x$ and the sample average $\bar{x}$. Interpolation points lie inside the observed range and so cannot be far from $\bar{x}$, whereas extrapolation points can be arbitrarily far away, so their prediction error can be arbitrarily large. The linked slides also show this graphically.
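
To make this concrete, here is a minimal sketch, assuming simple linear regression with i.i.d. errors and the textbook standard error for predicting a new observation at $x_0$, namely $s\sqrt{1 + 1/n + (x_0 - \bar{x})^2 / \sum_i (x_i - \bar{x})^2}$. I have not checked that this is exactly the expression on the linked slide. The function name `prediction_se` and the simulated data are made up purely for illustration.

```python
import numpy as np

def prediction_se(x, y, x0):
    """Standard error for predicting a new observation at x0
    under simple linear regression (illustrative helper, not from the source)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)

    # Ordinary least-squares fit of y = intercept + slope * x
    slope = np.sum((x - x_bar) * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x_bar

    # Residual variance estimate with n - 2 degrees of freedom
    residuals = y - (intercept + slope * x)
    s2 = np.sum(residuals ** 2) / (n - 2)

    # The (x0 - x_bar)^2 term is what grows as x0 moves away from the data
    return np.sqrt(s2 * (1.0 + 1.0 / n + (x0 - x_bar) ** 2 / sxx))

# Simulated data on [0, 10]; the true line and noise level are arbitrary assumptions
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

se_inside = prediction_se(x, y, 5.0)    # interpolation, at the sample mean
se_edge = prediction_se(x, y, 10.0)     # right-most observed point
se_outside = prediction_se(x, y, 20.0)  # extrapolation, far to the right

print(se_inside, se_edge, se_outside)
```

Because the residual variance and $n$ are the same for all three calls, the ordering of the results is driven entirely by $(x_0 - \bar{x})^2$, so the standard error is smallest at the mean and largest for the extrapolated point. Note that this only captures sampling uncertainty under the assumption that the linear model is correct everywhere; it says nothing about the model itself failing outside the observed range.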