Solved – the statistical justification of interpolation

estimation, interpolation

Suppose we have two points (black circles in the figure below) and want to find a value for a third point between them (the cross). In effect, we are estimating it from our experimental results, the black points. The simplest approach is to draw a straight line and read off the value (i.e., linear interpolation). If we had supporting points on both sides, e.g., the brown points, we would prefer to make use of them and fit a non-linear curve (the green curve).

My question is: what is the statistical reasoning for marking the red cross as the solution? Why are the other crosses (e.g., the yellow ones) not accepted as answers, even though they could be? What kind of inference or (?) pushes us to accept the red one?

I will develop my original question based on the answers I get to this very simple one.

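[Figure: two black circles (the known points), candidate crosses between them (one red, several yellow), brown supporting points on both sides, and a green non-linear curve]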

Best Answer

Any form of function fitting, even a nonparametric one (which typically makes assumptions about the smoothness of the curve involved), involves assumptions, and thus a leap of faith.

The ancient solution of linear interpolation is one that 'just works' when the data you have is fine-grained 'enough' (if you look at a circle closely enough, it looks flat as well - just ask Columbus), and it was feasible even before the computer age (which is not the case for many modern-day spline solutions). It makes sense to assume that the function will 'continue in the same (i.e. linear) manner' between the two points, but there is no a priori reason for this (barring knowledge about the concepts at hand).
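To make the 'fine-grained enough' remark concrete, here is a minimal sketch. The test function sin(x), the interval [0, pi], and the helper name `max_linear_interp_error` are all assumptions chosen for illustration, not part of the original question. It measures how far the piecewise-linear interpolant strays from a known smooth function as the grid gets denser.

```python
import numpy as np

def max_linear_interp_error(n_points):
    """Largest gap between sin(x) and its piecewise-linear interpolant on [0, pi].

    Hypothetical helper: sin(x) stands in for the unknown 'true' function,
    which in a real problem we would not have.
    """
    x_known = np.linspace(0.0, np.pi, n_points)   # the sampled ("black") points
    x_dense = np.linspace(0.0, np.pi, 10001)      # where we check the estimate
    y_interp = np.interp(x_dense, x_known, np.sin(x_known))
    return np.max(np.abs(np.sin(x_dense) - y_interp))

# Roughly halve the spacing between known points at each step
errors = {n: max_linear_interp_error(n) for n in (5, 9, 17, 33)}
for n, err in errors.items():
    print(f"{n:3d} points -> max error {err:.6f}")
```

The error shrinks as the known points get closer together, but only because the function being sampled is smooth; the code cannot tell you whether your real data behaves that way.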

It quickly becomes clear when you have three (or more) non-collinear points (as when you add the brown points above) that linear interpolation between each pair of neighbouring points produces a sharp corner at each of them, which is typically unwanted. That is where the other options come in.
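As a rough illustration of how the choice of assumption changes the answer, the sketch below uses made-up data. The four points are sampled from y = x**2 purely for the example: the two inner points play the role of the black circles, the two outer ones the brown supporting points. A single cubic polynomial through all four points is used here as one simple stand-in for the 'green curve'; it is not necessarily the method used in the figure.

```python
import numpy as np

# Hypothetical data: inner points = black circles, outer points = brown points
x_known = np.array([0.0, 1.0, 2.0, 3.0])
y_known = np.array([0.0, 1.0, 4.0, 9.0])  # sampled from y = x**2 for illustration

x_query = 1.5  # location of the cross

# Assumption 1: the function is a straight line between the two nearest points
y_linear = np.interp(x_query, x_known[1:3], y_known[1:3])

# Assumption 2: one cubic polynomial passes through all four points
coeffs = np.polyfit(x_known, y_known, deg=3)
y_cubic = np.polyval(coeffs, x_query)

print(f"linear estimate: {y_linear:.3f}")  # 2.500
print(f"cubic estimate:  {y_cubic:.3f}")   # 2.250
```

Both estimates are consistent with the observed points. The cubic one happens to match the true value here only because the data were generated from a polynomial; with real measurements, nothing in the data itself says which cross is correct.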

However, without further domain knowledge, there is no way to state with certainty that one solution is better than another (for this, you would have to know the true values at the unobserved points, defeating the purpose of fitting the function in the first place).

On the bright side, and maybe more relevant to your question, under 'regularity conditions' (read: assumptions, e.g. that we know the function is smooth), both linear interpolation and the other popular solutions can be proven to be 'reasonable' approximations. Still, this requires assumptions, and for these we typically do not have statistics.
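One standard example of such a result, stated here as an illustration rather than as something from the original answer: assume the unknown function f is twice continuously differentiable on the interval between the two known points x_0 and x_1, and let p be the straight line through (x_0, f(x_0)) and (x_1, f(x_1)). The symbols f, p, x_0 and x_1 are notation introduced for this sketch.

```latex
\[
  \lvert f(x) - p(x) \rvert
  \;\le\;
  \frac{(x_1 - x_0)^2}{8}\,
  \max_{\xi \in [x_0, x_1]} \lvert f''(\xi) \rvert ,
  \qquad x \in [x_0, x_1].
\]
```

The bound says the linear estimate is good when the points are close together or the curvature is small, but it is only as trustworthy as the smoothness assumption behind it, which the data alone cannot confirm.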
