[Math] How to perform a monotonic function fitting of data points

functionsinterpolationleast squaresregression

I'm seeking suggestions for general purpose function fitting of a set of data points, where, based on physical intuition, the relationship is expected to be "monotonic", i.e. the function should be strictly increasing (or decreasing) itself as well as its first few derivatives. The points in question are ratios $r=c/a$ of lattice constants $c$ and $a$ of hexagonal Fe-N lattice as a function of volume $x$.

I'm currently just using a third order polynomial, but I don't like the unphysical behavior in rightmost part here (some errors in the values are expected):

First sample fitting

Another sample is this:

Second sample fitting

Due to limited number of points to be fitted, I'd expect no more than 3-4 fitting parameters. The fitting doesn't need to be solvable in closed form as I have to use a non-linear optimizer for other reasons (IPOPT through my own abstraction library FuncLib, in case anyone is interested).

Best Answer

The present answer is based on data $(x_1,y_1),(x_2,y_2),...,(x_k,y_k),...,(x_n,y_n)$ coming from a scan of the graph published in the Morten Bakkedal's question. A graphical scan is not an accurate technic. So, the next results will be less accurate than if data was coming from a numeric table.

In a preliminary inspection, roughly two distinct points alignments can be observed. This draw us to propose to separate the set of points into two sets and perform a linear regression for each one.

Even with a so simplistic method, the result is surprisingly good. See Figure 1. The RMSD $\simeq 0.00115$ is not bad.

enter image description here

In fact, this is the fitting of a piecewise function : $$y(x)=\begin{cases} px+q \qquad\text{if } x\leq X \\ rx+s \qquad\text{if } x>X \end{cases}$$ $(X,Y)$ is the intersection point between the two rays (as defined on Figure 1).

The method of computation of the values of the adjustable parameters $p,q,r,s$ is shown below :

enter image description here

Of course, this supposes an initial guess of $K$ which separates the two sets of points. This is an easy guess just by visual inspection. Moreover, if the guess is not the best, it is easy to check it afterwards : The value of $X$ is computed and must be between $x_K$ and $x_{K+1}$. If not, there is no difficulty to correct the guessed $K$.

An equivalent form of the equation to be fitted is : $$y(x)= (px+q)\text{H}(X-x)+(rx+s)\text{H}(x-X)$$ where H is the Heaviside's step function.

Again this supposes an initial guess, now $X$. Again, this is an easy guess just by visual inspection. Again, if the guess is not the best, it is easy to check it afterwards and correct it if necessary.

One can use an approximate of the Heaviside function. A constant $c$ appears into it, but this is not a parameter to be adjusted. One can set $c=10$ or $20$ or $50$ or more. It doesn't matter. The resulting fit remains the same and the curve on figure $1$ doesn't change.

enter image description here

The fitted function is : $$y(x)= \frac{px+q}{1+e^{c(X-x)}}+\frac{rx+s}{1+e^{c(x-X)}}$$ where $c=10$ for example. Doesn't matter the value of $c$ insofar $c$ is large.

The result of the linear regression for $p,q,r,s$ is the same as above and leads to the same Figure 1.

The only imperfection is that the transition between the two straight lines isn't smooth. The advantage of the above approach is that a smoothness can be obtained very easily in using a worse approximate of the Heaviside function.

This consists in using a smaller value of $c$. Of course, this introduces one adjustable parameter more. But there is no need for an accurate value of $c$. Taking it with order of magnitude around $1$ is sufficient to achieve a smooth transition.

For example, see Figure 3. The fitting is very good. RMSD$\simeq 0.00045$

enter image description here

enter image description here

CASE OF THE SECOND GRAPH published by Morten Bakkedal.

In this case, the range where the function changes of slope isn't reached. So, one linear regression on the form $y(x)=px+q$ is sufficient to provide a signifiant answer.

enter image description here