[Math] Fit a pair of curves to each other

curves, interpolation, optimization, regression

I may be using the wrong terms, but I hope you still get the point:

I have 2 curves $c_j, j=1,2$ defined by $(Y_{i,j} , X_{i,j}) , i=1,\ldots,N$ with $X_{i,j} \in [a , b]$ and $X_{i,1} = X_{i,2}$ (I would also like to know about the case $X_{i,1} \neq X_{i,2}$, if there is a way other than interpolation). I want to know the factor in $y$, $q_y$, and the shift in $x$, $d_x$, that make $c_1$ overlap with $c_2$.

To be more specific: I have 2 curves which should overlap within a certain region but actually don't. I would like to know the $x$ shift and the $y$ stretch factor needed to correct the whole curves $c_1$ and $c_2$, also outside the overlap region.

One way to accomplish this is to search for a minimum of the squared distance, or a covariance of 1, over several applied shifts and factors, but I thought there might be a more analytical way to retrieve the desired factors, like some kind of linear regression.

I hope I made it clear and that you might have the simple solution I do not see now.
Thanks

EDIT:

I am asking for translation and scaling. As requested I prepared an example in R. Code is:

library(ggplot2)
set.seed(1234)

y <- function(x0, dx=0, qy=1)
{
  x <- x0 + dx
  (sin(x) - 0.2*x + 10) * qy + rnorm(length(x),sd=0.1)
}

x <- seq(20,30,0.1)
y1 <- y(x)
y2 <- y(x,0.5,0.9)

df <- data.frame(x=rep(x,2),
                 y=c(y1,y2),
                 curve=as.factor(rep(c(1,2),each=length(x)))
                 )

p <- ggplot(df, aes(x=x, y=y, colour=curve))
p <- p + geom_line()
p

ggsave("example.png",p)

In this example I used a list of $x \in [20,30]$ for curves $y_1$ and $y_2$. I do not have the function $y = \sin(x) - 0.2 x + 10$; this is just for demonstration.

This creates the following plot (sorry, too little reputation to show it here):
plot

So what I have is: $x, y_1$ and $y_2$

what I want to find out:

  • $dx$ … unknown translation in $x$ for $y_2$
  • $q_y$ … unknown scaling from $y_1$ to $y_2$ without the translation $dx$
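
The brute-force least-squares search mentioned in the question can be sketched as follows. This is only a minimal illustration, not a method confirmed by the question or the answer. It assumes the model $y_2(x) \approx q_y\,y_1(x+dx)$, linear interpolation of $y_1$, and a lower noise level (`sd = 0.02`) than the demo. The helper names `fit_at` and `mse_at` are hypothetical. For a fixed shift, the best factor is a regression through the origin, so only the shift needs a search.

```r
set.seed(1234)

# Hypothetical test data: same shape as the demo, lower noise (an assumption)
f  <- function(x) sin(x) - 0.2 * x + 10
x  <- seq(20, 30, 0.1)
y1 <- f(x) + rnorm(length(x), sd = 0.02)
y2 <- 0.9 * f(x + 0.5) + rnorm(length(x), sd = 0.02)

# Linear interpolant of curve 1, so it can be evaluated at shifted positions
c1 <- approxfun(x, y1)

# For a given shift dx, fit y2(x) ~ qy * y1(x + dx) on the overlap region.
# The optimal qy has a closed form (least squares through the origin).
fit_at <- function(dx) {
  xs <- x + dx
  ok <- xs >= min(x) & xs <= max(x)      # keep points inside curve 1's range
  u  <- c1(xs[ok])
  qy <- sum(u * y2[ok]) / sum(u^2)
  list(qy = qy, mse = mean((y2[ok] - qy * u)^2))
}
mse_at <- function(dx) fit_at(dx)$mse

# Coarse grid over candidate shifts, then a local refinement around the best one
grid   <- seq(-2, 2, by = 0.05)
coarse <- grid[which.min(sapply(grid, mse_at))]
dx_hat <- optimize(mse_at, c(coarse - 0.05, coarse + 0.05))$minimum
qy_hat <- fit_at(dx_hat)$qy

cat("estimated dx:", dx_hat, " estimated qy:", qy_hat, "\n")
```

With unequal grids ($X_{i,1} \neq X_{i,2}$) the same sketch applies unchanged, since curve 1 is only ever evaluated through its interpolant.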

Best Answer

This is an idea to solve the problem. I didn't test it. So, don't ask me to elaborate.

Notations:

The discretized first curve is defined by the points: $\quad (X_1,Y_1),(X_2,Y_2),...$

These points are supposed to be close to an unknown function $Y(x)$.

The discretized second curve is defined by the points: $\quad (x_1,y_1),(x_2,y_2),...$

These points are supposed to be close to an unknown function $y(x)$.

It is supposed that these functions are related by an $x$ shift and a $y$ stretch so that $$y(x)=a\:Y(x-b)$$ where $a$ is the unknown stretch factor and $b$ is the unknown shift parameter.
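
For orientation, here is how these parameters would map onto the question's notation, assuming (as in the R example, and ignoring the noise) that the first curve is $y_1(x)=f(x)$ and the second is $y_2(x)=q_y\,f(x+dx)$: $$y_2(x)=q_y\,y_1(x+dx)=a\,Y(x-b)\quad\Longrightarrow\quad a=q_y,\qquad b=-dx$$ Under that assumption, the example values $dx=0.5$ and $q_y=0.9$ correspond to $a=0.9$ and $b=-0.5$.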

Proposed method of calculation:

Consider the Fourier transforms of the functions $Y(x)$ and $y(x)$, respectively $$G(\omega)=\mathscr{F}\big(Y(x);\omega\big)$$ $$g(\omega)=\mathscr{F}\big(y(x);\omega\big)$$ Numerical computation of the Fourier transforms leads to the transform data, respectively: $$(G_1,\omega_1), (G_2,\omega_2), ... \quad\text{ and }\quad (g_1,\omega_1), (g_2,\omega_2), ...$$ where the $G$ and $g$ are complex numbers.

It is known that (with the transform kernel $e^{i\,\omega\,x}$; with the kernel $e^{-i\,\omega\,x}$ the sign of the exponent is reversed) $$\mathscr{F}\big(Y(x-b);\omega\big)=e^{i\,b\,\omega}\mathscr{F}\big(Y(x);\omega\big) $$ The Fourier transform of $y(x)=a\:Y(x-b)$ is: $$g(\omega)=a\,e^{i\,b\,\omega}G(\omega)$$

$$g(\omega)=G(\omega)a\left(\cos(b\omega)+i\,\sin(b\omega)\right)$$ Separating the real and imaginary parts, with the above data, a non-linear regression leads to approximate values of the parameters $a$ and $b$.
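
A minimal sketch of this regression in R follows. It is not the code used for the tests reported further down. It assumes a hypothetical noise-free periodic test signal sampled over exactly one period, so that the discrete shift relation holds exactly; the question's data are neither periodic nor noise-free. R's `fft()` uses the kernel $e^{-i\omega x}$, so the phase factor appears as $e^{-ib\omega}$ in the code. The names `loss`, `a_hat` and `b_hat` are illustrative.

```r
# Hypothetical periodic test signal (an assumption, not the question's data)
n  <- 256
x  <- seq(0, 2 * pi, length.out = n + 1)[-(n + 1)]  # one period, endpoint dropped
Y  <- function(x) sin(x) + 0.5 * cos(3 * x)
a0 <- 0.9                                            # true stretch factor
b0 <- -0.5                                           # true shift
y1 <- Y(x)
y2 <- a0 * Y(x - b0)

# Discrete Fourier transforms of both sampled curves
G <- fft(y1)
g <- fft(y2)

# Angular frequencies of the positive-frequency bins (bin k is at index k + 1)
dx    <- x[2] - x[1]
k     <- 1:(n / 2 - 1)
omega <- 2 * pi * k / (n * dx)

# Keep only bins that carry signal energy, and normalise the magnitudes
scale <- max(Mod(G))
keep  <- Mod(G[k + 1]) > 1e-6 * scale
Gk <- G[k + 1][keep] / scale
gk <- g[k + 1][keep] / scale
wk <- omega[keep]

# Sum of squared real and imaginary residuals of g = a * exp(-i b omega) * G
# (minus sign because of R's fft kernel convention)
loss <- function(p) {
  r <- gk - p[1] * exp(-1i * p[2] * wk) * Gk
  sum(Re(r)^2 + Im(r)^2)
}

# Non-linear least squares, starting from "no stretch, no shift"
fit   <- optim(c(1, 0), loss, method = "BFGS")
a_hat <- fit$par[1]
b_hat <- fit$par[2]

cat("estimated a:", a_hat, " estimated b:", b_hat, "\n")
```

On non-periodic or noisy data the shift relation between the discrete transforms only holds approximately, so the recovered $b$ can be far less accurate than in this idealised case.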

IN ADDITION:

I made a few tests of the above method, for example with the same generating function that Martin used. The simulated scatter isn't exactly the same, because different software does not use the same random number generator. So the red and blue curves in the figure below are not an exact copy of Martin's graph.

The resulting stretched and translated curve is drawn in black (this is the red curve transformed to approximately fit the blue curve).

[Figure: the red and blue test curves, with the stretched and translated curve in black]

The computed stretch factor $0.868$ is to be compared to the theoretical $0.9$

The computed shift on x-axis $-0.413$ is to be compared to the theoretical $-0.5$ which isn't very good.

It appears that the result is very sensitive to the level of scatter of the original data. The method seems rather robust for computing the stretch factor, but very poor for computing the x-shift.

Consequently, from a practical viewpoint, this method is too sensitive to the scatter of the data and not robust enough. I don't advise using it until serious improvements are achieved, which would probably require a lot of work.
