[Math] Find a sinusoidal regression equation for some given data.

periodic-functions, regression

I have some data which is sinusoidal when graphed. I want to find a regression equation for these data of the form: $$f_{(x)}=A\sin \left[\frac{2\pi}{B} (x-C)\right]+D$$

I estimated all the constants $A,B,C$ and $D$ from the graph and found a reasonable sinusoidal regression curve for these data, happy days.

But I am wondering whether or not there is some way of plotting my data in a linear form such that the slope and intercept etc. will give me a more accurate estimate for the constants. Is there such a method? What would be the simplest way to go about it?

I am not referring to any specific data sample but would like to know how to go about this in general.

Here are some data though that we can work with.

$$x=(1,2,3,4,5,6)$$ $$y=(3.42,0.73,0.12,2.16,4.97,5.97)$$

My work: $$A=\frac{5.97-0.12}{2}=2.925$$ $$D=0.12+2.925=3.045$$ $$C=\frac{6+3}{2}=4.5$$ $$B=2(6-3)=6$$ $$f_{(x)}=2.925\sin[\frac{\pi}{3}(x-4.5)]+3.045$$
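For reference, the min/max arithmetic above can be reproduced in a few lines. This is a minimal Python sketch. It assumes, as in these data, that the minimum comes before the maximum, so that the midpoint $C$ between them is a rising zero crossing of the sine. The variable names are illustrative only.

```python
# Reproduce the min/max ("read it off the graph") estimates for the posted data.
xs = [1, 2, 3, 4, 5, 6]
ys = [3.42, 0.73, 0.12, 2.16, 4.97, 5.97]

i_min = min(range(len(ys)), key=ys.__getitem__)  # index of the smallest sample
i_max = max(range(len(ys)), key=ys.__getitem__)  # index of the largest sample
x_min, x_max = xs[i_min], xs[i_max]

A = (ys[i_max] - ys[i_min]) / 2  # amplitude: half the trough-to-peak height
D = ys[i_min] + A                # vertical shift: the midline
C = (x_min + x_max) / 2          # phase shift: midpoint between trough and peak
B = 2 * abs(x_max - x_min)       # period: trough to peak is half a period

print(A, D, C, B)
```

Everything depends on just two samples, `ys[i_min]` and `ys[i_max]`. If the true extrema fall between sampling points, every constant inherits that error.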

I could only base my constants on the minimum and maximum data points available. This method isn't much use if there are no good data near the extrema, or if the data don't even cover one full oscillation.

What would be a more accurate way of going about this?

Best Answer

Assuming that the data cover a large range and do not contain much noise and using as a model $$f(x)=a\sin \left(\frac{2\pi}{b} (x-c)\right)+d$$ we can get some estimates using $$f(0)=d-a \sin \left(\frac{2 \pi c}{b}\right)$$ which gives an estimate of $d$.

For the point $x_1$ where $f(x_1)=0$ we have $$x_1=c-\frac{b \sin ^{-1}\left(\frac{d}{a}\right)}{2 \pi }$$ which gives an estimate of $c$.

For the point $x_2$ where $f'(x_2)=0$ we have $$x_2=\frac{b}{4}+c$$ which gives an estimate of $b$ and $f(x_2)=a+d$ gives an estimate of $a$.
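As a quick numerical sanity check of the three relations above, here is a short Python sketch. The parameter values are simply the fitted ones reported later in this answer and play no special role; any values with $|d/a|\le 1$ would do. With the principal arcsine, $x_1$ is the zero where $f$ is increasing, and $x_2=c+b/4$ is a maximum.

```python
import math

# Any parameters with |d/a| <= 1 work; these are the fitted values reported later.
a, b, c, d = 12.346, 7.89158, 3.45132, -6.77303
w = 2 * math.pi / b  # angular frequency 2*pi/b


def f(x):
    return a * math.sin(w * (x - c)) + d


def fprime(x):
    return a * w * math.cos(w * (x - c))


x1 = c - b * math.asin(d / a) / (2 * math.pi)  # claimed zero of f (rising)
x2 = b / 4 + c                                 # claimed stationary point (maximum)

print(f(0) - (d - a * math.sin(2 * math.pi * c / b)))  # should be ~0
print(f(x1))                                            # should be ~0
print(fprime(x2), f(x2) - (a + d))                      # both should be ~0
```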

So, basically, by looking at the plot of the data we have, in principle, at least consistent estimates of all parameters, and we can safely start the full nonlinear regression.

However, this would imply solving a tedious equation for $a$. Instead, using the above relations, we can first reduce the problem to fitting a single parameter $a$, with $b$, $c$, $d$ expressed as functions of $a$. When the optimum $a$ has been found (this can be done by plotting the sum of squares computed for a few discrete values of $a$ until a minimum is detected), we can safely start the full nonlinear regression with consistent estimates.
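Below is a minimal Python sketch of this one-parameter scan, applied to the synthetic data set given in the edit below. Several choices here are assumptions and do not come from the answer:

- $x_1$ is taken as the linearly interpolated upward zero crossing of the data.
- $x_2$ is the location of the largest sample.
- $d=y_{\max}-a$ follows from $f(x_2)=a+d$.
- $b$ and $c$ come from solving the $x_1$ and $x_2$ relations together.
- The $f(0)$ relation is not used.

The helper names `rising_zero` and `params_from_a` are hypothetical.

```python
import math

# Synthetic data set from the edit below
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
ys = [-11.47, -15.62, -18.19, -19.17, -18.02, -15.30, -11.06,
      -6.34, -1.50, 2.33, 4.92, 5.49, 4.33]


def rising_zero(xs, ys):
    """Linearly interpolate the first upward zero crossing of the data."""
    for xa, ya, xb, yb in zip(xs, ys, xs[1:], ys[1:]):
        if ya < 0 <= yb:
            return xa + (xb - xa) * (-ya) / (yb - ya)
    raise ValueError("no upward zero crossing in the data")


def params_from_a(a, x1, x2, y_max):
    """Express b, c, d as functions of a via f(x2) = a + d and the x1, x2 relations."""
    d = y_max - a
    s = math.asin(d / a)  # requires |d/a| <= 1
    # Subtracting the two relations: x2 - x1 = b/4 + b*asin(d/a)/(2*pi)
    b = (x2 - x1) / (0.25 + s / (2 * math.pi))
    c = x2 - b / 4
    return b, c, d


def ssq(a, b, c, d):
    """Sum of squared residuals of the sine model on the data."""
    return sum((a * math.sin(2 * math.pi / b * (x - c)) + d - y) ** 2
               for x, y in zip(xs, ys))


x1 = rising_zero(xs, ys)
i_max = max(range(len(ys)), key=ys.__getitem__)
x2, y_max = xs[i_max], ys[i_max]

# Try a = 3.0, 3.1, ..., 25.0 and keep the one with the smallest sum of squares
best = None
for k in range(30, 251):
    a = k / 10
    b, c, d = params_from_a(a, x1, x2, y_max)
    s = ssq(a, b, c, d)
    if best is None or s < best[0]:
        best = (s, a, b, c, d)

print(best)  # (SSQ, a, b, c, d)
```

Because $x_2$ is read off the sampling grid, the $b$ and $c$ obtained this way are only rough. They are starting values for the full nonlinear regression, not a replacement for it.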

Edit

Thinking more about the problem, I suppose that I should write the model as $$f(x)=a \sin(\alpha x+\beta)+d\qquad \text{with}\qquad \alpha=\frac {2\pi}b\qquad \text{and}\qquad \beta=-\frac {2\pi c}b$$ Expanding the sine $$f(x)=a \cos (\beta ) \sin (\alpha x)+a \sin (\beta ) \cos (\alpha x)+d=A\sin (\alpha x)+B\cos(\alpha x)+d $$ which is linear in $A$, $B$ and $d$ once $\alpha$ is given a specific value. Here $A$ and $B$ are new linear coefficients, not the amplitude and period of the question.

For this value of $\alpha$, use a linear regression to get the best values of the parameters $A( \alpha)$, $B( \alpha)$, $d( \alpha)$, and compute the sum of squares of residuals $SSQ(\alpha)$. By trial and error, you will find a zone where $SSQ( \alpha)$ is minimum. For this value $\alpha_*$, go back to the original parameters: $$A(\alpha_*)=a \cos(\beta)\qquad B(\alpha_*)=a \sin(\beta)\qquad \implies\qquad a=\sqrt{A(\alpha_*)^2+B(\alpha_*)^2}\qquad \text{and} \qquad\beta=\operatorname{atan2}\big(B(\alpha_*),A(\alpha_*)\big)$$ which themselves lead to $b=\frac{2\pi}{\alpha_*}$ and $c=-\frac{\beta}{\alpha_*}$.
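Here is a stdlib-only Python sketch of this procedure, applied to the synthetic data set below. Instead of a handful of trial values, it scans $\alpha$ on a fine grid over the same range $[0.1,1]$ as the table. The linear least-squares step solves the $3\times 3$ normal equations directly. That is fine for a sketch but numerically less robust than a QR-based solver. `solve` and `linear_fit` are illustrative helpers, not library functions.

```python
import math

# Synthetic data set used in this edit
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
ys = [-11.47, -15.62, -18.19, -19.17, -18.02, -15.30, -11.06,
      -6.34, -1.50, 2.33, 4.92, 5.49, 4.33]


def solve(M, v):
    """Solve the square system M z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))) / aug[r][r]
    return z


def linear_fit(alpha):
    """Least squares for y = A sin(alpha x) + B cos(alpha x) + d at fixed alpha.

    Returns (SSQ, A, B, d)."""
    rows = [(math.sin(alpha * x), math.cos(alpha * x), 1.0) for x in xs]
    XtX = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(3)]
    A, B, d = solve(XtX, Xty)
    ssq = sum((A * s + B * co + d - y) ** 2 for (s, co, _), y in zip(rows, ys))
    return ssq, A, B, d


# Scan alpha = 0.100, 0.101, ..., 1.000 and keep the smallest SSQ
alphas = [k / 1000 for k in range(100, 1001)]
best_ssq, alpha_s = min((linear_fit(al)[0], al) for al in alphas)
_, A, B, d = linear_fit(alpha_s)

# Back to the original parameters
a = math.hypot(A, B)       # A = a cos(beta), B = a sin(beta), with a > 0
beta = math.atan2(B, A)
b = 2 * math.pi / alpha_s  # alpha = 2 pi / b
c = -beta / alpha_s        # beta = -2 pi c / b = -alpha c
print(alpha_s, best_ssq, a, b, c, d)
```

For each $\alpha$, the coefficients $A$, $B$, $d$ are already optimal. So on a fine enough grid, the minimum over $\alpha$ should land close to the full nonlinear least-squares solution. Restricting $\alpha$ to a plausible range matters: with equally spaced $x$, frequencies above $\pi/\Delta x$ alias onto lower ones.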

For illustration purposes, I used the following (synthetic) data set $$\left( \begin{array}{cc} x & f(x) \\ 0.0 & -11.47 \\ 0.5 & -15.62 \\ 1.0 & -18.19 \\ 1.5 & -19.17 \\ 2.0 & -18.02 \\ 2.5 & -15.30 \\ 3.0 & -11.06 \\ 3.5 & -6.34 \\ 4.0 & -1.50 \\ 4.5 & +2.33 \\ 5.0 & +4.92 \\ 5.5 & +5.49 \\ 6.0 & +4.33 \end{array} \right)$$ Below are the results of the linear regressions for different values of $\alpha$, $SSQ$ being the sum of squares of residuals: $$\begin{array}{ccccc} \alpha & SSQ(\alpha) & A(\alpha) & B(\alpha) & d(\alpha) \\ 0.1 & 145.375 & -7.60697 & -174.895 & 158.722 \\ 0.2 & 135.708 & -5.44567 & -48.7632 & 32.6894 \\ 0.3 & 119.635 & -5.52408 & -25.2303 & 9.34186 \\ 0.4 & 97.4249 & -6.2399 & -16.7424 & 1.16049 \\ 0.5 & 69.9893 & -7.29279 & -12.459 & -2.63686 \\ 0.6 & 39.7696 & -8.58334 & -9.62791 & -4.7105 \\ 0.7 & 12.4717 & -10.0273 & -7.20268 & -5.97143 \\ 0.8 & \color{red}{0.0564199} & -11.4518 & -4.6306 & -6.79914 \\ 0.9 & 24.2511 & -12.4754 & -1.59026 & -7.37301 \\ 1.0 & 115.154 & -12.4273 & 1.90681 & -7.78404 \end{array}$$

Using $\alpha_*=0.8$, we get $a=12.3526$, $d=-6.79914$ and $\beta=-2.75734$, and then $b=7.85398$ and $c=3.44668$. Using these numbers as starting values, the results of the nonlinear regression are $$\begin{array}{lccc} \text{} & \text{Estimate} & \text{Standard Error} & \text{Confidence Interval} \\ a & 12.346 & 0.0234427 & \{12.2919,12.4001\} \\ b & 7.89158 & 0.0148157 & \{7.85741,7.92574\} \\ c & 3.45132 & 0.003368 & \{3.44355,3.45909\} \\ d & -6.77303 & 0.0209294 & \{-6.82129,-6.72477\} \\ \end{array}$$ As you can see, the final parameters are quite close to the guesses. Comparing data and prediction $$\left( \begin{array}{ccc} x & f(x) & \text{prediction} \\ 0.0 & -11.47 & -11.509 \\ 0.5 & -15.62 & -15.558 \\ 1.0 & -18.19 & -18.234 \\ 1.5 & -19.17 & -19.117 \\ 2.0 & -18.02 & -18.070 \\ 2.5 & -15.30 & -15.255 \\ 3.0 & -11.06 & -11.115 \\ 3.5 & -6.34 & -6.2947 \\ 4.0 & -1.50 & -1.5496 \\ 4.5 & +2.33 & +2.3786 \\ 5.0 & +4.92 & +4.8754 \\ 5.5 & +5.49 & +5.5505 \\ 6.0 & +4.33 & +4.2982 \end{array} \right)$$
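The answer does not say which nonlinear solver produced the table above. As a hedged stand-in, here is a plain Gauss-Newton iteration in Python. It starts from the guesses recovered from the $\alpha=0.8$ row ($A=-11.4518$, $B=-4.6306$). It has no step-size control, so it relies on the good starting values that the linear scan provides. In practice a damped method such as Levenberg-Marquardt is the safer choice. `solve` is an illustrative helper, not a library function.

```python
import math

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
ys = [-11.47, -15.62, -18.19, -19.17, -18.02, -15.30, -11.06,
      -6.34, -1.50, 2.33, 4.92, 5.49, 4.33]


def solve(M, v):
    """Solve the square system M z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))) / aug[r][r]
    return z


# Starting values recovered from the alpha = 0.8 row of the table
alpha, A, B, d = 0.8, -11.4518, -4.6306, -6.79914
a = math.hypot(A, B)     # a = sqrt(A^2 + B^2)
beta = math.atan2(B, A)  # A = a cos(beta), B = a sin(beta)
b = 2 * math.pi / alpha  # alpha = 2 pi / b
c = -beta / alpha        # beta = -2 pi c / b
start = [a, b, c, d]
print("start:", start)

p = list(start)
for _ in range(20):
    a, b, c, d = p
    J, r = [], []
    for x, y in zip(xs, ys):
        t = 2 * math.pi * (x - c) / b
        s, co = math.sin(t), math.cos(t)
        r.append(y - (a * s + d))  # residual
        # Partial derivatives of a*sin(2 pi (x - c)/b) + d w.r.t. a, b, c, d
        J.append([s, -a * co * t / b, -a * co * 2 * math.pi / b, 1.0])
    JtJ = [[sum(row[i] * row[j] for row in J) for j in range(4)] for i in range(4)]
    Jtr = [sum(row[i] * ri for row, ri in zip(J, r)) for i in range(4)]
    step = solve(JtJ, Jtr)  # Gauss-Newton step from the normal equations
    p = [pk + sk for pk, sk in zip(p, step)]

print("fit:", p)  # [a, b, c, d]
```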

Update

Using the data you posted in your edit $$\left( \begin{array}{cc} x & f(x) \\ 1 & 3.42 \\ 2 & 0.73 \\ 3 & 0.12 \\ 4 & 2.16 \\ 5 & 4.97 \\ 6 & 5.97 \end{array} \right)$$ and applying the procedure given in my edit, we obtain $$\begin{array}{lccc} \text{} & \text{Estimate} & \text{Standard Error} & \text{Confidence Interval} \\ a & 3.00141 & 0.00178404 & \{2.97874,3.02408\} \\ b & 6.28596 & 0.00321329 & \{6.24513,6.32679\} \\ c & 4.28388 & 0.00094226 & \{4.27190,4.29585\} \\ d & 2.99993 & 0.00174274 & \{2.97779,3.02207\} \\ \end{array}$$ Comparing data and prediction $$\left( \begin{array}{ccc} x & f(x) & \text{prediction} \\ 1 & 3.42 &3.42124\\ 2 & 0.73 &0.72783\\ 3 & 0.12 &0.12170\\ 4 & 2.16 &2.15966\\ 5 & 4.97 &4.96954\\ 6 & 5.97 &5.97003 \end{array} \right)$$ Your much simplified procedure leads to noticeably different results: compare your $B=6$ and $C=4.5$ with $b\approx 6.286$ and $c\approx 4.284$.
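To put a number on "different", here is a short Python sketch. It evaluates both parameter sets on the posted data and compares their sums of squared residuals. The function names `model` and `ssq` are illustrative.

```python
import math

xs = [1, 2, 3, 4, 5, 6]
ys = [3.42, 0.73, 0.12, 2.16, 4.97, 5.97]


def model(x, a, b, c, d):
    """f(x) = a sin(2 pi (x - c) / b) + d"""
    return a * math.sin(2 * math.pi / b * (x - c)) + d


def ssq(params):
    """Sum of squared residuals of the model on the posted data."""
    return sum((model(x, *params) - y) ** 2 for x, y in zip(xs, ys))


by_eye = (2.925, 6, 4.5, 3.045)                    # the question's min/max estimate
regression = (3.00141, 6.28596, 4.28388, 2.99993)  # nonlinear regression estimates

for name, params in [("by eye", by_eye), ("regression", regression)]:
    preds = [round(model(x, *params), 5) for x in xs]
    print(name, ssq(params), preds)
```

The min/max fit reproduces the two extreme samples exactly but misses the others by up to about 1.09, giving a sum of squares of about 2.46. The regression keeps it below $10^{-3}$.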