Is there a way to rate how close data is to being sinusoidal?

statistics

I am looking for a mathematical method of determining how close a data set is to fitting a generic sine wave (in other words, $\sin(x)$). In my mind, this would be something like an R-squared value for a linear regression, except for a sine wave. To be clear, I am not looking for a way to fit data to a sinusoidal function.

So if my data looked something like the following, how close is that to being $\sin(x)$? Again, I am not looking to do a curve fit for this data. Also, the data was just plucked from Google Images as an example of what I am talking about.

Read the comments below for some additional clarification.

[image: example data plot]

Best Answer

When data is this irregular, I first do some fairly simple-minded smoothing, such as a moving average. I then estimate the locations of the peaks from the smoothed data. If irregular data causes two peaks to be too close together, take their average as the peak location.
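The smoothing and peak-finding step can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names, the window size, the `min_gap` threshold, and the synthetic data are all assumptions made for the example, and the `prominence` filter in `scipy.signal.find_peaks` is an extra safeguard that the description above does not mention.

```python
import numpy as np
from scipy.signal import find_peaks

def smooth_moving_average(t, y, window):
    """Moving average over an odd number of samples; the edges are trimmed."""
    kernel = np.ones(window) / window
    half = window // 2
    return t[half:len(t) - half], np.convolve(y, kernel, mode="valid")

def merge_close(locs, min_gap):
    """Replace runs of peak locations closer than min_gap by their average."""
    if len(locs) == 0:
        return np.array([])
    groups = [[locs[0]]]
    for loc in locs[1:]:
        if loc - groups[-1][-1] < min_gap:
            groups[-1].append(loc)   # too close: same underlying peak
        else:
            groups.append([loc])     # far enough: start a new peak
    return np.array([np.mean(g) for g in groups])

# Synthetic stand-in for real data (assumed): noisy cosine with period 5.
rng = np.random.default_rng(0)
t = np.linspace(0, 20, 801)
y = 2.0 + 1.5 * np.cos(2 * np.pi * (t - 2.5) / 5.0) + rng.normal(0, 0.3, t.size)

ts, ys = smooth_moving_average(t, y, window=41)
# Assumed safeguard: ignore bumps much smaller than the overall swing.
idx, _ = find_peaks(ys, prominence=0.25 * (ys.max() - ys.min()))
peaks = merge_close(ts[idx], min_gap=2.0)
print(peaks)
```

The window and `min_gap` values need to be chosen by eye for real data: the window should be short compared with the apparent period, and `min_gap` should be well below the spacing you expect between genuine peaks.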

I would then look at the distances between successive peaks. If these are approximately equal, their average gives an estimate of the period. Call this $p$.
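As a rough sketch of the period estimate, the snippet below averages the gaps between successive peaks and flags whether they are "approximately equal". The peak times, the function name, and the 20% tolerance are hypothetical values chosen for illustration; the tolerance in particular is a judgment call.

```python
import numpy as np

def period_from_peaks(peak_times, rel_tol=0.2):
    """Return (mean gap, whether the gaps agree to within rel_tol of the mean)."""
    gaps = np.diff(peak_times)
    p = gaps.mean()
    spread = (gaps.max() - gaps.min()) / p
    return p, bool(spread <= rel_tol)

# Hypothetical peak locations, e.g. from a smoothed data set.
peak_times = np.array([1.1, 5.9, 11.2, 16.0])
p, regular = period_from_peaks(peak_times)
print(p, regular)
```

If the gaps vary a lot, that is already a sign that the data are not close to a single sinusoid.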

I would then look at the maximum and minimum values - call them $a$ (min) and $b$ (max).

Then, with $h = (b-a)/2$ being an estimate for the amplitude and $c = (b+a)/2$ being an estimate for the center, an initial estimate for the fitting curve is $c+h\cos(2\pi\frac{t-t_0}{p}) $ where $t$ is the time (x-axis) and $t_0$ is the location of the first peak.
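In code, the initial estimate might look like the sketch below. The function names are assumptions for the example; the formulas for $h$ and $c$ and the model $c+h\cos(2\pi\frac{t-t_0}{p})$ are the ones given above.

```python
import numpy as np

def cosine_model(t, c, h, t0, p):
    """c + h*cos(2*pi*(t - t0)/p): center c, amplitude h, first peak t0, period p."""
    return c + h * np.cos(2 * np.pi * (t - t0) / p)

def initial_guess(y, t0, p):
    """Initial (c, h, t0, p) from the data extremes and the peak analysis."""
    a, b = np.min(y), np.max(y)   # a = minimum, b = maximum
    h = (b - a) / 2               # amplitude estimate
    c = (b + a) / 2               # center estimate
    return c, h, t0, p

# Tiny hypothetical example: extremes 0.5 and 3.5 give c = 2.0, h = 1.5.
c, h, t0, p = initial_guess(np.array([0.5, 3.5, 2.0]), t0=1.0, p=5.0)
print(c, h, t0, p)
```

On noisy data the raw extremes tend to overstate the amplitude somewhat, which is acceptable here because these values only seed the later fit.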

Finally, put these parameters ($c, h, t_0, p$) into a nonlinear fitting routine (least squares is probably reasonable and available) as the initial values for fitting that model function to the actual (unsmoothed) data.
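One readily available routine is `scipy.optimize.curve_fit`, which performs nonlinear least squares. The sketch below uses it as one possible choice, not the only one. The synthetic data and the starting values in `p0` are hypothetical stand-ins for real data and for the estimates obtained in the earlier steps.

```python
import numpy as np
from scipy.optimize import curve_fit

def cosine_model(t, c, h, t0, p):
    """Model from the answer: c + h*cos(2*pi*(t - t0)/p)."""
    return c + h * np.cos(2 * np.pi * (t - t0) / p)

# Synthetic stand-in for the actual (unsmoothed) data (assumed).
rng = np.random.default_rng(1)
t = np.linspace(0, 20, 401)
y = cosine_model(t, 2.0, 1.5, 1.0, 5.0) + rng.normal(0, 0.3, t.size)

# Hypothetical initial values (c, h, t0, p) from the peak analysis.
p0 = [2.1, 1.8, 1.2, 4.9]

params, cov = curve_fit(cosine_model, t, y, p0=p0)
c_fit, h_fit, t0_fit, p_fit = params
print(params)
```

A good starting value for the period matters most: if `p0` is far off, the optimizer can settle into a poor local minimum, which is why the peak analysis comes first.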

Then look at the fit.
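One way to put a number on "looking at the fit", in the spirit of the R-squared value mentioned in the question, is the coefficient of determination $1 - SS_{res}/SS_{tot}$ computed from the fitted curve. This is an added suggestion rather than part of the procedure above, and for a nonlinear model it is best read as a rough descriptive score. The data and fitted curve below are hypothetical.

```python
import numpy as np

def r_squared(y, y_fit):
    """1 - SS_res/SS_tot: 1.0 for a perfect fit, 0.0 for a fit no better than the mean."""
    y = np.asarray(y, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    ss_res = np.sum((y - y_fit) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical fitted curve and noisy data scattered around it.
rng = np.random.default_rng(2)
t = np.linspace(0, 20, 401)
y_fit = 2.0 + 1.5 * np.cos(2 * np.pi * (t - 1.0) / 5.0)
y = y_fit + rng.normal(0, 0.3, t.size)

score = r_squared(y, y_fit)
print(score)
```

Plotting the residuals `y - y_fit` against `t` is a useful companion check: a visible pattern in the residuals suggests the data have structure that a single sinusoid does not capture.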

All this is moderately ad hoc, but I have done similar things in the past successfully.
