Solved – How to exploit periodicity to reduce the noise of a signal

arima, autocorrelation, kalman-filter, r, time-series

100 periods have been collected from a 3-dimensional periodic signal. The wavelength varies slightly; its noise follows a Gaussian distribution with zero mean. A good estimate of the wavelength is known, so that is not an issue here. The noise of the amplitude may not be Gaussian and may be contaminated with outliers.

How can I compute a single period that approximates 'best' all of the collected 100 periods?

I have no idea how time-series models work. Can they handle varying wavelengths? Can they handle non-smooth true signals? If a time-series model is fitted, can I compute a 'best estimate' for a single period? How?

A related question is this. Speed is not an issue in my case. Processing is done off-line, after all periods have been collected.

Origin of the problem: I am measuring acceleration during human steps at 200 Hz. After that I am trying to double integrate the data to get the vertical displacement of the center of gravity (a naive double integration is sketched below the figure). Of course the noise introduces a HUGE error when you integrate twice. I would like to exploit periodicity to reduce this noise. Here is a crude graph of the actual data (y: acceleration in g, x: time in seconds) of 6 steps corresponding to 3 periods (one left step and one right step make up one period):
Human steps
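For reference, the naive double integration is just two cumulative sums scaled by the sampling interval. The sketch below is only an illustration, with `acc` standing in for the measured vertical acceleration (gravity already removed); the comment shows why even a tiny bias blows up quadratically in the displacement.

```r
# Naive double integration of vertical acceleration sampled at 200 Hz.
# 'acc' is a stand-in for the measured acceleration in g, gravity removed.
fs <- 200                 # sampling rate (Hz)
dt <- 1 / fs
g  <- 9.81                # m/s^2 per g

velocity     <- cumsum(acc * g) * dt   # first integration  -> m/s
displacement <- cumsum(velocity) * dt  # second integration -> m

# A constant bias b (in g) in 'acc' alone produces a drift of roughly
# 0.5 * b * g * t^2 in 'displacement', so low-frequency noise grows without bound.
```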

My interest is now purely theoretical, as http://jap.physiology.org/content/39/1/174.abstract gives a pretty good recipe for what to do, but it does not address periodicity.

Note: I have asked this question on stackoverflow but it seems to be off-topic there.

Best Answer

I'm not sure I totally understand what would be best, but I believe that looking at Dynamic Linear Models, State Space modeling, and the Kalman Filter would be helpful. In R, the package dlm is fairly accessible.

Edit: Describe DLMs simply, eh? One explanation I've seen is that DLMs are regressions where the coefficients are allowed to change over time. That doesn't really give much intuition, though.

So, here is my I-realize-I-don't-understand-DLMs-well-enough-to-do-this-well-but-you-asked answer, which I hope others will correct as necessary. Let me use a situation similar to the one for which they were invented...

Say you were controlling a remotely piloted vehicle. You could summarize the vehicle's actual state (speed, direction, altitude, fuel, etc.) as a vector in a state space. You can't observe the actual state directly, but you do have sensors that observe linear combinations of the actual state, with some (Gaussian) noise added in. That is, your sensors are not perfect.

Further, the vehicle has a program that determines how it changes state, as a linear combination of the current state, also with some (Gaussian) noise. That is, your controls are not perfect.
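In standard state-space/DLM notation (the same notation the dlm package uses), those two ingredients are an observation equation and a state-evolution equation:

$$y_t = F_t \theta_t + v_t, \qquad v_t \sim N(0, V_t)$$
$$\theta_t = G_t \theta_{t-1} + w_t, \qquad w_t \sim N(0, W_t)$$

Here $\theta_t$ is the unobserved state, $F_t$ says which linear combinations of it the sensors report, $G_t$ encodes the program that evolves the state, and $V_t$, $W_t$ are the sensor and control noise variances.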

Say you want to know the most likely path for the vehicle -- that is, the states it actually went through: how it actually moved. What you observe is noisy, but if you model the system this way, a Kalman filter tracks the variances of the different parts and outputs the most likely actual states. At each time step it forecasts what it believes the state will be, observes what the sensors say, weighs the two by their variances, and produces a final estimate of the state as a weighted sum of the forecast and the sensor readings. The process is then repeated for the next time step.
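For concreteness, here is a minimal sketch of that forecast/observe/combine cycle for a one-dimensional random-walk state in plain R. The observation and evolution variances `v` and `w` are assumed known here; in practice you would estimate them (e.g. with `dlm::dlmMLE`).

```r
# Minimal scalar Kalman filter: random-walk state x_t = x_{t-1} + w_t,
# noisy observation y_t = x_t + v_t.  'v' and 'w' are the noise variances.
kalman1d <- function(y, v, w, m0 = 0, C0 = 1e7) {
  n <- length(y)
  m <- numeric(n)   # filtered state means
  C <- numeric(n)   # filtered state variances
  m_prev <- m0
  C_prev <- C0
  for (t in seq_len(n)) {
    a <- m_prev                  # forecast of the state (random walk: no change expected)
    R <- C_prev + w              # forecast variance
    K <- R / (R + v)             # Kalman gain: how much to trust the new observation
    m[t] <- a + K * (y[t] - a)   # weighted sum of forecast and sensor reading
    C[t] <- (1 - K) * R
    m_prev <- m[t]
    C_prev <- C[t]
  }
  list(m = m, C = C)
}

# Example: recover a slowly varying signal from noisy observations
set.seed(1)
truth <- cumsum(rnorm(300, sd = 0.05))
y     <- truth + rnorm(300, sd = 0.5)
est   <- kalman1d(y, v = 0.5^2, w = 0.05^2)$m
```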

The whole concept is basically a Hidden Markov Model with a continuous (Gaussian) state, if you're familiar with those. The model is a handful of equations involving matrices multiplied together, and you can add blocks to those matrices to reflect trend, seasonal, and other types of components of the time series.

The R package dlm makes it particularly easy to define and combine components.
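As an illustration only (this is my sketch, not an established recipe), one way to set this up for the stepping data: assume the samples are in a vector `acc` and that one stride period is roughly one second at 200 Hz, i.e. s = 200 samples per period; a slowly drifting level plus a few static harmonics at the stride period is just one plausible structure.

```r
library(dlm)

## Sketch: 'acc' stands for the acceleration samples from the question, and
## s = 200 assumes roughly one stride period per second at a 200 Hz sampling rate.
build <- function(par) {
  # exp() keeps the estimated variances positive during optimization
  dlmModPoly(order = 1, dV = exp(par[1]), dW = exp(par[2])) +  # slowly drifting level
    dlmModTrig(s = 200, q = 3, dV = 0)                         # 3 static harmonics at the stride period
}

fit <- dlmMLE(acc, parm = c(0, 0), build = build)  # estimate the two variances by maximum likelihood
mod <- build(fit$par)
sm  <- dlmSmooth(acc, mod)                         # retrospective (smoothed) state estimates

states     <- dropFirst(sm$s)               # drop the prior state at time 0
fitted     <- drop(states %*% t(mod$FF))    # smoothed mean of the observations
periodic   <- fitted - states[, 1]          # subtract the level to isolate the periodic part
one_period <- periodic[1:200]               # static harmonics repeat every 200 samples
```

Because the harmonic states are kept static (their evolution variance is left at zero), the periodic component repeats exactly every 200 samples, so any single window of it is the model's "best single period" pooled across all the collected cycles.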