Solved – How to exploit periodicity to reduce the noise of a signal

arima, autocorrelation, kalman-filter, r, time-series

100 periods have been collected from a 3-dimensional periodic signal. The wavelength varies slightly; its noise follows a Gaussian distribution with zero mean. A good estimate of the wavelength is known, so that is not an issue here. The noise of the amplitude may not be Gaussian and may be contaminated with outliers.

How can I compute a single period that approximates 'best' all of the collected 100 periods?

I have no idea how time-series models work. Can they handle varying wavelengths? Can they handle non-smooth true signals? If a time-series model is fitted, can I compute a 'best estimate' for a single period? How?

A related question is this. Speed is not an issue in my case. Processing is done off-line, after all periods have been collected.

Origin of the problem: I am measuring acceleration during human steps at 200 Hz. After that I am trying to double integrate the data to get the vertical displacement of the center of gravity (a naive double integration is sketched below the figure). Of course the noise introduces a HUGE error when you integrate twice. I would like to exploit periodicity to reduce this noise. Here is a crude graph of the actual data (y: acceleration in g, x: time in seconds) of 6 steps corresponding to 3 periods (one left step and one right step make up one period):
Human steps
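For reference, the naive double integration is just two cumulative sums scaled by the sampling interval. The sketch below is only an illustration, with `acc` standing in for the measured vertical acceleration (gravity already removed); the comment shows why even a tiny bias blows up quadratically in the displacement.

```r
# Naive double integration of vertical acceleration sampled at 200 Hz.
# 'acc' is a stand-in for the measured acceleration in g, gravity removed.
fs <- 200                 # sampling rate (Hz)
dt <- 1 / fs
g  <- 9.81                # m/s^2 per g

velocity     <- cumsum(acc * g) * dt   # first integration  -> m/s
displacement <- cumsum(velocity) * dt  # second integration -> m

# A constant bias b (in g) in 'acc' alone produces a drift of roughly
# 0.5 * b * g * t^2 in 'displacement', so low-frequency noise grows without bound.
```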

My interest is now purely theoretical, as http://jap.physiology.org/content/39/1/174.abstract gives a pretty good recipe for what to do, but it does not address periodicity.

Note: I have asked this question on stackoverflow but it seems to be off-topic there.

Best Answer

I'm not sure I totally understand what would be best, but I believe that looking at Dynamic Linear Models, State Space modeling, and the Kalman Filter would be helpful. In R, the package dlm is fairly accessible.

Edit: Describe DLMs simply, eh? One explanation I've seen is that DLMs are regressions where the coefficients are allowed to change over time. That doesn't really give much intuition, though.

So, here is my I-realize-I-don't-understand-DLMs-well-enough-to-do-this-well-but-you-asked answer, which I hope others will correct as necessary. Let me use a situation similar to the one for which they were invented...

Say you were controlling a remotely piloted vehicle. You could summarize the vehicle's actual state (speed, direction, altitude, fuel, etc.) as a vector in a state space. You can't observe the actual state directly, but you do have sensors that observe linear combinations of the actual state, with some (Gaussian) noise added in. That is, your sensors are not perfect.

Further, the vehicle has a program that determines how it changes state, as a linear combination of the current state, also with some (Gaussian) noise. That is, your controls are not perfect.
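In standard state-space/DLM notation (the same notation the dlm package uses), those two ingredients are an observation equation and a state-evolution equation:

$$y_t = F_t \theta_t + v_t, \qquad v_t \sim N(0, V_t)$$
$$\theta_t = G_t \theta_{t-1} + w_t, \qquad w_t \sim N(0, W_t)$$

Here $\theta_t$ is the unobserved state, $F_t$ says which linear combinations of it the sensors report, $G_t$ encodes the program that evolves the state, and $V_t$, $W_t$ are the sensor and control noise variances.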

Say you want to know the most likely path for the vehicle -- that is, the states it actually went through: how it actually moved. What you observe is noisy, but if you model the system this way, a Kalman filter tracks the variances of the different parts and outputs the most likely actual states. At each time step it forecasts what it believes the state will be, observes what the sensors say, weighs the two by their variances, and produces a final estimate of the state as a weighted sum of the forecast and the sensor readings. The process is then repeated for the next time step.
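For concreteness, here is a minimal sketch of that forecast/observe/combine cycle for a one-dimensional random-walk state in plain R. The observation and evolution variances `v` and `w` are assumed known here; in practice you would estimate them (e.g. with `dlm::dlmMLE`).

```r
# Minimal scalar Kalman filter: random-walk state x_t = x_{t-1} + w_t,
# noisy observation y_t = x_t + v_t.  'v' and 'w' are the noise variances.
kalman1d <- function(y, v, w, m0 = 0, C0 = 1e7) {
  n <- length(y)
  m <- numeric(n)   # filtered state means
  C <- numeric(n)   # filtered state variances
  m_prev <- m0
  C_prev <- C0
  for (t in seq_len(n)) {
    a <- m_prev                  # forecast of the state (random walk: no change expected)
    R <- C_prev + w              # forecast variance
    K <- R / (R + v)             # Kalman gain: how much to trust the new observation
    m[t] <- a + K * (y[t] - a)   # weighted sum of forecast and sensor reading
    C[t] <- (1 - K) * R
    m_prev <- m[t]
    C_prev <- C[t]
  }
  list(m = m, C = C)
}

# Example: recover a slowly varying signal from noisy observations
set.seed(1)
truth <- cumsum(rnorm(300, sd = 0.05))
y     <- truth + rnorm(300, sd = 0.5)
est   <- kalman1d(y, v = 0.5^2, w = 0.05^2)$m
```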

The whole concept is basically a Hidden Markov Model with a continuous (Gaussian) state, if you're familiar with those. The model is a handful of equations involving matrices multiplied together, and you can add blocks to those matrices to reflect trend, seasonal, and other types of components of the time series.

The R package dlm makes it particularly easy to define and combine components.
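As an illustration only (this is my sketch, not an established recipe), one way to set this up for the stepping data: assume the samples are in a vector `acc` and that one stride period is roughly one second at 200 Hz, i.e. s = 200 samples per period; a slowly drifting level plus a few static harmonics at the stride period is just one plausible structure.

```r
library(dlm)

## Sketch: 'acc' stands for the acceleration samples from the question, and
## s = 200 assumes roughly one stride period per second at a 200 Hz sampling rate.
build <- function(par) {
  # exp() keeps the estimated variances positive during optimization
  dlmModPoly(order = 1, dV = exp(par[1]), dW = exp(par[2])) +  # slowly drifting level
    dlmModTrig(s = 200, q = 3, dV = 0)                         # 3 static harmonics at the stride period
}

fit <- dlmMLE(acc, parm = c(0, 0), build = build)  # estimate the two variances by maximum likelihood
mod <- build(fit$par)
sm  <- dlmSmooth(acc, mod)                         # retrospective (smoothed) state estimates

states     <- dropFirst(sm$s)               # drop the prior state at time 0
fitted     <- drop(states %*% t(mod$FF))    # smoothed mean of the observations
periodic   <- fitted - states[, 1]          # subtract the level to isolate the periodic part
one_period <- periodic[1:200]               # static harmonics repeat every 200 samples
```

Because the harmonic states are kept static (their evolution variance is left at zero), the periodic component repeats exactly every 200 samples, so any single window of it is the model's "best single period" pooled across all the collected cycles.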