Time-Series – Online Smoothing Algorithm for Sparse Data

algorithms, normalization, smoothing, time-series

I am building a real-time system for processing and aggregating somewhat sparse and irregular survey measurements (values range from 0 to 100, usually on the order of 20-100 measurements per series). I am looking for a way to remove the noise and smooth out this data so our end users can see the overall trend. The main constraint is that the smoothing must happen online: once a past value has been published it cannot change, since users should be able to reference historical data. Are there any good algorithms for this? I was thinking about a simple moving average, but it tends to be pretty volatile when an outlier enters or exits the averaging window. Thanks so much!

Best Answer

Try a median filter: for each data point, take a window centered on that point (of a size you will need to experiment with), and let the new value for that data point be the median of all the points in the window. This is very robust to outliers, since medians are robust to outliers. It also has the virtue that, if your windows always contain an odd number of data points, the result consists entirely of actual data values: each output is one of the original measurements, possibly shifted left or right, but never by more than your window size. I think you will find that this filter crunches noise down very effectively.

One note: at the edges, you will most likely need lop-sided windows. That shouldn't be too much of a problem; I'd still recommend having some logic ensure that every window contains an odd number of data points.
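A minimal sketch of the filter described above, in Python. The function name and the `half_width` parameter are my own choices, not from the answer; the edge handling is one possible way to keep every window odd-sized:

```python
from statistics import median

def median_filter(data, half_width=2):
    """Replace each point with the median of a window centered on it.
    Medians are always computed from the original data, never from
    already-smoothed values."""
    n = len(data)
    smoothed = []
    for i in range(n):
        # Clip the window at the edges, then adjust it by one point
        # if needed so it always contains an odd number of points.
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        if (hi - lo) % 2 == 0:
            if lo > 0:
                lo -= 1   # widen toward the left edge
            else:
                hi -= 1   # at the left boundary, shrink from the right
        smoothed.append(median(data[lo:hi]))
    return smoothed

# A single outlier is removed without disturbing its neighbours:
print(median_filter([1, 1, 100, 1, 1], half_width=1))  # → [1, 1, 1, 1, 1]
```

Because every window is odd-sized, each output is an actual value from the input, as the answer notes.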

One implementation detail is important: when you start replacing data points with medians, don't allow the new median values to influence subsequent windows. Keep computing medians from the original data only.
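Since the question requires that past values never change once published, one way to adapt this is to use a trailing window: each new smoothed value is the median of the most recent raw points, so earlier outputs are never revised. This is my own variant under that assumption (the class name and `window` parameter are hypothetical), and it trades the centered window for a lag of roughly half the window size:

```python
from collections import deque
from statistics import median

class OnlineMedianSmoother:
    """Online median filter over a trailing window of raw points.
    Each call to update() emits one smoothed value; previously
    emitted values are never revised."""

    def __init__(self, window=5):
        # Keep only the most recent `window` raw measurements.
        self._raw = deque(maxlen=window)

    def update(self, value):
        # Store the raw value; medians are computed from raw data only,
        # never from previously emitted smoothed values.
        self._raw.append(value)
        pts = list(self._raw)
        if len(pts) % 2 == 0:
            pts = pts[1:]  # drop the oldest point to keep the count odd
        return median(pts)

smoother = OnlineMedianSmoother(window=5)
for v in [10, 11, 100, 12, 13]:
    print(smoother.update(v))  # → 10, 11, 11, 12, 12: the outlier never appears
```

The odd-count trimming mirrors the answer's advice, so every emitted value is one of the actual raw measurements.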
