Normalization – How to Normalize Accelerometer Sensor Data Effectively

normalization, signal-processing

I'm working with a large set of accelerometer data collected from multiple sensors worn by many subjects. Unfortunately, nobody here knows the technical specifications of the devices, and I don't think they have ever been recalibrated. I'm working on my master's thesis; the accelerometers were borrowed from another university, and altogether the situation was rather opaque. Whether any preprocessing happens on board the device, I have no idea.

What I do know is that they are triaxial accelerometers with a 20 Hz sampling rate, digital and presumably MEMS. I'm interested in nonverbal behavior and gesturing, which according to my sources should mostly produce activity in the 0.3-3.5 Hz range.

Normalizing the data seems quite necessary, but I'm not sure what to use. A very large portion of the data is close to the resting values (raw values of ~1000, from gravity), but there are extremes of up to 8000 in some logs, and even 29000 in others. See the image below. I think this makes dividing by the maximum or the standard deviation a bad idea for normalization.

What's the usual approach in a case like this? Divide by the median? A percentile value? Something else?
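For concreteness, this is the kind of robust scaling I have in mind (just a sketch; the function names and the 95th-percentile choice are arbitrary):

```python
import numpy as np

def scale_by_median(x):
    """Divide by the median absolute value (robust to outliers)."""
    return x / np.median(np.abs(x))

def scale_by_percentile(x, p=95):
    """Divide by a high percentile of the absolute values."""
    return x / np.percentile(np.abs(x), p)
```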

As a side issue, I'm also not sure whether I should clip the extreme values.

Thanks for any advice!

Edit: Here is a plot of about 16 minutes of data (20000 samples), to give you an idea of how the data is typically distributed.


Best Answer

The raw signals you show above appear to be unfiltered and uncalibrated. Appropriate filtering and calibration, together with some artifact rejection, will in effect normalize the data. The standard approach with accelerometer data is the following:

  1. Filter - e.g. a 4th-order, zero-phase IIR lowpass or bandpass filter (see the sketch after this list)
  2. Artifact rejection - threshold based
  3. Calibrate - the Ferraris et al. method (F. Ferraris, U. Grimaldi, M. Parvis, "Procedure for effortless in-field calibration of three-axis rate gyros and accelerometers", Sensors and Actuators, 1995) works well for this.
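As a rough illustration of step 1, here is a minimal filtering sketch assuming the 20 Hz sampling rate and 0.3-3.5 Hz band of interest from your question; the function name and parameter choices are illustrative, not from any particular toolbox:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_zero_phase(x, fs=20.0, low=0.3, high=3.5, order=2):
    """Zero-phase Butterworth bandpass over the gesture band.

    filtfilt runs the filter forward and backward, so a 2nd-order
    design gives a 4th-order zero-phase response overall.
    """
    nyq = 0.5 * fs
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, x, axis=0)

# Example with a stand-in (n_samples, 3) triaxial recording:
raw = np.random.randn(20000, 3) * 100.0 + 1000.0
filtered = bandpass_zero_phase(raw)
```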

It is advisable to perform artifact rejection on inertial sensor data. I would be concerned that, since you don't know the provenance of the data, you cannot guarantee that the sensors were affixed correctly and consistently (in terms of orientation and physical placement) on all subjects. If a sensor is not affixed correctly, it can move relative to the body segment and introduce a lot of artifact into the signals. Similarly, if the sensors were oriented differently on different subjects, the data will be difficult to compare across subjects.

Given their size, the outliers you report seem likely to be artifacts. Such artifacts would almost certainly skew any calibration calculation (though their effect will be lessened by appropriate filtering), so calibration should be performed after artifact rejection.

A simple threshold may work well as an initial artifact rejection routine, i.e. remove (or replace with NaN) all samples above a certain empirically chosen threshold. More sophisticated techniques calculate this threshold adaptively, using a running mean or a moving window.
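A minimal sketch of both variants, applied per axis to a 1-D signal; the fixed threshold, window length, and multiplier are illustrative values you would tune on your own data:

```python
import numpy as np

def reject_fixed(x, threshold=5000.0):
    """Replace samples whose magnitude exceeds a fixed empirical
    threshold with NaN."""
    x = x.astype(float)
    x[np.abs(x) > threshold] = np.nan
    return x

def reject_adaptive(x, window=200, k=5.0):
    """Adaptive variant: flag samples more than k running standard
    deviations from a running mean (window=200 samples is 10 s at 20 Hz)."""
    x = x.astype(float)
    kernel = np.ones(window) / window
    mean = np.convolve(x, kernel, mode="same")
    var = np.convolve((x - mean) ** 2, kernel, mode="same")
    x[np.abs(x - mean) > k * np.sqrt(var)] = np.nan
    return x
```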

Depending on the location of the sensor, you may also wish to correct for the influence of gravity on the acceleration signals, though a detailed understanding of the sensor axes and positioning is crucial here. The Moe-Nilssen method (R. Moe-Nilssen, "A new method for evaluating motor control in gait under real-life environmental conditions. Part 1: The instrument", Clinical Biomechanics, 13(4-5), 1998, pp. 320-327) is the most commonly used and works well for sensors mounted on the lower back.
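This is not the Moe-Nilssen tilt correction itself, but a much simpler stand-in that conveys the idea: estimate the slowly varying gravity component per axis with a long running mean and subtract it. The window length and function name here are assumptions for illustration:

```python
import numpy as np

def remove_gravity_estimate(x, fs=20.0, window_s=2.0):
    """Subtract a per-axis running-mean estimate of the slowly varying
    gravity component from a (n_samples, 3) triaxial signal."""
    n = int(window_s * fs)
    kernel = np.ones(n) / n
    gravity = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, x)
    return x - gravity
```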

A good place to start when examining the data for gesture recognition would be to break the filtered, calibrated data into epochs (e.g. 10 s), calculate a number of features per epoch, and relate these to the labels you have for the data. I can't offer more specific advice without knowing more about the data set and the associated labels.
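As a sketch of that epoching step, assuming 20 Hz data and a filtered, calibrated (n_samples, 3) array; the features shown are common choices, not a prescription:

```python
import numpy as np

def epoch_features(x, fs=20.0, epoch_s=10.0):
    """Split into non-overlapping epochs and compute simple per-epoch
    features of the vector magnitude: mean, std, RMS, and the dominant
    frequency from an FFT."""
    n = int(epoch_s * fs)           # 200 samples per 10 s epoch at 20 Hz
    feats = []
    for i in range(x.shape[0] // n):
        seg = x[i * n:(i + 1) * n]
        mag = np.linalg.norm(seg, axis=1)          # vector magnitude
        spec = np.abs(np.fft.rfft(mag - mag.mean()))
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)
        feats.append({
            "mean": mag.mean(),
            "std": mag.std(),
            "rms": np.sqrt(np.mean(mag ** 2)),
            "dom_freq": freqs[np.argmax(spec)],
        })
    return feats
```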

Hope this helps.