Time Series Normalization – How to Normalize Highly Volatile Time Series

normalization, time series

How should I normalize time series data that shows high volatility so that it can be used in an ANN? Let's take the price of Bitcoin as an example.

I have the following doubts:

  • As stated in this paper, it is not sufficient to normalize the dataset as a whole. This makes sense, since the Bitcoin price ranges from 0.06USD to 19,343USD. I think that if the data is fed to an ANN as sliding windows, then each window has to be normalized separately.
  • Methods like Min-Max and Z-Score are good for pattern comparison and also standardize the range, but a pattern between 1000USD and 1200USD becomes exactly the same as that pattern between 1000USD and 1400USD, so valuable information is lost (a quick sketch follows below). Of course, it's possible to create an additional feature to represent volatility.
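
For example, here's a quick sketch of what I mean (the window values are made up); after per-window Min-Max scaling to [-1, 1] the two windows become indistinguishable:

#!/usr/bin/env python
# Made-up windows with the same shape but different amplitudes: after
# per-window min-max scaling the amplitude information is gone.
from __future__ import division


def min_max(window, a = -1.0, b = 1.0):
    """Scale `window` to [a, b] using its own min and max"""
    lo, hi = min(window), max(window)
    return [(b - a) * (x - lo) / (hi - lo) + a for x in window]


window_small = [1000, 1050, 1100, 1150, 1200]    # 200USD swing
window_large = [1000, 1100, 1200, 1300, 1400]    # 400USD swing

print(min_max(window_small))
#  -> [-1.0, -0.5, 0.0, 0.5, 1.0]
print(min_max(window_large))
#  -> [-1.0, -0.5, 0.0, 0.5, 1.0]    identical, the difference in swing is lost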

So what is the best way to normalize volatile time series data like the price of Bitcoin?

Best Answer

TLDR

Re-read the linked paper, it's excellent! I think so anyway, and it seems to have some solid options once you've framed the problem you want solved...


Now I'm not going to leave ya with just that, so let's dissect some of what the paper is going on about while keeping your questions in mind.

The paper discusses when the authors chose to normalize and re-normalize data on page 2, section 1.2.3 (Arbitrary Query Lengths cannot be Indexed), and near the end of that page it is pretty clear about there not being a technique for similarity searches of arbitrary lengths when dealing with such large datasets.

... though if I understand what they are searching for in this instance, this should only be a problem if you're feeding in lots of sequences that may or may not be correlated in the same or neighboring time intervals. Short aside: DTW (Dynamic Time Warping) is pretty groovy stuff, and I'll restate that re-reading the paper is probably a good idea. For medical-related fields those huge datasets totally make sense, e.g. the EEG example from the paper would be sampling n points from each subject's brain per time period t, multiplied by the length of the time slice. In other words, a metric-s-ton of data!

Which raises the question of how big you expect your n dimension to be with a crypto-coin?
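
Since DTW got name-dropped, here's a short sketch of the plain dynamic-programming DTW distance between two 1-D sequences, so the aside isn't left hanging. This is just the textbook O(n*m) recurrence, not the heavily optimized, early-abandoning version the paper works toward:

#!/usr/bin/env python
# Naive dynamic-programming DTW distance -- a teaching sketch only, not the
# pruned/early-abandoning UCR-suite version described in the paper.
from math import sqrt


def dtw_distance(a, b):
    """Returns the DTW distance between 1-D sequences `a` and `b`"""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return sqrt(cost[n][m])


print(dtw_distance([1, 2, 3, 4], [1, 1, 2, 3, 4]))
#  -> 0.0    warping absorbs the repeated 1
print(dtw_distance([1, 2, 3, 4], [2, 3, 4, 5]))
#  -> greater than 0, the end points can never line up exactly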

Regardless, in section 4.2.1 things get interesting as far as data pre-processing goes, and they even provide a link at the end (reference [43]) to some source code to play with.

As far as a best way of pre-processing goes, that still seems to be a subject of active research, as well as being dependent upon the model, what is being trained for, and how. It does seem like you're on the right track in wanting to normalize or standardize each time range separately, though.
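
To make "normalize or standardize each time range" concrete, here's a minimal sketch (the window width and toy prices are made up) that z-normalizes every sliding window independently, which is roughly what the paper does for each subsequence it compares:

#!/usr/bin/env python
# Sketch: z-normalize each sliding window on its own, so every window is
# compared by shape (mean 0, standard deviation 1) rather than by price level.
from __future__ import division
from math import sqrt


def z_normalize(window):
    """Returns `window` shifted to mean 0 and scaled to standard deviation 1"""
    m = len(window)
    mu = sum(window) / m
    sigma = sqrt(sum((x - mu) ** 2 for x in window) / m)
    if sigma == 0:    # flat window, nothing to scale
        return [0.0] * m
    return [(x - mu) / sigma for x in window]


def sliding_windows(series, width):
    """Yields a z-normalized window for every `width` consecutive points"""
    for start in range(len(series) - width + 1):
        yield z_normalize(series[start:start + width])


prices = [100, 102, 101, 105, 110, 108, 112, 111]    # made-up prices
for window in sliding_windows(prices, width = 4):
    print([round(v, 3) for v in window])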

One suggestion on a related question, How to represent an unbounded variable as number between 0 and 1 (which at heart seems to be what your question is about, Bitcoin being an unbounded variable along many measurable dimensions), provides some pseudo-code for using a trainable minmax. I've partially translated a version of it into (partially functioning) Python, using the following min-maxing method from How to normalize data between -1 and 1?, to demonstrate some of the issues you'll face just with pre-processing.

$$ x''' = (b-a)\frac{x - \min{x}}{\max{x} - \min{x}} + a $$

#!/usr/bin/env python
from __future__ import division


class Scaler_MinMax(list):
    """
    Scaler_MinMax is a `list` that scales inputted, `append`ed, or `extend`ed values to be between the chosen `scale_min` and `scale_max`
    """

    def __init__(self, l, scale_min = -1, scale_max = 1, auto_recalibrate = False):
        self.scale_min = scale_min
        self.scale_max = scale_max
        self.range_min = min(l)
        self.range_max = max(l)
        self.initial_values = l
        self.auto_recalibrate = auto_recalibrate
        super(Scaler_MinMax, self).__init__(self.scale(numbers = l))

    def scale(self, numbers = [], scale_min = None, scale_max = None, range_min = None, range_max = None):
        """
        Returns list of scaled values
        """
        if scale_min is None:
            scale_min = self.scale_min
        if scale_max is None:
            scale_max = self.scale_max

        if range_min is None:
            range_min = self.range_min
        if range_max is None:
            range_max = self.range_max

        return [((x - range_min) / (range_max - range_min)) * (scale_max - scale_min) + scale_min for x in numbers]

    def unscale(self, numbers = [], scale_min = None, scale_max = None, range_min = None, range_max = None):
        """
        Returns list of unscaled values
        """
        if not numbers:
            numbers = self

        if scale_min is None:
            scale_min = self.scale_min
        if scale_max is None:
            scale_max = self.scale_max

        if range_min is None:
            range_min = self.range_min
        if range_max is None:
            range_max = self.range_max

        return [(((y - scale_min) / (scale_max - scale_min)) * (range_max - range_min)) + range_min for y in numbers]

    def re_calibrate(self):
        """
        Re-sets `self.range_min`, `self.range_max`, and `self`
        """
        self.range_min = min(self.initial_values)
        self.range_max = max(self.initial_values)
        super(Scaler_MinMax, self).__init__(self.scale(
            numbers = self.initial_values,
            scale_min = self.scale_min,
            scale_max = self.scale_max,
            range_min = self.range_min,
            range_max = self.range_max))

    """
    Supered class overrides
    """

    def __add__(self, other):
        if not isinstance(other, list):
            raise TypeError("can only concatenate list (not '{0}') to list".format(type(other)))

        return super(Scaler_MinMax, self).__add__(self.scale(numbers = other))

    def append(self, value):
        """
        Appends to `self.initial_values` and `self.append(self.scale([value])[0])`
        """
        self.initial_values.append(value)
        if isinstance(value, list):
            super(Scaler_MinMax, self).append([self.scale(numbers = [i])[0] for i in value])
        else:
            super(Scaler_MinMax, self).append(self.scale(numbers = [value])[0])

        if self.auto_recalibrate is True:
            if self.range_max != max(self.initial_values) or self.range_min != min(self.initial_values):
                self.re_calibrate()

    def clear(self):
        """
        Clears `self.initial_values` and `self` of __ALL__ values
        """
        self.initial_values.clear()
        super(Scaler_MinMax, self).clear()

    def extend(self, iterable):
        """
        Extends `self.initial_values` and `self.extend(self.scale(iterable))`
        """
        self.initial_values.extend(iterable)
        super(Scaler_MinMax, self).extend(self.scale(iterable))

        if self.auto_recalibrate is True:
            if self.range_max != max(self.initial_values) or self.range_min != min(self.initial_values):
                self.re_calibrate()

    def insert(self, index, value):
        """
        Inserts `value` into `self.initial_values` and the scaled value into `self`
        """
        self.initial_values.insert(index, value)
        super(Scaler_MinMax, self).insert(index, self.scale(numbers = [value])[0])

        if self.auto_recalibrate is True:
            if self.range_max != max(self.initial_values) or self.range_min != min(self.initial_values):
                self.re_calibrate()

    def pop(self, index):
        """
        Returns a tuple of the values `pop`ed at `index` from `self.initial_values` and `self`
        """
        output = self.initial_values.pop(index), super(Scaler_MinMax, self).pop(index)

        if self.auto_recalibrate is True:
            if self.range_max != max(self.initial_values) or self.range_min != min(self.initial_values):
                self.re_calibrate()

        return output

    def remove(self, value):
        """
        Removes value from `self.initial_values` and `self`

        > use of `index` calls to ensure paired values are `pop`ed/`remove`d does increase the expense of this method
        """
        if value in self.initial_values:
            i = self.initial_values.index(value)
            self.pop(i)    # `pop` removes the paired values from both `self.initial_values` and `self`
        elif value in self:
            i = self.index(value)
            self.initial_values.pop(i)
            super(Scaler_MinMax, self).remove(value)
        else:
            raise ValueError("{0}.remove(x) not in list".format(self.__class__.__name__))

        if self.auto_recalibrate is True:
            if self.range_max != max(self.initial_values) or self.range_min != min(self.initial_values):
                self.re_calibrate()

    def reverse(self):
        """
        Reverses `self.initial_values` and `self` __IN PLACE__
        """
        self.initial_values.reverse()
        super(Scaler_MinMax, self).reverse()

    def sort(self):
        """
        Sorts `self.initial_values` and `self` __IN PLACE__
        """
        self.initial_values.sort()
        super(Scaler_MinMax, self).sort()

Example usage and limitations

s = Scaler_MinMax([x for x in range(9)])
s
#  -> [-1.0, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0]

s.unscale()
#  -> [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

s.initial_values
#  -> [0, 1, 2, 3, 4, 5, 6, 7, 8]

s.append(20)
s
#  -> [-1.0, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0, 4.0]

s.re_calibrate()
s
#  -> [-1.0, -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.30000000000000004, -0.19999999999999996, 1.0]

s.unscale()     # Floating point errors!
#  -> [0.0, 0.9999999999999998, 1.9999999999999996, 3.0000000000000004, 4.0, 5.0, 6.0, 7.0, 8.0, 20.0]

s.initial_values
#  -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 20]

As shown above, floating point errors are why initial_values are saved and used for re_calibrate(); if this were not done and unscale() were used for re_calibrate() instead, the floating point errors would compound! Furthermore, not only are there many places the code could be improved, but keeping an extra copy of the un-parsed data alongside the parsed data can more than double memory usage.

Floating point errors adding up are also discussed in the linked paper, along with their method of mitigation, in section 4.2.1 (Early Abandoning Z-Normalization) near the end, where they suggest a "flush out" of such errors every million sub-sequences.
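
Here's a sketch of the kind of bookkeeping they describe (the toy prices below are placeholders): keep cheap running sums for each sliding window's mean and standard deviation, and every so often rebuild those sums from scratch so the accumulated rounding error gets flushed out.

#!/usr/bin/env python
# Sketch: running sums give O(1) per-step mean/std updates for a sliding
# window, and a periodic full recomputation "flushes out" the floating point
# error those sums slowly accumulate.
from __future__ import division
from math import sqrt

FLUSH_EVERY = 1000000    # the paper suggests flushing roughly every million sub-sequences


def sliding_mu_sigma(series, width):
    """Yields (mu, sigma) for every window of `width` consecutive points in `series`"""
    ex = ex2 = 0.0    # running sum and running sum of squares
    for i, x in enumerate(series):
        ex += x
        ex2 += x * x
        if i < width - 1:
            continue
        mu = ex / width
        sigma = sqrt(max(ex2 / width - mu * mu, 0.0))    # max() guards against tiny negatives
        yield mu, sigma
        if (i + 1) % FLUSH_EVERY == 0:
            # periodic flush: rebuild the sums exactly from the next window's values
            window = series[i - width + 2:i + 1]
            ex, ex2 = sum(window), sum(v * v for v in window)
        else:
            old = series[i - width + 1]    # value leaving the window
            ex -= old
            ex2 -= old * old


prices = [100, 102, 101, 105, 110, 108]    # made-up prices
for mu, sigma in sliding_mu_sigma(prices, width = 3):
    print([round(mu, 3), round(sigma, 3)])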


Side note, so that everyone's on the same page...

$$ \mu = \frac{1}{m} \sum {x_i} $$

... in Python looks sorta like...

#!/usr/bin/env python
from __future__ import division

# Thanks be to https://stackoverflow.com/a/31136897/2632107
try:
    # Python 2
    range = xrange
except NameError:
    # Python 3
    pass


def calc_mu(m_denominator=2, x_list=[1.5, 42], start=0, end=2, step=1):
    ''' μ = 1/m * ∑x_i '''
    chosen = []
    for i in range(start, end, step):
        chosen.append(x_list[i])
    return float(1) / m_denominator * sum(chosen)

... well, that is if I understood An example of subscript and summation notation; I've had to brush up on that and dabble with Khan Academy's Sequences and Series in order to get a fuller grasp of what the SIGKDD trillion paper was mathing about.
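
And since z-normalizing also needs the standard deviation, here's the usual companion formula for σ (the one the paper pairs with μ, if I'm reading it right), plus a Python sketch in the same spirit as calc_mu above; the function name and default values are just for illustration...

$$ \sigma = \sqrt{\frac{1}{m} \sum {x_i^2} - \mu^2} $$

#!/usr/bin/env python
from __future__ import division
from math import sqrt


def calc_sigma(m_denominator=2, x_list=[1.5, 42], start=0, end=2, step=1):
    ''' σ = sqrt(1/m * ∑x_i² - μ²) '''
    chosen = []
    for i in range(start, end, step):
        chosen.append(x_list[i])
    mu = float(1) / m_denominator * sum(chosen)
    return sqrt(float(1) / m_denominator * sum(x * x for x in chosen) - mu ** 2)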


Hopefully having some of the math alongside code that performs similar functions demystifies how you too can start trying out methods of data pre-processing, because it looks like there'll be lots of trial-and-error in finding the best pre-processing methods no matter what one is trying to force-feed to a neural network.

I know, it's a big bummer that I don't have a complete answer, but if there was a publicly available best method of predicting y for time series forecasting, there'd likely be trolling of the markets instead of just flash-crashes from bots... which, hint, hint, might be a good thing to train a model to predict ;-)

As stated at the top, I think you'll need to frame the problems you want solved very precisely. For example, you could try to predict the relative change in price rather than the price directly, in which case percentiles may work okay, kinda like the method shown in the paper for encoding genetic information as a time series.
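
For instance, a minimal sketch of that re-framing (the prices are made up, and this is just one of many ways to express "relative change"):

#!/usr/bin/env python
# Sketch: model the relative (fractional) change between consecutive prices
# instead of the raw, unbounded price itself.
from __future__ import division


def relative_changes(prices):
    """Returns the fractional change between each pair of consecutive prices"""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


prices = [100.0, 105.0, 102.9, 110.0]    # made-up prices
print(relative_changes(prices))
#  -> roughly [0.05, -0.02, 0.069], i.e. 5% up, 2% down, ~6.9% up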

Notes about code for future editors

To keep things a bit simpler, I did not add the threshold that terrace suggested to the Scaler_MinMax code. Furthermore, I did not write the code examples with the intent of being efficient or production-ready, but to be accessible, so please keep any edits to corrections and in the spirit of being accessible to as many readers as possible.

Otherwise, I think it'll be far more helpful for readers/editors to attempt to write their own code examples in another language and/or using a different method for squashing data; besides, it'd be cool to see what everyone else comes up with.