Solved – Normalization vs Standardization for multivariate time-series

multivariate analysis, normalization, standardization, time series

I'm using dynamic time warping (DTW) as a distance measure for comparing multivariate time-series. I want to cluster the data using DTW as the distance measure, since the time-series may be shifted or skewed.

Since each series has several parameters, I should rescale them so that all parameters have the same influence when determining whether two time-series are similar. I'm using Euclidean distance as the local distance for DTW.
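For concreteness, this is roughly the computation I have in mind (a plain NumPy sketch, not my actual code; the function and variable names are illustrative):

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW distance between two multivariate series a (n, d) and b (m, d),
    using the Euclidean distance between points as the local distance."""
    n, m = len(a), len(b)
    # cost[i, j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])  # Euclidean local distance
            cost[i, j] = local + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
    return cost[n, m]
```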

My question is: how do I determine whether I should use normalization (subtract the min and divide by the range, max − min) or standardization (subtract the mean and divide by the standard deviation)?
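In other words, the two options I'm considering look like this, applied per parameter (a toy NumPy sketch; the array shape is just an example):

```python
import numpy as np

X = np.random.randn(100, 3)  # toy multivariate series: 100 time steps, 3 parameters

# Min-max normalization: rescales each parameter (column) to [0, 1]
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization (z-scoring): each parameter gets mean 0 and standard deviation 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```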

Moreover, can anyone explain what the point of standardization is? I understand that it tells me how many standard deviations a value is from its mean, but why would that improve my similarity measure when comparing two time-series?

I'm not a statistician, so any explanation would be great. I understand that normalization gives me values in the range [0, 1], so all parameters are on the same scale, but what do I get from standardization?

Finally, should I standardize each time-series using the standard deviation of the whole dataset, or only the standard deviation of the time-series being standardized?
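To be explicit, these are the two variants I mean (NumPy sketch with a toy dataset; names are illustrative):

```python
import numpy as np

series_list = [np.random.randn(50, 3) for _ in range(10)]  # toy dataset of 10 series

# Option A: standardize each series with its own mean and standard deviation
per_series = [(s - s.mean(axis=0)) / s.std(axis=0) for s in series_list]

# Option B: standardize every series with statistics pooled over the whole dataset
stacked = np.vstack(series_list)                 # (sum of lengths, n_parameters)
mu, sigma = stacked.mean(axis=0), stacked.std(axis=0)
pooled = [(s - mu) / sigma for s in series_list]
```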

I should also emphasize that my data does not follow a normal distribution.

Best Answer

@Chris has a very good answer. There is one important point to make about standardization/normalization, though I am not sure it applies 100% to the OP's case. The point is that you want to be careful about "data leakage" from the future into the past.

I am not sure what you plan to do after applying DTW, but say you are in a forecasting scenario where you want to predict the future. For the sake of example, say you have 10 years of data and you want to train your model on the first 9 years and test on the last year. When you rescale, you should compute the scaling statistics only from the first 9 years. So if you are using the formula @Nick Cox suggested, $(\text{value} - \text{min}) / (\text{max} - \text{min})$, you want to take the $\text{min}$ and $\text{max}$ over the first 9 years only, not over the entire 10-year range. If you normalize the first 9 years of data using information from all 10 years, you are leaking information from the future into the past, and the performance you measure on your test or validation set will give you a misleading picture.
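For example, with the min-max formula above, a leakage-free version of the scaling looks roughly like this (NumPy sketch; the array and split sizes are just for illustration):

```python
import numpy as np

data = np.random.randn(120, 3)        # toy: 10 "years" of monthly data, 3 parameters
train, test = data[:108], data[108:]  # first 9 years for training, last year for testing

# Compute min and max on the training period only
lo, hi = train.min(axis=0), train.max(axis=0)

train_scaled = (train - lo) / (hi - lo)
test_scaled = (test - lo) / (hi - lo)  # reuse the training statistics; no peeking at the future
```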

Again, the OP did not explicitly mention this issue, but I thought that other users who visit this post might want to know about the data leakage issue.