How to normalize data in online learning

Tags: k nearest neighbour, normalization, online-algorithms

In offline machine learning, normalizing features with different units seems simple: we can apply min-max normalization to each feature,

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
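Assuming the formula is standard per-feature min-max scaling, x' = (x - min) / (max - min), here is a minimal Python sketch of the offline case. The helper names `fit_min_max` and `min_max_normalize` and the height/weight data are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def fit_min_max(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return the per-feature minimums and maximums of a training set."""
    mins = [min(col) for col in zip(*rows)]
    maxs = [max(col) for col in zip(*rows)]
    return mins, maxs


def min_max_normalize(row, mins, maxs) -> List[float]:
    """Apply x' = (x - min) / (max - min) to each feature of one row."""
    out = []
    for x, lo, hi in zip(row, mins, maxs):
        span = hi - lo
        # A constant feature has max == min; map it to 0 instead of dividing by zero.
        out.append((x - lo) / span if span != 0 else 0.0)
    return out


# Hypothetical training set: height in cm, weight in kg.
train = [[150.0, 50.0], [180.0, 80.0], [165.0, 65.0]]
mins, maxs = fit_min_max(train)
normalized = [min_max_normalize(r, mins, maxs) for r in train]
print(normalized)  # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
```

After scaling, both features lie in [0, 1], so neither unit dominates a distance computation such as the one kNN uses.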

But when using incremental learning (weighted kNN in my case), new instances are added to the initial training set. Do we use the same formula? If so, which max and min should I use: those of the original training set or those of the updated one?
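To make the dilemma concrete, here is a small Python sketch with made-up values for a single feature. It compares keeping the original min and max (option A) with recomputing them after a new instance arrives (option B). The values and the `normalize` helper are hypothetical.

```python
def normalize(x: float, lo: float, hi: float) -> float:
    """Min-max normalize a single value given the range [lo, hi]."""
    return (x - lo) / (hi - lo)


original = [10.0, 20.0, 30.0]  # one feature of the initial training set (made up)
new_instance = 50.0            # hypothetical value arriving online

# Option A: keep the original training min and max.
lo_a, hi_a = min(original), max(original)
# Option B: recompute min and max after adding the new instance.
updated = original + [new_instance]
lo_b, hi_b = min(updated), max(updated)

# The same stored point (20.0) gets a different normalized value under each option.
print(normalize(20.0, lo_a, hi_a))          # 0.5
print(normalize(20.0, lo_b, hi_b))          # 0.25
# Under option A, the new instance falls outside [0, 1].
print(normalize(new_instance, lo_a, hi_a))  # 2.0
```

With option B, every previously stored normalized instance would have to be rescaled whenever the range changes. Otherwise, old and new instances would be compared on different scales.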

Best Answer

In an ideal world, our training data should be representative of the production data, which means that descriptive statistics such as the mean, max, or min should not change much. Thus, in an online-learning environment, we should be able to normalize new instances using the max and min values from the historical training data.
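A minimal Python sketch of this approach, assuming a distance-weighted kNN classifier as in the question. The class `OnlineWeightedKNN`, the inverse-distance weighting, and the sample data are hypothetical illustrations, not the asker's actual model. The min and max are fitted once on the historical data. Instances added later are scaled with that frozen range and never refit it.

```python
import math
from collections import defaultdict
from typing import Sequence


class OnlineWeightedKNN:
    """Distance-weighted kNN whose min-max scaling is frozen on historical training data."""

    def __init__(self, X: Sequence[Sequence[float]], y: Sequence[str], k: int = 3):
        self.k = k
        # Fit min and max once, on the historical (offline) training set only.
        self.mins = [min(col) for col in zip(*X)]
        self.maxs = [max(col) for col in zip(*X)]
        self.X = [self._scale(row) for row in X]
        self.y = list(y)

    def _scale(self, row):
        # Reuse the historical range; values outside it map outside [0, 1].
        return [(x - lo) / (hi - lo) if hi != lo else 0.0
                for x, lo, hi in zip(row, self.mins, self.maxs)]

    def add(self, row, label):
        """Incrementally add a labelled instance without refitting min/max."""
        self.X.append(self._scale(row))
        self.y.append(label)

    def predict(self, row):
        q = self._scale(row)
        dists = sorted((math.dist(q, x), lbl) for x, lbl in zip(self.X, self.y))
        votes = defaultdict(float)
        for d, lbl in dists[: self.k]:
            # Inverse-distance weighting; a small epsilon avoids division by zero.
            votes[lbl] += 1.0 / (d + 1e-9)
        return max(votes, key=votes.get)


# Hypothetical historical data: two features, two classes.
model = OnlineWeightedKNN([[150.0, 50.0], [160.0, 60.0], [180.0, 80.0], [185.0, 90.0]],
                          ["a", "a", "b", "b"], k=3)
print(model.predict([155.0, 55.0]))  # a
model.add([200.0, 100.0], "b")       # outside the historical range; min/max unchanged
print(model.predict([195.0, 95.0]))  # b
```

Freezing the range keeps all stored instances on one consistent scale. The trade-off is that production values outside the historical range map outside [0, 1]. That side effect is harmless when production data really resembles the training data, as the answer assumes.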

If the training data is not representative of the production data, or we do not know how the production data is distributed, the answer is to (1) collect data, (2) train offline, and (3) then put the model into production.
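The collect / train offline / deploy procedure can be sketched in Python as follows. This sketch assumes the raw, unnormalized rows are kept so that the range can be refitted. The helpers `fit_range` and `scale` and the data are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_range(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature min and max of a batch of raw (unnormalized) rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def scale(rows, mins, maxs) -> List[List[float]]:
    """Min-max normalize every row with a fixed range."""
    return [[(x - lo) / (hi - lo) if hi != lo else 0.0
             for x, lo, hi in zip(r, mins, maxs)] for r in rows]


# 1. Collect data: keep raw production rows alongside the historical raw rows.
historical = [[10.0], [20.0], [30.0]]
collected = [[40.0], [50.0]]

# 2. Train offline: refit the range on all raw data and re-normalize the whole set.
raw_all = historical + collected
mins, maxs = fit_range(raw_all)
training_set = scale(raw_all, mins, maxs)

# 3. Put into production: freeze mins/maxs and reuse them for every incoming query.
query = scale([[35.0]], mins, maxs)[0]
print(training_set, query)  # [[0.0], [0.25], [0.5], [0.75], [1.0]] [0.625]
```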
