Should you normalize your training data for Local Outlier Factor?

local-outlier-factor, normalization, outliers

Say I'm using scikit-learn's implementation of Local Outlier Factor, with Euclidean distance used in the reachability computation. My input features differ by orders of magnitude. Is it advisable to normalize the features before fitting the model, even though the data might contain several separate clusters?

Best Answer

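Yes, that would be the recommended approach. Local Outlier Factor is built on k-nearest-neighbor distances, so it is only as meaningful as the distance measure behind it. With Euclidean distance, a feature whose values are orders of magnitude larger than the others will dominate the distance calculation, and the smaller features will barely influence which points count as neighbors. Normalization is recommended in most cases where distance or similarity measures are used, unless you actually want the high-magnitude features to dominate. Having several separate clusters is not a reason to skip it: per-feature scaling is applied to the whole dataset and only changes how much each feature contributes to the distance, while LOF still compares each point to its own local neighborhood. If you have a mixture of continuous, ordinal and binary variables in there, you may want to evaluate a few different scalings for them.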
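To make the effect concrete, here is a minimal sketch rather than a definitive recipe. The data is synthetic and made up for illustration, `lof_scores` is a hypothetical helper, and `StandardScaler` is just one reasonable choice of normalization. It fits scikit-learn's `LocalOutlierFactor` once on raw features and once on standardized features, then compares the score of a point that is unusual only in the small-scale feature.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): feature 0 has a spread
# of about 1, feature 1 has a spread of about 10,000.
inliers = np.column_stack([
    rng.normal(0.0, 1.0, size=200),
    rng.normal(0.0, 10_000.0, size=200),
])

# One injected point that is extreme only in the small-scale feature.
injected = np.array([[8.0, 0.0]])
X = np.vstack([inliers, injected])


def lof_scores(data, n_neighbors=20):
    """Hypothetical helper: return LOF scores (higher means more outlying)."""
    lof = LocalOutlierFactor(n_neighbors=n_neighbors, metric="euclidean")
    lof.fit(data)
    # scikit-learn stores the negated LOF, so flip the sign back.
    return -lof.negative_outlier_factor_


raw_scores = lof_scores(X)
scaled_scores = lof_scores(StandardScaler().fit_transform(X))

print("LOF of injected point, raw features:   ", raw_scores[-1])
print("LOF of injected point, scaled features:", scaled_scores[-1])
```

On raw features the large-scale feature decides who the neighbors are, so the injected point should look fairly ordinary; after standardization both features contribute and its score should rise. If your features are heavy-tailed or already contain extreme values, a scaler based on medians and quantiles such as `RobustScaler` may be worth comparing, since the mean and standard deviation used by `StandardScaler` are themselves sensitive to outliers.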
