Solved – How does normalization reduce dimensionality of data

machine learning, normalization, svm

While reading an SVM tutorial, I came across the following statement by the author on the normalization technique used to preprocess the input data:

Normalizing data to unit vectors reduces the dimensionality of the data by one since the data is projected to the unit sphere.

I am quite lost as to how the dimensionality is reduced by one. Any further explanation would be greatly appreciated.

Best Answer

If you really were to normalize to unit vectors, the author would have a point: imagine every point in the plane to be "replaced" by the point on the unit circle that lies in the same direction from the origin.

Then indeed, the resulting set of points would be (at most) all points on the unit circle, which is of dimension 1: pick any point on the unit circle as a starting point, and you can uniquely identify every other point on it by a single parameter, the distance from the starting point to it along the circle.
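To make the 2D picture concrete, here is a small sketch (using NumPy, with purely illustrative data) that projects points onto the unit circle and then reconstructs each projected point from a single parameter, its angle:

```python
import numpy as np

# A few arbitrary 2D points (illustrative data only)
X = np.array([[3.0, 4.0],
              [-1.0, 2.0],
              [0.5, -0.5]])

# "Normalize to unit vectors": divide each point by its Euclidean norm,
# i.e. project it onto the unit circle.
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms

print(np.linalg.norm(X_unit, axis=1))  # all 1.0 -> the points lie on the unit circle

# On the circle, one parameter (the angle) identifies each point,
# which is why the projected data is effectively one-dimensional.
angles = np.arctan2(X_unit[:, 1], X_unit[:, 0])
X_recovered = np.column_stack([np.cos(angles), np.sin(angles)])
print(np.allclose(X_unit, X_recovered))  # True
```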

However, one typically doesn't actually project onto the unit sphere (which is what I just described for 2D). In a typical normalization, we simply rescale so that the SD/variance is 1 (I'm ignoring the usual translation to make the mean zero here). This is not the same as projecting onto the unit sphere: it merely brings all the data 'closer to' the unit sphere (an admittedly very loose statement, but I could not immediately find one that was better suited and still related somewhat to the projection-onto-the-unit-sphere idea; if relevant comments follow, I will be glad to edit them in).
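For contrast, here is a sketch (again NumPy, with made-up data) of the standardization I mean: scaling each feature to unit standard deviation does not place the points on the unit circle the way the projection above does, so no dimension is lost.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 2))  # illustrative 2D data

# Typical normalization: scale each feature to unit variance
# (here I also subtract the mean, the usual companion step).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.std(axis=0))                  # ~[1, 1]: unit SD per feature
print(np.linalg.norm(X_std, axis=1)[:5])  # row norms vary -> not on the unit circle
```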

Disclaimer: I've not read the tutorial in question (you did not provide a link or reference), so maybe I'm answering in the wrong direction here: perhaps, in the context of some example, there is a perfectly fine reason to project onto the unit sphere. In that case, the explanation in my first two paragraphs should still help you somewhat.
