Solved – Dimensionality reduction using self-organizing map

clusteringdimensionality reductionmultivariate analysisneural networksself organizing maps

Self-organizing maps are claimed to be an approach for dimensionality reduction. However, I am kind of confused about this claim.

Consider the following example, I have a data set with 200 data points and each data point is represented by a feature vector with 1000 dimensions. Assume I would like to train a map with a $1 \times 2$ grid. In other words, I will map these 200 points to two cells. Just for illustration purposes, suppose the first 100 points are mapped onto the first cell and the latter 100 points are mapped onto the second cell. From the viewpoint of dimensionality reduction, we can say we use a 1-dimensional output space to represent the original 1000-dimensional space. But this really confuses me a lot. According to this example, it looks to me that the first 100 points will share the same feature vector, which is just 1 in the one-dimensional space; and the latter 100 points will share the same feature vector, which is 2 in the same one-dimensional space. Is my understanding correct?

Best Answer

In your extreme example, I'd say your view is correct. You specified that you wanted a reduction to one dimension with two possible values in that dimension and that's what you got. As Wikipedia says, SOM creates a discretized low-dimensional representation.

Perhaps the issue is how SOM does this. Let's say you specified a 3x3 SOM, which is a 2-D grid with 9 points. The SOM algorithm embeds this 2-D grid in your 1000-D space, as Neil G points out. It then adapts the 9 points to the data in such a way as to maintain the manifold's smoothness in 1000-D space, while moving the grid points to denser (in terms of your data) areas. (It does this by propagating changes to closest SOM points in the 2-D manifold, not to neighbors in the 1000-D space.)

So, each of the 9 points in your SOM grid has a 1000-D location in 1000-D space, but after the algorithm is finished, you are considering the 9 points in the 3x3 grid itself, reducing 1000-D space to (discretized) 2-D space.

You could also look at this as a kind of clustering of your data around 9 points that are constrained in relationship to each other.

Related Question