Solved – Using self organizing maps for dimensionality reduction

data transformation, dimensionality reduction, self organizing maps

Over the past few days, I have been conducting some research on self organizing maps for a project at school. I have come to understand that self organizing maps can be used to reduce the dimensionality of your data. However, I do not understand how this works. For example, say you have a 10×10 network of neurons in a SOM, and your input is 25-dimensional. So, by my understanding, you would create a feature vector for each neuron that is also 25D. By the time training is done, you end up with 100 25D vectors. How is this exactly reducing the dimensions of the data? Am I supposed to be concerned with the location of the neurons?

EDIT: I've already read the question Dimensionality reduction using self-organizing map but I don't feel it answers the question that I have.

Best Answer

The self organising map (SOM) is a space-filling grid that provides a discretised dimensionality reduction of the data.

You start with a high-dimensional space of data points, and an arbitrary grid that sits in that space. The grid can be of any dimension, but is usually smaller than the dimension of your dataset, and is commonly 2D, because that's easy to visualise.

For each datum in your data set, you find the nearest grid point, and "pull" that grid point toward the data point. You also pull each of the neighbouring grid points toward the new position of the first grid point. At the start of the process, you pull many of the neighbours toward the data point. Later in the process, when your grid is starting to fill the space, you move fewer neighbours, and this acts as a kind of fine tuning. This process results in a set of points in the data space that fit the shape of the space reasonably well, but can also be treated as a lower-dimensional grid.
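As a rough sketch of that training loop (not the exact update schedule from Kohonen's paper; the learning-rate and neighbourhood-decay choices here are arbitrary illustrations), using the 10×10 grid and 25-dimensional inputs from the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 500 points in a 25-dimensional space.
data = rng.random((500, 25))

# A 10x10 grid of neurons, each with a 25-D weight vector, as in the question.
grid_w, grid_h = 10, 10
weights = rng.random((grid_w * grid_h, 25))
# Each neuron's fixed 2-D position on the grid, used for neighbourhood distances.
grid_pos = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)], dtype=float)

n_steps = 2000
for t in range(n_steps):
    x = data[rng.integers(len(data))]                    # pick a random datum
    bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))  # nearest neuron in data space

    frac = t / n_steps
    lr = 0.5 * (1 - frac)            # learning rate decays over time
    sigma = 5.0 * (1 - frac) + 0.5   # neighbourhood radius shrinks over time

    # Pull the winning neuron and its grid neighbours toward the data point;
    # neurons far away on the *grid* (not in data space) are moved less.
    d2 = np.sum((grid_pos - grid_pos[bmu]) ** 2, axis=1)
    h = np.exp(-d2 / (2 * sigma ** 2))
    weights += lr * h[:, None] * (x - weights)
```

The key point the question asks about is visible here: the neurons' 25-D weight vectors live in the data space, but each neuron also has a fixed 2-D position on the grid, and the shrinking neighbourhood is what ties the two together.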

This process is explained well by two images from page 1468 of Kohonen's 1990 paper:

This image shows a one dimensional map in a uniform distribution in a triangle. The grid starts as a mess in the centre, and is gradually pulled into a curve that fills the triangle reasonably well, given the number of grid points:

One dimensional SOM

The left part of this second image shows a 2D SOM grid closely filling the space defined by a cactus shape:

2D cactus SOM

There is a video of the SOM process, using a 2D grid in a 2D space and in a 3D space, on YouTube.

Now every one of the original data points in the space has one closest grid point, to which it is assigned. The grid points are thus the centres of clusters of data points, and the grid provides the dimensionality reduction.
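This assignment step is where the reduction actually happens: each 25-D datum is replaced by the 2-D grid coordinates of its best-matching unit. A minimal sketch (using random weights here for brevity, standing in for a trained 10×10 SOM):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((200, 25))     # 25-D data, as in the question
weights = rng.random((100, 25))  # trained 10x10 SOM weight vectors (random stand-in)
grid_pos = np.array([(i, j) for i in range(10) for j in range(10)])

# For each datum, find the best-matching unit: the neuron whose weight
# vector is nearest in the 25-D data space.
bmus = np.argmin(
    np.sum((data[:, None, :] - weights[None, :, :]) ** 2, axis=2), axis=1
)

# The reduced representation: the 2-D grid coordinates of each datum's BMU.
reduced = grid_pos[bmus]         # shape (200, 2): 25-D -> 2-D
```

So the answer to "am I supposed to be concerned with the location of the neurons?" is yes: the reduced representation is the neuron's location on the grid, not its 25-D weight vector.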

Here is a comparison of dimensionality reduction using principal component analysis (PCA), from the SOM page on wikipedia:

SOM dimensionality reduction from en.wikipedia.org/wiki/File:SOMsPCA.PNG

It can immediately be seen that the one dimensional SOM provides a much better fit to the data, explaining over 93% of the variance, compared to 77% for PCA. However, as far as I am aware, there is no easy way to explain the remaining variance, as there is with PCA (using extra dimensions), since there is no neat way to unwrap the data around the discrete SOM grid.
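One common way to put a number like "variance explained" on a discrete quantiser such as a SOM (I am not certain this is exactly how the Wikipedia figure was computed) is 1 minus the within-cluster sum of squares over the total sum of squares:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.random((300, 5))

def explained_variance(data, centres):
    # Fraction of total variance captured by quantising each point to its
    # nearest centre: 1 - (within-cluster sum of squares / total sum of squares).
    bmus = np.argmin(np.sum((data[:, None] - centres[None]) ** 2, axis=2), axis=1)
    wss = np.sum((data - centres[bmus]) ** 2)
    tss = np.sum((data - data.mean(axis=0)) ** 2)
    return 1 - wss / tss
```

Under this measure, a single centre at the data mean explains 0% of the variance, and one centre per data point explains 100%; a trained SOM's grid points land somewhere in between, typically higher than a PCA line with the same number of free dimensions when the data is curved.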
