Solved – Finding the projection used in multidimensional scaling

Tags: dimensionality reduction, error, machine learning, MATLAB, multidimensional scaling

Background

I have a set of data points in high-dimensional (512D) space that I wish to map to 2D for visualisation. I am interested in observing in 2D the (approximate) relative distances between the data points and their general spatial structure.

Currently I am using multidimensional scaling within MATLAB to do this:

% Dissimilarity matrix
D = pdist(X', 'euclidean');
% Non-metric MDS -- force dimensionality to 2D
Y = mdscale(D, 2)';
% Display result
figure, plot(Y(1, :), Y(2, :), 'o');

Here X holds the high-dimensional data points as a 512 x N matrix, one point per column, where N is the number of points.

Having performed this mapping for a set of points, I wish to be able to project/map new high-dimensional points within the 2D space without needing to use the original set of high-dimensional points.

Questions

(1) How do I obtain the mapping/projection used to scale the 512D points to 2D so that I can apply it to new data points?

(2) What would be a suitable "error metric" to evaluate the projections? The purpose of doing this is to compare different dimensionality reduction techniques (e.g. MDS vs LLE).

For example, would a comparison of the distances between respective k nearest neighbours within both spaces be suitable? i.e. sum the squared differences between corresponding neighbour distances within the two spaces for each point and compute the mean.
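
As a rough sketch of the neighbour-distance comparison described above (not a standard named metric), the following takes each point's k nearest neighbours in the original space, compares those distances with the distances to the same points in the 2D embedding, sums the squared differences per point, and averages over points. The data, the stand-in embedding Y, and the names k, nbrs and err are placeholders chosen for illustration. One caveat is that methods such as LLE or non-metric MDS do not preserve absolute scale, so the embedding may need rescaling before scores are comparable across methods.

```matlab
% Placeholder data: X is d x N (original space), Y is 2 x N (embedding).
rng(0);
X = randn(512, 50);
Y = X(1:2, :);                    % stand-in embedding, for illustration only
k = 5;                            % number of neighbours (assumed value)

% Full pairwise distance matrices in both spaces (N x N).
DX = squareform(pdist(X'));
DY = squareform(pdist(Y'));
N  = size(DX, 1);

% Find each point's k nearest neighbours in the ORIGINAL space.
DXs = DX;
DXs(1:N+1:end) = Inf;             % exclude each point from its own neighbours
[~, order] = sort(DXs, 2);
nbrs = order(:, 1:k);             % N x k neighbour indices

% Compare the distances to those same neighbours in both spaces.
rows   = repmat((1:N)', 1, k);
lin    = sub2ind([N N], rows, nbrs);
sqDiff = (DX(lin) - DY(lin)).^2;  % N x k squared differences

% Sum over neighbours for each point, then average over points.
err = mean(sum(sqDiff, 2));
```

A lower err means the local distances are better preserved. If the embedding were a perfect isometry of the neighbourhoods, err would be zero.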

Best Answer

A toolbox can be a cage. If you are doing something new it can be instructive to write it without the toolbox the first time.

Points in a 512-dimensional (or any high-dimensional) space can, in principle, be mapped onto any lower-dimensional subset of it. Without knowing something about the data, a mapping to 2D may throw away all but a tiny fraction of the information.

So not all spaces are created equal. Some data lie on planes aligned with the axes (stunningly convenient but equally rare), while others lie on the surfaces of spheres or follow differential equations. Some may admit projections to 2D surfaces for which we currently have no formalism, though future developments may make them tractable.

If you do not have a clean way to do the projection, you are going to have to throw away information.

The quick way, as @ttnphns suggested, to determine whether you have plane-ish clouds of data, and to throw away the non-informative dimensions perpendicular to those planes, is PCA (principal component analysis).
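
A minimal sketch of that idea, written without toolbox functions, is given below. It is PCA via the SVD, not the non-metric MDS used in the question: mdscale returns only the embedded coordinates and does not give a reusable projection, whereas PCA gives an explicit linear map (a mean mu and a 512 x 2 basis W) that can be stored and applied to new points without the original data. The random data and the names mu, W, Xnew and keptFraction are placeholders for illustration.

```matlab
% Placeholder data: X is 512 x N training points, Xnew is 512 x M new points.
rng(0);
X    = randn(512, 100);
Xnew = randn(512, 10);

% Centre the training data (bsxfun keeps this working on older releases).
mu = mean(X, 2);
Xc = bsxfun(@minus, X, mu);

% Principal directions are the left singular vectors of the centred data.
[U, S, ~] = svd(Xc, 'econ');
W = U(:, 1:2);                    % 512 x 2 projection basis

% 2D coordinates of the training points.
Y = W' * Xc;

% Fraction of total variance kept by the two retained directions.
s2 = diag(S).^2;
keptFraction = sum(s2(1:2)) / sum(s2);

% Project new points using only the stored mu and W.
Ynew = W' * bsxfun(@minus, Xnew, mu);

figure, plot(Y(1, :), Y(2, :), 'o', Ynew(1, :), Ynew(2, :), 'x');
```

If keptFraction is small, the data are not plane-ish and a linear 2D projection discards most of the spread. For Euclidean distances, classical (metric) MDS gives the same configuration as PCA up to rotation and reflection, so this map can also serve as an out-of-sample extension in that case.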

Best of luck.