Solved – Multidimensional Scaling “eurodist”

Tags: distance, multidimensional scaling, r

I have a question regarding multidimensional scaling. I used the dataset eurodist from the package datasets to generate a two-dimensional configuration of the distances between European cities. I expected a nearly exact representation of the cities' locations (although the points could be mirrored), because the distance data are known a priori to be accurate. There shouldn't be any conflicts in the data, but my analysis shows that there are!

Does anyone know why there is stress in this solution?

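library("datasets")
library("MASS")  # provides Shepard()

data(eurodist)

# Classical MDS in two dimensions
obj <- cmdscale(eurodist, k = 2)
plot(obj[, 1], obj[, 2], type = "n")
text(obj[, 1], obj[, 2], labels = rownames(obj))

# Shepard diagram: original distances (x) against fitted distances (y)
sh <- Shepard(eurodist, obj)
plot(sh$x, sh$y, main = "Shepard diagram")
abline(0, 1)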

Best Answer

(Shepard() is in the MASS package, so your code needs library(MASS).) From ?eurodist:

The data give the road distances (in km) between 21 cities in Europe. The data are taken from a table in The Cambridge Encyclopaedia.

This is in addition to problem (3) mentioned by ttnphns in the comments. Not only are these not distances on a flat surface, they are not distances as the crow flies either. As one example, the outlier at (1662, 713) on the Shepard plot corresponds to the pair (Cologne, Geneva). (This is slightly difficult to find, because the author of Shepard doesn't seem to have bothered to document it.) Looking at a map of Europe, I think this journey has to be made by quite a wiggly route. You can see the outlier by plotting the distances for Cologne only:

plot(as.matrix(eurodist)[6,], as.matrix(dist(obj))[6,])
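
If you would rather not hunt for the outlier by eye, here is a rough sketch of one way to look it up: compare the road distances with the distances in the fitted 2-D configuration and pick the pair with the largest absolute gap. The names d_orig, d_fit, gap and worst_pair are just my own labels, and using the absolute gap as the criterion is an assumption on my part, not something Shepard does.

```r
# Sketch: find the city pair that the 2-D configuration reproduces worst.
obj <- cmdscale(eurodist, k = 2)      # same classical MDS fit as in the question

d_orig <- as.matrix(eurodist)         # road distances from the data set
d_fit  <- as.matrix(dist(obj))        # Euclidean distances between fitted points

# Absolute gap between observed and fitted distance for every pair
gap <- abs(d_orig - d_fit)

# The matrix is symmetric, so the maximum appears twice; keep the first hit
idx <- which(gap == max(gap), arr.ind = TRUE)[1, ]
worst_pair <- rownames(d_orig)[idx]

worst_pair                            # names of the two cities
d_orig[idx[1], idx[2]]                # their road distance in eurodist
d_fit[idx[1], idx[2]]                 # their distance in the MDS configuration
```

The same gap matrix can be sorted if you want to see the next-worst pairs as well, rather than only the single largest one.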