MDS – Maximum Number of Dimensions in Multidimensional Scaling

dimensionality reductiondistancemachine learningmultidimensional scalingpca

If I have an arbitrary Euclidean distance matrix $D=(d_{ij}:i=1,\ldots, I; j=1,\ldots, I)$ and I want to reconstruct its elements (pairwise Euclidean distances) via classical Euclidean MDS. That is finding (a possible solution) $x=(x_{ih}:i=1,\dots,I; h=1,\dots,H)$ such that
$$
d_{ij} = \sqrt{\sum_{h=1}^{H} (x_{ih}-x_{jh})^2}.
$$
What is the minimal number $H$ (as a function of $I$) that entails reconstructing exactly any matrix $D$?

My attempt and comments:

The solution $x$ is not unique because we can transform $x$ according to transformations in the Euclidean group (translations, rotations, and reflections) and still have the same distance matrix. However, I am not interested in a unique solution, I am interested in the existence of a solution.
Since $D$ is a distance matrix it is a square symmetric matrix with diagonal elements equal to $0$, it has $I(I-1)/2$ elements that can vary.
Intuitively $H$ must be smaller or equal to $I$, but I am not sure if $H=I$ is the answer since $x$ has $I H$ elements that can be greater or equal than $I(I-1)/2$ also for some $H<I$.

Best Answer

Suppose $\mathbb D = (d_{ij})$ is an $n\times n$ distance matrix: that is, it is the matrix representing all distances for some set of $n$ points $\mathbf x_i$ in some Euclidean space $E^N$ (of arbitrarily high dimension $N$) for which $d_{ij} = |\mathbf x_i - \mathbf x_j|$ for all $1 \le i \le j \le n.$

$n-1$ Suffices

Setting $\mathbf y_j = \mathbf x_j - \mathbf x_n$ for $j=1,2,\ldots, n$ yields the zero vector $\mathbf y_n = \mathbf 0$ together with at most $n-1$ distinct $\mathbf y_i.$ The latter span a Euclidean subspace of $E^N$ of some dimension $d \le n-1,$ which we may identify with $E^d \subseteq E^{n-1} \subseteq E^N.$ This shows that

Given an $n\times n$ distance matrix $\mathbb D = (d_{ij}),$ there always exists a sequence of $n$ points $\mathbf x_i\in E^{n-1}$ for which $|\mathbf x_i - \mathbf x_j| = d_{ij}$ for all $1\le i \le j \le n.$

$n-1$ Is Required

The standard $n-1$-simplex in $\mathbb{R}^n$ has $n$ vertices $\mathbf x_i = (0,\ldots,0,1,0,\ldots,0)$ for $i=1,2,\ldots, n.$ Their mutual squared distances therefore all equal $2,$ implying any three of them form an equilateral triangle. Thus (using elementary 2D geometry) the inner product of any two distinct $\mathbf y_i = \mathbf x_i - \mathbf x_n$ (for $i = 1,2, \ldots, n-1$) is

$$Q(\mathbf y_i, \mathbf y_j) = \mathbf y_i \cdot \mathbf y_j = 1,\ i\ne j.$$

Now suppose that the only thing we know concerning $n$ points $\mathbf{x}_i$ in some Euclidean space $E^N$ is that all their mutual squared distances are $2.$ (Conceivably there are other configurations with this property besides the $n-1$-simplex. The preceding paragraph establishes that such sets of points exist, but it does not demonstrate that $n-1$ dimensions are required to represent them.)

Another way to describe this abstract situation is that the matrix representing the Euclidean inner product in terms of the $\mathbf y_i,$ $i=1,2,\ldots, n-1$ is

$$\mathbb Q_{n-1} = \pmatrix{2 & 1 & 1 & \cdots &1 \\1 & 2 & 1 & \cdots &1\\ \vdots & \vdots & \ddots &\cdots & \vdots\\1 & 1 & \cdots & 2 & 1\\1 & 1 & \cdots & 1 & 2} = \mathbb{I}_{n-1} + \mathbf{1}_{n-1}\mathbf{1}_{n-1}^\prime $$

In general, any such inner product matrix could be degenerate, meaning that fewer than $n-1$ dimensions are needed for the points it describes. However, the form of the right hand side makes it clear that $\mathbf Q_{n-1}$ has an eigenspace generated by $\mathbf{1}_{n-1} = (1,1,\ldots, 1)^\prime$ of eigenvalue $n,$ because $$\mathbb{Q}_{n-1} \mathbf{1}_{n-1} = \mathbf{1}_{n-1} + \mathbf{1}_{n-1}\mathbf{1}_{n-1}^\prime\mathbf{1}_{n-1} = n\mathbf{1}_{n-1},$$ together with a complementary eigenspace with eigenvalue $1$. Since all eigenvalues are nonzero, $\mathbb{Q}_{n-1}$ is nondegenerate, proving the $\mathbf y_i$ span a vector space of dimension $n-1.$ This shows that at least $n-1$ dimensions are needed for an isometric embedding of the $n-1$-simplex, and we have the example we need.

Related Solutions

Distance Matrix – Converting Similarity Matrix to Euclidean Distance

enter image description here

According to cosine theorem, in euclidean space the (euclidean) squared distance between two points (vectors) 1 and 2 is $d_{12}^2 = h_1^2+h_2^2-2h_1h_2\cos\phi$. Squared lengths $h_1^2$ and $h_2^2$ are the sums of squared coordinates of points 1 and 2, respectively (they are the pythagorean hypotenuses). Quantity $h_1h_2\cos\phi$ is called scalar product (= dot product, = inner product) of vectors 1 and 2.

Scalar product is also called an angle-type similarity between 1 and 2, and in Euclidean space it is geometrically the most valid similarity measure, because it is easily converted to the euclidean distance and vice versa (see also here).

Covariance coefficient and Pearson correlation are scalar products. If you center your multivariate data (so that the origin is in the centre of the cloud of points) then $h^2$'s normalized are the variances of the vectors (not of variables X and Y on the pic above), while $\cos\phi$ for centered data is Pearson $r$; so, a scalar product $\sigma_1\sigma_2r_{12}$ is the covariance. [A side note. If you are thinking right now of covariance/correlation as between variables, not data points, you might ask whether it is possible to draw variables to be vectors like on the pic above. Yes, possible, it is called "subject space" way of representation. Cosine theorem remain true irrespective of what are taken as "vectors" on this instance - data points or data features.]

Whenever we have a similarity matrix with 1 on the diagonal - that is, with all $h$'s set to 1, and we believe/expect that the similarity $s$ is a euclidean scalar product, we can convert it to the squared euclidean distance if we need it (for example, for doing such clustering or MDS that requires distances and desirably euclidean ones). For, by what follows from the above cosine theorem formula, $d^2=2(1-s)$ is squared euclidean $d$. You may of course drop factor $2$ if your analysis doesn't need it, and convert by formula $d^2=1-s$. As an example often encountered, these formulas are used to convert Pearson $r$ into euclidean distance. (See also this and the whole thread there questionning some formulas to convert $r$ into a distance.)

Just above I said if "we believe/expect that...". You may check and be sure that the similarity $s$ matrix - the one particular at hand - is geometrically "OK" scalar product matrix if the matrix has no negative eigenvalues. But if it has those, it then means $s$ isn't true scalar products since there is some degree of geometrical non-convergence either in $h$'s or in the $d$'s that "hide" behind the matrix. There exist ways to try to "cure" such a matrix before transforming it into euclidean distances.

Pairwise Mahalanobis Distances: Calculation and Applications

Starting from ahfoss's "succint" solution, I have used the Cholesky decomposition in place of the SVD.

cholMaha <- function(X) {
 dec <- chol( cov(X) )
 tmp <- forwardsolve(t(dec), t(X) )
 dist(t(tmp))
}

It should be faster, because forward-solving a triangular system is faster then dense matrix multiplication with the inverse covariance (see here). Here are the benchmarks with ahfoss's and whuber's solutions in several settings:

 require(microbenchmark)
 set.seed(26565)
 N <- 100
 d <- 10

 X <- matrix(rnorm(N*d), N, d)

 A <- cholMaha( X = X ) 
 A1 <- fastPwMahal(x1 = X, invCovMat = solve(cov(X))) 
 sum(abs(A - A1)) 
 # [1] 5.973666e-12  Ressuring!

   microbenchmark(cholMaha(X),
                  fastPwMahal(x1 = X, invCovMat = solve(cov(X))),
                  mahal(x = X))
Unit: microseconds
expr          min       lq   median       uq      max neval
cholMaha    502.368 508.3750 512.3210 516.8960  542.806   100
fastPwMahal 634.439 640.7235 645.8575 651.3745 1469.112   100
mahal       839.772 850.4580 857.4405 871.0260 1856.032   100

 N <- 10
 d <- 5
 X <- matrix(rnorm(N*d), N, d)

   microbenchmark(cholMaha(X),
                  fastPwMahal(x1 = X, invCovMat = solve(cov(X))),
                  mahal(x = X)
                    )
Unit: microseconds
expr          min       lq    median       uq      max neval
cholMaha    112.235 116.9845 119.114 122.3970  169.924   100
fastPwMahal 195.415 201.5620 205.124 208.3365 1273.486   100
mahal       163.149 169.3650 172.927 175.9650  311.422   100

 N <- 500
 d <- 15
 X <- matrix(rnorm(N*d), N, d)

   microbenchmark(cholMaha(X),
                  fastPwMahal(x1 = X, invCovMat = solve(cov(X))),
                  mahal(x = X)
                    )
Unit: milliseconds
expr          min       lq     median       uq      max neval
cholMaha    14.58551 14.62484 14.74804 14.92414 41.70873   100
fastPwMahal 14.79692 14.91129 14.96545 15.19139 15.84825   100
mahal       12.65825 14.11171 39.43599 40.26598 41.77186   100

 N <- 500
 d <- 5
 X <- matrix(rnorm(N*d), N, d)

   microbenchmark(cholMaha(X),
                  fastPwMahal(x1 = X, invCovMat = solve(cov(X))),
                  mahal(x = X)
                    )
Unit: milliseconds
expr           min        lq      median        uq       max neval
cholMaha     5.007198  5.030110  5.115941  5.257862  6.031427   100
fastPwMahal  5.082696  5.143914  5.245919  5.457050  6.232565   100
mahal        10.312487 12.215657 37.094138 37.986501 40.153222   100

So Cholesky seems to be uniformly faster.