Solved – Efficient/fast Mahalanobis distance computation

computational-statistics, distance

Suppose I have $n$ data points $x_1,\dots,x_n$, each of which is $p$-dimensional. Let $\Sigma$ be the (non-singular) population covariance of these samples. With respect to $\Sigma$, what is the most efficient known way to compute the vector of squared Mahalanobis distances (from $\vec 0$) of the $n$ data points?

That is, we would like to compute the vector $(x_1^T\Sigma^{-1}x_1,\dots,x_n^T\Sigma^{-1}x_n)$.

Computing the inverse $\Sigma^{-1}$ seems to be quite slow for large matrices. Is there a faster way?

Best Answer

  1. Let $x$ be one of your data points.

  2. Compute the Cholesky decomposition $\Sigma=LL^\top$ (this factorization is computed only once and reused for all $n$ points).

  3. Define $y=L^{-1}x$.

  4. Compute $y$ by forward-substitution in $Ly=x$.

  5. The squared Mahalanobis distance to the origin is then the squared Euclidean norm of $y$:

$$
\begin{align}
x^\top\Sigma^{-1}x &= x^\top(LL^\top)^{-1}x \\
&= x^\top(L^\top)^{-1}L^{-1}x \\
&= x^\top(L^{-1})^\top L^{-1}x \\
&= (L^{-1}x)^\top(L^{-1}x) \\
&= \|y\|^2.
\end{align}
$$
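The factorization costs $O(p^3)$ once, and each forward substitution costs $O(p^2)$, so all $n$ squared distances come out at $O(p^3 + np^2)$ with no explicit inverse ever formed. Below is a minimal NumPy/SciPy sketch of this recipe; the function name `mahalanobis_sq` is just an illustrative choice, not a library routine.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def mahalanobis_sq(X, Sigma):
    """Squared Mahalanobis distances from the origin.

    X     : (n, p) array, one data point per row
    Sigma : (p, p) symmetric positive-definite covariance matrix
    """
    # Factor Sigma = L L^T once; O(p^3).
    L = cholesky(Sigma, lower=True)
    # Forward-substitute L Y = X^T for all points at once; O(n p^2).
    Y = solve_triangular(L, X.T, lower=True)
    # Column i of Y is y_i = L^{-1} x_i; return its squared Euclidean norm.
    return np.sum(Y**2, axis=0)

# Example: 1000 five-dimensional points.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
Sigma = np.cov(X, rowvar=False)
d2 = mahalanobis_sq(X, Sigma)  # vector of n squared distances
```

Here `solve_triangular` carries out the forward substitution of step 4 for all $n$ right-hand sides at once, which is typically much faster than looping over the points one at a time.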