Solved – Mahalanobis distance for vector-classification

Tags: classification, distance, distance-functions

B"H

Hello,

Assume I have a very large set of vectors ($X_i$) over some feature space ($F_i$), where each vector is labeled as either $+1$ or $-1$. For convenience, let's refer to this set as "the history set".

THE QUESTION:

Given a new vector $X_{test}$ to be classified (as either "+1" or "-1"), I'd like to find the history-set vector that is closest to $X_{test}$ in terms of Mahalanobis distance, and assign $X_{test}$ the label of that history vector.

How can I find the closest history-vector?

Best Answer

Assuming there are some differences between the covariance matrices of the $X_i$ classified as $+1$ and those classified as $-1$, you could do the following:

  1. Calculate the covariances for the two sets of $X_i$. I'll label them $\Sigma_{+}$ and $\Sigma_{-}$.

  2. For all $i$ in the $+1$ set, compute $d_{test,i} = \sqrt{(x_{test}-x_i)^{\text{T}}\Sigma_{+}^{-1}(x_{test}-x_i)}$. Do the same for the $-1$ set, using $\Sigma_{-}^{-1}$ instead.

  3. Take the $i$ associated with the minimum $d_{test,i}$ as your closest history-set vector.

The $d_{test,i}$ are the Mahalanobis distances between $X_{test}$ and the $X_i$.

Sample code in R for a single covariance matrix:

# Sample history matrix: 100 vectors of dimension 10, stored as columns
X <- matrix(rnorm(1000), 10, 100)
# Sample test vector
Xtest <- rnorm(10)

# Covariance of the history vectors (rows of t(X) are observations) and its inverse
Sigma <- cov(t(X))
SInv <- solve(Sigma)

# Mahalanobis distance from Xtest to each history vector
di <- rep(0, ncol(X))
for (i in 1:length(di)) {
  di[i] <- sqrt(t(Xtest - X[, i]) %*% SInv %*% (Xtest - X[, i]))
}

# Index of the closest history vector
which.min(di)
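
To cover the two-covariance case from steps 1–3, here is a minimal sketch continuing from the sample above. It uses base R's mahalanobis() function (which returns squared distances) and assumes the class labels of the history vectors are stored in a vector y of +1/-1 values; y and its construction here are hypothetical, chosen only for illustration:

# Hypothetical label vector y: the +1/-1 class of each history (column) vector
y <- sample(c(1, -1), ncol(X), replace = TRUE)

# Per-class covariance matrices (step 1)
SigmaPlus  <- cov(t(X[, y == 1]))
SigmaMinus <- cov(t(X[, y == -1]))

# Squared Mahalanobis distance from Xtest to every history vector,
# using the covariance matrix of that vector's own class (step 2)
d2 <- numeric(ncol(X))
d2[y == 1]  <- mahalanobis(t(X[, y == 1]),  Xtest, SigmaPlus)
d2[y == -1] <- mahalanobis(t(X[, y == -1]), Xtest, SigmaMinus)

# Closest history vector and the label assigned to Xtest (step 3)
closest <- which.min(d2)
y[closest]

Since the square root is monotone, minimizing the squared distances returned by mahalanobis() picks the same nearest history vector as minimizing the distances themselves.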