B"H
Hello,
Assume I have a very large set of vectors ($X_i$) over some feature space ($F_i$), each labeled either $+1$ or $-1$. For convenience, let's refer to this set as "the history set".
THE QUESTION:
Given a new vector $X_{test}$ to be classified (as either "+1" or "-1"), I'd like to find the history-set vector that is closest to $X_{test}$ in Mahalanobis distance and give $X_{test}$ that vector's label.
How can I find the closest history-vector?
Best Answer
Assuming there are some differences between the covariance matrices of the $X_i$ classified as $+1$ and those classified as $-1$, you could do the following:
Estimate the covariance matrix of each of the two sets of $X_i$. I'll label them $\Sigma_{+}$ and $\Sigma_{-}$.
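One common estimator (an assumption; the answer does not say which to use) is the unbiased sample covariance of each class. Writing $y_i$ for the label of $X_i$, $n_{+}$ for the number of $+1$ vectors and $\bar X_{+}$ for their mean:

$$\Sigma_{+} = \frac{1}{n_{+}-1}\sum_{i:\,y_i=+1}\left(X_i-\bar X_{+}\right)\left(X_i-\bar X_{+}\right)^{\text{T}}$$

and analogously for $\Sigma_{-}$. Each class needs more vectors than features, otherwise its $\Sigma$ is singular and cannot be inverted.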
For every $i$ in the $+1$ set, compute $d_{test,i} = \sqrt{(X_{test}-X_i)^{\text{T}}\Sigma_{+}^{-1}(X_{test}-X_i)}$. For the $-1$ set, do the same with $\Sigma_{-}^{-1}$ in place of $\Sigma_{+}^{-1}$.
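A small worked example, with numbers chosen purely for illustration: take $\Sigma_{+} = \operatorname{diag}(4, 1)$ and $X_{test}-X_i = (2, 1)^{\text{T}}$. Then

$$d_{test,i} = \sqrt{\begin{pmatrix}2 & 1\end{pmatrix}\begin{pmatrix}1/4 & 0\\ 0 & 1\end{pmatrix}\begin{pmatrix}2\\ 1\end{pmatrix}} = \sqrt{1 + 1} = \sqrt{2} \approx 1.41,$$

compared with a Euclidean distance of $\sqrt{5} \approx 2.24$: the difference along the high-variance first feature is down-weighted.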
Take the $i$ with the minimum $d_{test,i}$ over both sets as your closest history-set vector, and assign its label to $X_{test}$.
The $d_{test,i}$ are the Mahalanobis distances between $X_{test}$ and the $X_i$.
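A minimal pure-Python sketch of the steps above, assuming the history set fits in memory as a list of `(vector, label)` pairs. The function names `covariance`, `solve`, `mahalanobis` and `classify` are hypothetical, chosen for this example. It uses the sample covariance (dividing by $n-1$) and solves $\Sigma z = X_{test}-X_i$ rather than forming $\Sigma^{-1}$ explicitly.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def covariance(rows: List[Vector]) -> List[List[float]]:
    """Sample covariance matrix (divides by n - 1); one vector per row."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
             for b in range(p)]
            for a in range(p)]


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    p = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * p
    for r in range(p - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, p))
        z[r] = (aug[r][p] - tail) / aug[r][r]
    return z


def mahalanobis(x: Vector, y: Vector, cov: List[List[float]]) -> float:
    """sqrt((x - y)^T cov^{-1} (x - y)), computed without forming cov^{-1}."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(cov, diff)
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))


def classify(x_test: Vector,
             history: List[Tuple[Vector, int]]) -> Tuple[float, Vector, int]:
    """Return (distance, nearest history vector, its label)."""
    best = (math.inf, None, 0)
    for label in (+1, -1):
        members = [v for v, lab in history if lab == label]
        cov = covariance(members)  # Sigma_+ or Sigma_-, estimated once per class
        for v in members:
            d = mahalanobis(x_test, v, cov)
            if d < best[0]:
                best = (d, v, label)
    return best


# Usage: two well-separated classes in a 2-D feature space.
history = [((0, 0), +1), ((2, 0), +1), ((0, 2), +1), ((2, 2), +1),
           ((10, 10), -1), ((12, 10), -1), ((10, 12), -1), ((12, 12), -1)]
print(classify((0.5, 0.5), history))
```

The scan is brute force, linear in the number of history vectors per query. For a very large history set you would still estimate each class covariance only once, but would likely vectorize the distance loop (for example with NumPy).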
Sample code in R for a single covariance matrix: