If the M-F distance is asymmetric because the future differs from the past, then a genuinely asymmetric clustering method is called for. First, an asymmetric distance function must be defined.
One way to do asymmetric clustering, given a distance function, is to embed the original data into a new coordinate space. See "Geometrical Structures of Some Non-Distance Models for Asymmetric MDS" by Naohito Chino and Kenichi Shiraiwa, Behaviormetrika, 1992. This approach is called HCM (the Hermitian Canonical Model).
Find the Hermitian matrix $H$ whose entries are
$$
H_{jk} = \frac{1}{2}\left[d(x_j, x_k) + d(x_k, x_j)\right] + \frac{i}{2}\left[d(x_j, x_k) - d(x_k, x_j)\right]
$$
Find the eigenvalues and eigenvectors, then scale each eigenvector by the square root of its corresponding eigenvalue.
This embeds the data in a complex coordinate space. Once the data is embedded, the distance information between objects $x$ and $y$ is recovered by $x^* y$, where $^*$ denotes the conjugate transpose. At that point you can run k-means on the complex vectors.
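As a concrete illustration, here is a minimal Python sketch of the embedding (my own, not code from the paper). Scaling by $\sqrt{|\lambda|}$ when eigenvalues are negative, and stacking real and imaginary parts so that standard k-means can consume the result, are both assumptions on my part:

```python
import numpy as np
from sklearn.cluster import KMeans

def hcm_embed(D):
    """Hermitian Canonical Model embedding of an asymmetric distance matrix.

    D[j, k] holds d(x_j, x_k); D need not be symmetric.
    Returns one row of complex coordinates per object.
    """
    S = 0.5 * (D + D.T)          # symmetric part
    A = 0.5 * (D - D.T)          # skew-symmetric part
    H = S + 1j * A               # Hermitian by construction: H == H.conj().T
    w, V = np.linalg.eigh(H)     # real eigenvalues, orthonormal eigenvectors
    # Scale each eigenvector by the square root of its eigenvalue.
    # Eigenvalues can be negative; using sqrt(|w|) here is my assumption,
    # not something prescribed by the Chino & Shiraiwa paper.
    return V * np.sqrt(np.abs(w))

# Example: random asymmetric "distances", then k-means on the embedding.
rng = np.random.default_rng(0)
D = rng.random((20, 20))
np.fill_diagonal(D, 0.0)
X = hcm_embed(D)
# scikit-learn's KMeans does not accept complex input, so stack the real
# and imaginary parts -- a pragmatic workaround, also an assumption.
labels = KMeans(n_clusters=3, n_init=10).fit_predict(np.hstack([X.real, X.imag]))
print(labels)
```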
Spectral asymmetric clustering has also been done; see the thesis by Stefan Emilov Atev, "Using Asymmetry in the Spectral Clustering of Trajectories," University of Minnesota, 2011, which includes MATLAB code for the proposed algorithm.
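The thesis itself provides the MATLAB; for Python users, here is what a plain symmetrized baseline looks like with scikit-learn. This deliberately discards the asymmetry that Atev's method exploits (the affinity transform and bandwidth below are guesses), so treat it only as a point of comparison:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Baseline only: convert asymmetric distances to affinities and symmetrize,
# then run off-the-shelf spectral clustering. Atev's thesis is about
# exploiting the asymmetry that the symmetrization below throws away.
rng = np.random.default_rng(0)
D = rng.random((30, 30))
np.fill_diagonal(D, 0.0)
W = np.exp(-D / D.mean())          # Gaussian-style affinity; bandwidth is a guess
W_sym = 0.5 * (W + W.T)            # precomputed affinities must be symmetric
labels = SpectralClustering(n_clusters=3, affinity='precomputed',
                            random_state=0).fit_predict(W_sym)
print(labels)
```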
Couldn't you do a discriminant function analysis of the new groups? That would give you a classification rate table, and the percentage correctly classified under cross-validation should give you an idea of how well the groups are separated.
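A minimal sketch of that check with scikit-learn; the `X` and `labels` arrays below are placeholders for your features and the cluster assignments:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

# X and labels stand in for your features and the clustering output.
rng = np.random.default_rng(0)
X = rng.standard_normal((120, 4))
labels = rng.integers(0, 3, size=120)

pred = cross_val_predict(LinearDiscriminantAnalysis(), X, labels, cv=5)
print(confusion_matrix(labels, pred))                       # classification table
print("correct: %.1f%%" % (100 * np.mean(pred == labels)))  # cross-validated rate
```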
Best Answer
This is a multivariate Gaussian:
$$ f(x;\mu,\Sigma) = \frac{1}{\sqrt{(2\pi)^{n}|\Sigma|}}\, e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)} $$
Mahalanobis distance is the square root of the quadratic form in the exponent: $$MD = \sqrt{(x-\mu)^T\Sigma^{-1}(x-\mu)}$$
So I would say that if your underlying distributions are multivariate Gaussians, Mahalanobis distance seems useful. The major practical problem is estimating the precision matrix $\Sigma^{-1}$ in high-dimensional settings with few observations.
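One common workaround is a shrinkage estimate of the covariance; here is a quick sketch using scikit-learn's Ledoit-Wolf estimator (my choice for illustration, not the only option):

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))       # few observations relative to dimension
mu = X.mean(axis=0)

# Ledoit-Wolf shrinkage keeps the estimated precision matrix well conditioned
# when the plain sample covariance would be nearly singular.
prec = LedoitWolf().fit(X).precision_   # estimate of Sigma^{-1}

diff = X - mu
md = np.sqrt(np.einsum('ij,jk,ik->i', diff, prec, diff))  # MD of each row
```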
If you have no choice but to perform automated outlier detection, MD has some nicely interpretable qualities. In the univariate case, this normalized distance is simply the number of standard deviations from the mean. If your data really is normal, it is common to flag points more than $n$ standard deviations from the mean as outliers ($MD > n$). Choosing $n = 2$ as the threshold rejects points beyond roughly the 95th percentile of the underlying distribution.
In the multivariate case, the curse of dimensionality comes into play. To keep the 95th-percentile rule, you would instead need to set the cutoff from a quantile of the $\chi$ distribution: for multivariate normal data the squared MD follows a $\chi^2$ distribution with $n$ degrees of freedom, so MD itself follows a $\chi$ distribution.
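A short sketch of that cutoff, continuing from the `md` values computed in the previous snippet:

```python
import numpy as np
from scipy.stats import chi2

d = 10                                  # data dimension
alpha = 0.95                            # keep the "95th percentile" rule
# MD^2 of multivariate normal data is chi-square with d degrees of freedom,
# so MD follows a chi distribution; take the square root of the quantile.
cutoff = np.sqrt(chi2.ppf(alpha, df=d))
outliers = md > cutoff                  # `md` from the previous snippet
```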