Solved – Understanding the R stats mahalanobis() function’s Output

distance, mahalanobis, similarities

An acquaintance recommended I use the Mahalanobis distance on my data instead of Euclidean, Manhattan, etc.

I tried using the mahalanobis() function in the R stats package on a data matrix with N samples and p features, with the p features as rows and N samples as columns.

>> cov_d = cov(t(data_mx))
>> mah = mahalanobis(x = t(data_mx), center = FALSE, cov=cov_d)

When I executed the lines above, I ran into an issue that someone else has posted about previously: the covariance matrix is computationally singular (i.e., solve() fails because the matrix is singular), as discussed here: https://stackoverflow.com/questions/22134398/mahalonobis-distance-in-r-error-system-is-computationally-singular

When I set tol=1e-25 instead, as one user in that post recommends, I get back only a vector, not the patient x patient distance matrix I expected.

>> cov_d = cov(t(data_mx))
>> mah = mahalanobis(x = t(data_mx), center = FALSE, cov=cov_d, tol=1e-25)
>> mah      
 PT001       PT002       PT003       PT001       PT002 
 -3.776784e+16 -3.776784e+16 -3.776784e+16 -3.776784e+16 -3.776784e+16
....
PT054         PT059        PT099        PT121        PT154 
-3.776784e+16 -3.776784e+16 -3.776784e+16 -3.776784e+16 -3.776784e+16
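
A note on the output above: a squared distance cannot be negative, so values like -3.776784e+16 suggest that the inverse forced through with tol=1e-25 is numerically meaningless. One common cause of a singular covariance matrix is having more features than samples (p > N), though the question does not say whether that is the case here. The sketch below uses made-up data (not data_mx) to show the rank deficiency, and adds a small ridge term as one possible workaround; the value of lambda is an arbitrary assumption, not a recommendation.

```r
# Toy data, NOT the asker's data_mx: N = 5 samples (rows), p = 8 features (columns).
set.seed(1)
N <- 5
p <- 8
X <- matrix(rnorm(N * p), nrow = N, ncol = p)

S <- cov(X)            # p x p covariance matrix of the features
rank_S <- qr(S)$rank   # at most N - 1, so S is singular whenever p > N - 1

# Hypothetical workaround: add a small ridge to the diagonal so that solve() succeeds.
lambda <- 1e-3         # arbitrary illustrative value
S_reg <- S + lambda * diag(p)

# Squared Mahalanobis distance of each row to the column means (one value per row).
d2 <- mahalanobis(X, center = colMeans(X), cov = S_reg)
```

With a regularized covariance the returned values are non-negative, as squared distances should be. Reducing the number of features first is another option when p is much larger than N.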

I'm looking for the Mahalanobis distance to give me an N x N matrix back. Does this distance metric return only a vector rather than a matrix? How can a vector of distances be used? How do I know how each patient compares to every other patient if I don't have pairwise distances?

Best Answer

It seems you are looking for pairwise.mahalanobis, but you are using mahalanobis.

The mahalanobis() function returns the squared Mahalanobis distance of each row of x to the center, so the result is a vector with one value per row, not a matrix. This is described in the documentation I linked.
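
Following on from this, one way to get an N x N matrix using only stats::mahalanobis() is to call it once per sample, using that sample as the center. This is a minimal sketch on toy data (the matrix X and its row names are made up and stand in for t(data_mx)); it assumes the covariance matrix is invertible.

```r
# Toy data standing in for t(data_mx): N = 6 samples (rows), p = 3 features (columns).
set.seed(1)
X <- matrix(rnorm(18), nrow = 6, ncol = 3,
            dimnames = list(paste0("PT00", 1:6), paste0("f", 1:3)))

S <- cov(X)  # p x p covariance matrix of the features

# Column i holds the squared Mahalanobis distance from every row to row i.
d2 <- vapply(seq_len(nrow(X)),
             function(i) mahalanobis(X, center = X[i, ], cov = S),
             numeric(nrow(X)))
dimnames(d2) <- list(rownames(X), rownames(X))

# mahalanobis() returns squared distances, so take the square root.
D <- sqrt(d2)  # N x N matrix of pairwise Mahalanobis distances
```

If a dist object is needed, for example as input to hclust(), as.dist(D) converts the matrix.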