An acquaintance recommended I use the Mahalanobis distance on my data instead of Euclidean, Manhattan, etc.
I tried the mahalanobis() function from the R stats package on a data matrix, data_mx, that has the p features as rows and the N samples as columns.
>> cov_d = cov(t(data_mx))
>> mah = mahalanobis(x = t(data_mx), center = FALSE, cov=cov_d)
When I ran the lines above, I hit the same "system is computationally singular" error that someone else posted about previously (i.e., the error solve() raises when asked to invert a singular matrix), as discussed here: https://stackoverflow.com/questions/22134398/mahalonobis-distance-in-r-error-system-is-computationally-singular
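For context, here is a minimal sketch of why this error shows up, assuming simulated data with more features than samples (the sizes N = 10 and p = 25 are made up, not taken from the question). The p x p sample covariance of N samples has rank at most N - 1, so solve() cannot invert it whenever p >= N.

```r
# Minimal sketch with made-up sizes: more features (p) than samples (N).
set.seed(1)
N <- 10  # samples (e.g., patients)
p <- 25  # features
# Features as rows and samples as columns, as in the question
data_mx <- matrix(rnorm(p * N), nrow = p, ncol = N)

cov_d <- cov(t(data_mx))    # p x p covariance of the features: 25 x 25
rank_cov <- qr(cov_d)$rank  # numerical rank, at most N - 1 = 9
print(dim(cov_d))
print(rank_cov)

# solve() needs full rank p, so this typically stops with a
# "system is computationally singular" (or "exactly singular") error;
# tryCatch() captures the message instead of aborting the script.
inv_attempt <- tryCatch(solve(cov_d), error = function(e) conditionMessage(e))
print(is.character(inv_attempt))  # TRUE if solve() failed
```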
When I instead set tol=1e-25, as one user in that post recommends, I get back only a vector, not the patient x patient distance matrix I expected:
>> cov_d = cov(t(data_mx))
>> mah = mahalanobis(x = t(data_mx), center = FALSE, cov=cov_d, tol=1e-25)
>> mah
PT001 PT002 PT003 PT001 PT002
-3.776784e+16 -3.776784e+16 -3.776784e+16 -3.776784e+16 -3.776784e+16
....
PT054 PT059 PT099 PT121 PT154
-3.776784e+16 -3.776784e+16 -3.776784e+16 -3.776784e+16 -3.776784e+16
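The output above is also a warning sign: mahalanobis() returns squared distances, which cannot be negative when the inverse covariance matrix is positive (semi)definite, so values like -3.776784e+16 suggest the inverse computed with tol=1e-25 is numerically meaningless. One possible alternative, which is an assumption on my part and not what the linked post suggests, is to compute the Moore-Penrose pseudoinverse with MASS::ginv() and pass it with inverted = TRUE. A minimal sketch on simulated data (N = 10 and p = 25 are made up):

```r
# Minimal sketch, assuming the Moore-Penrose pseudoinverse is an acceptable
# stand-in for the inverse covariance (not the approach from the linked post).
set.seed(1)
N <- 10  # samples (made up)
p <- 25  # features (made up), p > N so cov() is singular
data_mx <- matrix(rnorm(p * N), nrow = p, ncol = N)  # features as rows
X <- t(data_mx)                                      # N x p, samples as rows

cov_d <- cov(X)
cov_pinv <- MASS::ginv(cov_d)  # pseudoinverse exists even for singular cov_d

# inverted = TRUE tells mahalanobis() that 'cov' is already inverted,
# so it skips its internal solve() call.
mah <- mahalanobis(X, center = colMeans(X), cov = cov_pinv, inverted = TRUE)

print(length(mah))    # one squared distance per sample
print(all(mah >= 0))  # squared distances are non-negative
print(mah)
```

In this simulated case every sample ends up with the same value, (N - 1)^2 / N = 8.1. For generic data that happens whenever p >= N - 1, because the centered samples then span the whole range of the covariance matrix, so with more features than patients the distance may carry little information even once it is computable.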
I want the Mahalanobis distance to give me an N x N matrix. Does this function return only a vector, never a matrix? What can you do with a vector of distances? And without pairwise distances, how do I know how each patient compares to the others?
Best Answer
It seems you are looking for pairwise.mahalanobis (from the HDMD package), but you are using mahalanobis.
The mahalanobis() function returns the squared Mahalanobis distance of each row of x to center, i.e., a vector of length N rather than an N x N matrix. This is described in the documentation I linked.
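Following from that, a base-R sketch of the N x N matrix the question asks for: call mahalanobis() once per sample, using that sample as center. This is not pairwise.mahalanobis itself; the simulated data (N = 6, p = 3) and the PT-style row names are made up, and it assumes the covariance matrix is invertible (p < N), which would not hold for the data in the question without a workaround.

```r
# Minimal sketch with made-up data: N = 6 samples, p = 3 features (p < N,
# so the covariance matrix is invertible).
set.seed(42)
N <- 6
p <- 3
X <- matrix(rnorm(N * p), nrow = N, ncol = p,
            dimnames = list(sprintf("PT%03d", seq_len(N)), NULL))  # samples as rows

S <- cov(X)        # p x p covariance
S_inv <- solve(S)  # invert once and reuse

# What mahalanobis() returns: squared distances from each row to ONE center.
to_center <- mahalanobis(X, center = colMeans(X), cov = S_inv, inverted = TRUE)
print(length(to_center))  # N, i.e. a vector

# Pairwise: use each sample in turn as the center.
# Column j holds the squared distances from every sample to sample j.
d2 <- sapply(seq_len(N), function(j)
  mahalanobis(X, center = X[j, ], cov = S_inv, inverted = TRUE))
dimnames(d2) <- list(rownames(X), rownames(X))
D <- sqrt(d2)  # N x N Mahalanobis distances: symmetric, zero diagonal
print(round(D, 3))

# Cross-check: whitening with the Cholesky factor R (S = t(R) %*% R) turns
# Mahalanobis distance into plain Euclidean distance.
R <- chol(S)
D_white <- as.matrix(dist(X %*% solve(R)))
print(all.equal(D, D_white))
```

The matrix can then be handed to tools that expect distances, e.g. hclust(as.dist(D)). The whitening cross-check shows an equivalent design: transform the data once with the inverse Cholesky factor and use ordinary Euclidean dist().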