Solved – Understanding the R stats mahalanobis() function’s Output

An acquaintance recommended I use the Mahalanobis distance on my data instead of Euclidean, Manhattan, etc.

I tried using the mahalanobis() function in the R stats package on a data matrix with N samples and p features, with the p features as rows and N samples as columns.

> cov_d = cov(t(data_mx))
> mah = mahalanobis(x = t(data_mx), center = FALSE, cov=cov_d)

When I ran the lines above, I hit an error that someone else has posted about before: the covariance matrix is computationally singular (i.e., the result of calling solve() on a singular matrix). It is discussed here: https://stackoverflow.com/questions/22134398/mahalonobis-distance-in-r-error-system-is-computationally-singular

When I set tol=1e-25 instead, as one user in that post recommends, I get back only a vector, not the patient x patient distance matrix I expected.

> cov_d = cov(t(data_mx))
> mah = mahalanobis(x = t(data_mx), center = FALSE, cov=cov_d, tol=1e-25)
> mah
 PT001       PT002       PT003       PT001       PT002 
 -3.776784e+16 -3.776784e+16 -3.776784e+16 -3.776784e+16 -3.776784e+16
....
PT054         PT059        PT099        PT121        PT154 
-3.776784e+16 -3.776784e+16 -3.776784e+16 -3.776784e+16 -3.776784e+16
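
The singular-matrix error and the negative values above are both consistent with having more features than samples: a sample covariance matrix estimated from N samples has rank at most N - 1, so a p x p covariance with p >= N cannot be inverted. The sketch below illustrates this on simulated data. The sizes, the seed, and the random matrix are assumptions made for illustration, not the original patient data.

```r
# Simulated stand-in for data_mx: p features as rows, N samples as columns.
# (Sizes and values are assumptions for illustration only.)
set.seed(1)
p <- 10
N <- 5
data_mx <- matrix(rnorm(p * N), nrow = p, ncol = N)

# Same covariance call as in the question: a p x p matrix.
cov_d <- cov(t(data_mx))

# The numerical rank is bounded by N - 1, which is below p here.
rank_d <- qr(cov_d)$rank
print(c(p = p, N = N, rank = rank_d))

# solve() typically refuses to invert such a matrix; keep the message
# instead of stopping the script.
solve_result <- tryCatch(solve(cov_d), error = function(e) conditionMessage(e))
if (is.character(solve_result)) print(solve_result)
```

Lowering tol only relaxes the singularity check inside solve(); it does not make the matrix invertible. mahalanobis() returns squared distances, so negative values are a sign that the computed inverse is numerically meaningless.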

I want the Mahalanobis distance to give me an N x N matrix. Does this function only ever return a vector, never a matrix? How would I use a vector of distances? How do I know how each patient compares to every other patient if I don't have pairwise distances?
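
For reference, mahalanobis() in the stats package returns the squared distance from each row of x to one single center, which is why the result has length N instead of being N x N. A minimal sketch of one way to get pairwise distances is to call it once per sample, using that sample as the center. The data here is simulated with more samples than features so that the covariance is invertible, and the names X, S, D2 and D are hypothetical.

```r
# Simulated data: N samples as rows, p features as columns, with N > p.
# (Values and patient labels are assumptions for illustration only.)
set.seed(1)
N <- 6
p <- 3
X <- matrix(rnorm(N * p), nrow = N,
            dimnames = list(paste0("PT", seq_len(N)), NULL))
S <- cov(X)

# One squared distance per sample, measured to the column means: a vector.
d2_to_mean <- mahalanobis(X, center = colMeans(X), cov = S)

# Pairwise version: column i holds the squared distances of all samples
# to sample i, so the result is an N x N matrix.
D2 <- sapply(seq_len(N), function(i) mahalanobis(X, center = X[i, ], cov = S))
dimnames(D2) <- list(rownames(X), rownames(X))

# mahalanobis() returns squared distances; take the root for distances.
D <- sqrt(D2)

# Cross-check: Euclidean distance after whitening with the Cholesky factor
# of S gives the same pairwise Mahalanobis distances.
D_check <- as.matrix(dist(X %*% solve(chol(S))))
print(round(D, 3))
```

The loop is easy to read but calls solve() once per sample. For larger N, the whitening route used in the cross-check is likely the cheaper option, and as.dist(D) would turn the matrix into an object that hclust() accepts.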
