Solved – Mahalanobis Distance on Singular Data

Tags: distance, matrix, r

I have an issue that I could not solve, although I tried and got some help on an R forum.

I am trying to calculate Mahalanobis distances on a data.frame with several hundred groups and several hundred variables.
Whatever I do, I get the error "system is computationally singular: reciprocal condition number".

It is clear that it is singular, but is there any way to get around that and still run Mahalanobis? Should I forget it and solve this using another approach? If so, what else could I use?

I have uploaded the data file to my FTP. It is a tab-delimited txt file with no headers.

I was working with the Mahalanobis function from the R package StatMatch (I also tried the one in stats). I have a deadline for this project (not homework!), and since this function had always worked for me before, I thought I would be able to keep the calculations short, but now I am lost.

Migrated from Stack Overflow.

Best Answer

Why do you think there is no way that matrix could be singular?

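A QR decomposition shows that the rank of this 380 x 372 matrix is just 300. In other words, it is badly rank deficient: at least 72 of the 372 columns are linear combinations of the others, so the covariance matrix built from them cannot be inverted: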

url <- "http://mkk.szie.hu/dep/talt/lv/CentInpDuplNoHeader.txt"
df <- read.table(file = url, header = FALSE)
m <- as.matrix(df)

dim(m)
# [1] 380 372
qr(m)$rank
# [1] 300
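To see the same failure without downloading the file, here is a minimal sketch on made-up data. The matrix `x`, its dimensions, and the duplicated column are assumptions for illustration only, not taken from the poster's file. It shows that a rank-deficient data matrix yields a covariance matrix that stats::mahalanobis cannot invert.

```r
set.seed(1)

# Hypothetical toy data: 20 rows, 5 columns, where column 5 is an exact
# copy of column 1, so only 4 columns are linearly independent.
x <- matrix(rnorm(20 * 4), nrow = 20, ncol = 4)
x <- cbind(x, x[, 1])

rank_x <- qr(x)$rank   # numerical rank of the data matrix
S      <- cov(x)       # 5 x 5 sample covariance matrix
rank_S <- qr(S)$rank   # rank of the covariance matrix

# mahalanobis() inverts S with solve(), which stops on a singular matrix.
# Capture the error message instead of aborting the script.
err_msg <- tryCatch({
  mahalanobis(x, colMeans(x), S)
  NA_character_                      # reached only if no error occurred
}, error = function(e) conditionMessage(e))

print(c(rank_x = rank_x, rank_S = rank_S))
print(err_msg)
```

The exact wording of the error depends on whether LAPACK finds the matrix exactly or only numerically singular, so the message printed here may differ from the one in the question.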

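Examining the matrix's singular values is another way to see the same thing. The table lists the six smallest distinct singular values with their counts: 72 of the 372 are numerically zero (on the order of 1e-12 or smaller), which matches 372 - 300 = 72: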

head(table(svd(df)$d))

# 5.76661502353373e-13 2.57650568058543e-12  0.00929562094651422 
#                   71                    1                    1 
#   0.0277990885015625   0.0398152894712022   0.0469713341003743 
#                    1                    1                    1
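Counting the singular values that exceed a small tolerance gives a numerical rank directly. The sketch below does this on made-up data. The matrix `x` and the tolerance formula are assumptions (the cutoff is one common convention, not something the answer specifies), so treat it as an illustration rather than a definitive recipe.

```r
set.seed(1)

# Hypothetical toy data: a 20 x 6 matrix whose last two columns duplicate
# the first two, so only 4 columns are linearly independent.
x <- matrix(rnorm(20 * 4), nrow = 20, ncol = 4)
x <- cbind(x, x[, 1], x[, 2])

d <- svd(x)$d   # singular values, largest first

# Assumed cutoff: singular values below this are treated as zero.
tol <- max(dim(x)) * max(d) * .Machine$double.eps

num_rank <- sum(d > tol)   # number of singular values counted as nonzero

print(signif(d, 3))
print(num_rank)
print(qr(x)$rank)          # the QR-based rank, for comparison
```

On the poster's data the same count would be applied to svd(m)$d. The tiny values at the top of the table above are the ones such a cutoff is meant to discard.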