Solved – R – How to fix NbClust error with error message: “The TSS matrix is indefinite. There must be too many missing values.”

clustering, k-means, r

I would like to know how I can use clustering methods in R (in this case, k-means) when I have an "unkind" input matrix. I get this error:

The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.

I understand that I might get this error if my matrix produces negative eigenvalues (as described here: https://stackoverflow.com/questions/20669596/nbclust-package-error), but what I'm missing is the "next step". One suggestion there was to "go back to the data", but what should I do then? Is there a transformation or something similar that might help? (I'm pretty new to R and to clustering in general.)
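For readers wondering what "going back to the data" could look like, here is a minimal base R sketch. It rests on an assumption not confirmed in this thread: that the error comes from eigenvalues of the scaled cross-product matrix that are zero or slightly negative, which tends to happen with missing values, constant columns, or exactly collinear columns. The data frame `x` and its column names are hypothetical stand-ins for the survey data.

```r
# Hypothetical stand-in for the survey data (not the real dataset).
set.seed(1)
n <- 994
x <- data.frame(
  a = rnorm(n),
  b = rnorm(n),
  const = rep(3, n)   # zero variance: scale() would turn this column into NaN
)
x$dup <- x$a          # exactly collinear with column a

# 1. Count missing values in the raw data.
n_missing <- sum(is.na(x))

# 2. Find columns with zero variance.
sds <- vapply(x, sd, numeric(1), na.rm = TRUE)
constant_cols <- names(sds)[sds == 0]

# 3. Drop them, scale, and inspect the rank and the eigenvalues of the
#    cross-product matrix. Eigenvalues at or below zero flag redundancy.
x_clean  <- x[, setdiff(names(x), constant_cols), drop = FALSE]
x_scaled <- scale(x_clean)
rank_x   <- qr(x_scaled)$rank
tss      <- crossprod(x_scaled) / (n - 1)
ev       <- eigen(tss, symmetric = TRUE, only.values = TRUE)$values

cat("missing values:", n_missing, "\n")
cat("constant columns:", constant_cols, "\n")
cat("rank:", rank_x, "of", ncol(x_scaled), "columns\n")
cat("smallest eigenvalue:", min(ev), "\n")
```

If the rank is lower than the number of columns, removing or combining the redundant columns (or reducing the dimension first, for example with PCA) would be a natural thing to try before clustering again.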

The data I'm using are the result of a survey (which I briefly transformed and scaled via the scale function in R), so I was wondering whether there are algorithms or methods I could use to continue my analysis (I couldn't find much help in the literature). Or, if you think this is unfixable or simply not the best solution, do you have any other suggestions for clustering my data? What I want to do is identify clusters of possible users/customers of some services, depending on their usual habits (e.g. if they use many social networks, they will be more likely to use chat/whatsapp/app to ask for bank account information; I have both the information on their social network usage and their ways of communicating with a "bank assistant").

The dataset consists of 994 rows and 103 columns. In case it helps, the code is simply this:

library(NbClust)
Data2 <- read.csv(...)
bDataScale <- scale(Data2)  # center and scale every column
nc <- NbClust(bDataScale, min.nc = 2, max.nc = 993, method = "kmeans")

And I get:

Error in NbClust(bDataScale, min.nc = 2, max.nc = 993, method = "kmeans") :
The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.

Thank you in advance for your help or any corrections,

Julia

P.S.: as one would expect, I also get the same error with the unscaled matrix.

Best Answer

The answer you linked seems to suggest that negative eigenvalues tend to crop up with larger values of max.nc. So maybe try reducing that to something reasonable? I don't know how you'd go about interpreting 993 clusters in any case.
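As a rough illustration of that suggestion, the call could be rerun with a much smaller upper bound. This is only a sketch under stated assumptions: `bDataScale` below is simulated stand-in data rather than the survey, `max_k` is a hypothetical name with an arbitrary value of 10, and restricting the run to a single index (`index = "silhouette"`) is my own choice to keep the example quick, not something proposed in the thread. The call is guarded so the script still runs if the NbClust package is not installed.

```r
# Simulated stand-in for the scaled survey matrix: two well-separated groups.
set.seed(42)
bDataScale <- scale(rbind(
  matrix(rnorm(200, mean = 0), ncol = 2),
  matrix(rnorm(200, mean = 5), ncol = 2)
))

# Assumed upper bound: pick a number of clusters you could actually interpret.
max_k <- 10

if (requireNamespace("NbClust", quietly = TRUE)) {
  nc <- NbClust::NbClust(bDataScale, min.nc = 2, max.nc = max_k,
                         method = "kmeans", index = "silhouette")
  print(nc$Best.nc)  # suggested number of clusters and the index value
}
```

On the real data, the same call would use the scaled survey matrix in place of the simulated one. If the error persists even with a small `max.nc`, the cause is probably in the data rather than in the range of cluster counts.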