Solved – How to create clusters with completely categorical data

categorical data, k-means, r

I am working on a project that requires data mining, and I have been asked to use R. I have a dataset in which every variable is categorical, and I would like to form clusters on it. I am unable to figure out how to do so in R.

Here is what I have done: I have converted all the variables to "factor" data type in R. But I am not able to see the underlying numbered levels. I also do not know how to use this with kmeans() to get the required result.
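As a minimal sketch of how the underlying level numbers of a factor can be inspected: the toy data frame below is invented for illustration, and only the column names come from the question. Note that the integer codes are assigned in level order, so they carry no real distance information.

```r
# Toy data; column names follow the question, the values are made up.
emp <- data.frame(
  EmpTitle      = c("Analyst", "Manager", "Analyst", "Engineer"),
  EmpDepartment = c("Finance", "Finance", "IT", "IT"),
  stringsAsFactors = TRUE
)

# The labels of the factor, in level order.
title_levels <- levels(emp$EmpTitle)
print(title_levels)

# The integer code stored for each row (an index into the levels).
title_codes <- as.integer(emp$EmpTitle)
print(title_codes)

# str() shows the levels and the codes together for every column.
str(emp)
```

Passing these codes straight to kmeans() would treat level 3 as "further" from level 1 than level 2 is, which is usually not meaningful for unordered categories.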

My question is: how do I form clusters on these factors?

Here is what the data looks like:

RowNum|EmpNum|EmpName|EmpOrganization|EmpTitle|EmpLeaderNumber|EmpDepartment|EmpAccesstoApplicaton|EmpAccessID
The entire dataset is 14 MB.

The goal is to cluster people with similar access: people with a similar title or in a similar organization might have similar access. I understand kmeans() isn't the best option, but that is what I would like to use for the first draft.

I converted EmpOrganization, EmpTitle, etc. to numeric data in Excel using a simple VLOOKUP. It is easy to convert these to indicator variables using an IF statement in Excel, but I'm hoping that there is a more efficient way to do this in R itself.
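One possible way to build the indicator variables in R itself is model.matrix() from base R. The sketch below assumes every column is a factor; the toy values and the choice of two clusters are invented for illustration and are not taken from the real data.

```r
# Toy data; column names follow the question, the values are made up.
emp <- data.frame(
  EmpOrganization = c("Sales", "Sales", "Sales", "Sales", "IT", "IT", "IT", "IT"),
  EmpTitle        = c("Rep", "Rep", "Manager", "Rep",
                      "Engineer", "Engineer", "Manager", "Engineer"),
  stringsAsFactors = TRUE
)

# One 0/1 column per level of every factor. contrasts = FALSE keeps all
# levels instead of dropping a reference level; "- 1" removes the intercept.
indicators <- model.matrix(
  ~ . - 1,
  data = emp,
  contrasts.arg = lapply(emp, contrasts, contrasts = FALSE)
)
print(indicators)

# First-draft k-means on the indicator matrix (k = 2 is an arbitrary choice).
set.seed(1)
fit <- kmeans(indicators, centers = 2, nstart = 10)

# Compare the cluster assignment with one of the original variables.
print(table(cluster = fit$cluster, org = emp$EmpOrganization))
```

Each row of the indicator matrix contains exactly one 1 per original variable, so Euclidean distance between two rows simply counts the variables on which they disagree.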

Best Answer

In R's cluster package you can use daisy(), which returns a dissimilarity matrix and also works for mixed variable types. You can then pass that matrix directly to any clustering function that accepts dissimilarities, such as pam() or hclust().
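A minimal sketch of this approach, assuming the cluster package is installed (it ships as a recommended package with standard R distributions). The toy data and the choice of k = 2 are invented for illustration.

```r
library(cluster)

# Toy data; column names follow the question, the values are made up.
emp <- data.frame(
  EmpOrganization = c("Sales", "Sales", "Sales", "Sales", "IT", "IT", "IT", "IT"),
  EmpTitle        = c("Rep", "Rep", "Manager", "Rep",
                      "Engineer", "Engineer", "Manager", "Engineer"),
  stringsAsFactors = TRUE
)

# Gower dissimilarity. For factor columns this is the share of
# variables on which two rows differ.
d <- daisy(emp, metric = "gower")
dm <- as.matrix(d)

# Partitioning around medoids works directly on the dissimilarity object.
fit <- pam(d, k = 2)
print(fit$clustering)

# Hierarchical clustering is another option on the same dissimilarities.
hc <- hclust(as.dist(d), method = "average")
groups <- cutree(hc, k = 2)
print(groups)
```

The dissimilarity matrix grows with the square of the number of rows, so on a large table it may be worth trying this on a sample first.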
