Solved – Clustering of mixed type data with R

clustering, mixed type data, r

I wonder whether it is possible to cluster data with mixed variable types in R. In other words, I have a data set containing both numerical and categorical variables, and I'm looking for the best way to cluster them. In SPSS I would use two-step cluster. Is there a similar technique in R? I was told about the poLCA package, but I'm not sure …

Best Answer

This may come in late but try klaR (http://cran.r-project.org/web/packages/klaR/index.html)

install.packages("klaR")
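Once the package is installed, a minimal usage sketch might look like the following. The toy data frame and the choice of two clusters are made up for illustration, and the call is guarded so the snippet does nothing if klaR is not available.

```r
# Toy categorical data, invented purely for illustration.
set.seed(1)
toy <- data.frame(
  colour = sample(c("red", "green", "blue"), 30, replace = TRUE),
  shape  = sample(c("round", "square"), 30, replace = TRUE),
  size   = sample(c("S", "M", "L"), 30, replace = TRUE),
  stringsAsFactors = TRUE
)

# Only run the clustering if klaR is actually installed.
if (requireNamespace("klaR", quietly = TRUE)) {
  fit <- klaR::kmodes(toy, modes = 2)  # modes = number of clusters
  print(fit$modes)                     # the mode (centre) of each cluster
  print(table(fit$cluster))            # how many rows fell into each cluster
}
```

Note that every column here is categorical; numeric columns would be treated as categories too, which is why the distance function needs changing for mixed data.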

Its kmodes() function implements the non-hierarchical k-modes algorithm, which uses simple matching as the distance function, so the distance $\delta$ for variable $m$ between two data points $x$ and $y$ is given by

$$ \delta(x_m,y_m) = \begin{cases} 1 & x_m \neq y_m,\\ 0 & \text{otherwise} \end{cases} $$
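To make the formula concrete, here is a small base-R sketch. It assumes the usual k-modes convention that the total distance between two points is the sum of $\delta$ over all variables; the function name `simple_matching` and the example values are hypothetical.

```r
# Simple matching distance: the number of variables on which x and y disagree.
simple_matching <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(x != y)  # each mismatch contributes 1, each match contributes 0
}

x <- c(colour = "red", shape = "round",  size = "S")
y <- c(colour = "red", shape = "square", size = "L")

simple_matching(x, y)  # 2: shape and size differ, colour matches
```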

The package has one flaw: if two data points have the same distance to a cluster center, the first one in your data is chosen rather than a random one. You can easily modify that bit of the code, though.
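The general idea of breaking ties at random can be sketched in base R as follows. This is not klaR's internal code; `nearest_random_tie` is a hypothetical helper that takes a vector of distances and returns the index of a minimum, picking at random among ties.

```r
# Hypothetical helper: index of the smallest distance, ties broken at random.
nearest_random_tie <- function(d) {
  candidates <- which(d == min(d))
  # Guard the single-candidate case: sample(5, 1) would draw from 1:5.
  if (length(candidates) == 1L) candidates else sample(candidates, 1L)
}

d <- c(3, 1, 1, 2)
which.min(d)           # always returns the first minimum (index 2)
nearest_random_tie(d)  # returns 2 or 3, chosen at random
```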

To accommodate mixed-variable clustering, you will need to go into the code and modify the distance function so that it distinguishes numeric from non-numeric variables and modes.
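One way such a modified distance could look is sketched below. It is an assumption, not something the package provides: numeric variables contribute their squared difference, categorical variables contribute the simple matching distance, and a hypothetical weight `gamma` balances the two parts (the approach used by k-prototypes). The function name and the example data are made up.

```r
# Hypothetical mixed distance between two one-row data frames with the same columns.
mixed_distance <- function(x, y, gamma = 1) {
  is_num <- vapply(x, is.numeric, logical(1))
  # Numeric part: sum of squared differences.
  num_part <- sum((unlist(x[is_num]) - unlist(y[is_num]))^2)
  # Categorical part: number of mismatching variables (simple matching).
  cat_part <- sum(vapply(
    names(x)[!is_num],
    function(v) as.character(x[[v]]) != as.character(y[[v]]),
    logical(1)
  ))
  num_part + gamma * cat_part
}

df <- data.frame(
  age    = c(30, 35),
  income = c(2.0, 2.5),
  colour = c("red", "blue"),
  shape  = c("round", "round")
)

mixed_distance(df[1, ], df[2, ])  # 25 + 0.25 + 1 * 1 = 26.25
```

In practice the numeric variables would probably need scaling first, otherwise a variable like age dominates the mismatch counts. The cluster centres would also need a mean for numeric variables and a mode for categorical ones.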