I am trying to cluster a large data set using the k-prototypes method. I cannot use k-means because the data contains both categorical and numeric variables.
I have been using the R package "clustMixType" and can create clusters as long as I specify the value of k myself.
However, I want to find the optimal value of k and have not found any guidance on this online.
Solved – Optimal number of clusters using K-Prototypes method in R
clustering, k-means, r
Best Answer
As far as I know, there is no generic optimal k.
It depends heavily on your data set and your goal. A lower k yields coarser, less specific prototypes, but the clustering tends to generalize better. There are always trade-offs.
One way to pick k is to plot the data and look at it. Even then, you might want to try other values to see whether they work better for your application.
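Trying several values can be made a bit more systematic with an elbow (scree) plot. The following is only a rough sketch, assuming the `kproto()` function from clustMixType and the `tot.withinss` element of the object it returns. The data frame `df`, its column names, the range `2:8`, and `nstart = 5` are made up for illustration and should be replaced with your own data and settings.

```r
library(clustMixType)

set.seed(42)

# Hypothetical mixed-type data (two numeric columns, one factor).
# Replace this with your own data frame.
n <- 200
df <- data.frame(
  income = c(rnorm(n / 2, mean = 30, sd = 5), rnorm(n / 2, mean = 60, sd = 5)),
  age    = c(rnorm(n / 2, mean = 25, sd = 4), rnorm(n / 2, mean = 50, sd = 4)),
  region = factor(c(
    sample(c("north", "south"), n / 2, replace = TRUE, prob = c(0.9, 0.1)),
    sample(c("north", "south"), n / 2, replace = TRUE, prob = c(0.1, 0.9))
  ))
)

# Candidate numbers of clusters (an arbitrary range chosen for illustration).
k_values <- 2:8

# Fit k-prototypes for each k and record the total within-cluster distance.
tot_withinss <- sapply(k_values, function(k) {
  fit <- kproto(df, k = k, nstart = 5, verbose = FALSE)
  fit$tot.withinss
})

# Look for the k after which the curve flattens out (the "elbow").
plot(k_values, tot_withinss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster distance")
```

The curve usually keeps falling as k grows, so the elbow is a hint rather than a definitive answer, and it is still worth checking whether the resulting clusters make sense for your application. On a very large data set, running the loop on a random sample of rows may be more practical.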