I recently read a fascinating article describing methods for clustering data without assuming a fixed number of clusters.
The article even includes some sample code, in a mix of Ruby, Python, and R. However, the core of the analysis uses scikit-learn's Dirichlet Process Gaussian Mixture Model to find clusters in sample data taken from the McDonald's menu.
Obviously, this is a great excuse to learn some more Python, but I'm lazy and would like to find a ready-made R package that can take a data frame and return clusters, in a manner similar to the kmeans function. A quick search on CRAN turns up the packages dpmixsim and profdpm. Any suggestions for the best place to start?
Best Answer
I looked at this more carefully: the bayesm package has rDPGibbs, which does "Density Estimation with Dirichlet Process Prior and Normal Base", a kind of Dirichlet clustering. DPpackage has DPdensity, which looks similar. I haven't tried either package myself, so I have no idea how well they work in practice. Details such as how the normal base distribution is parameterized, and whether you can place a hyperprior on the DP concentration parameter, may matter a lot.
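To make "Dirichlet clustering with a normal base" a bit more concrete, here is a minimal sketch in Python (the language the article's analysis used). It is not the algorithm behind rDPGibbs or DPdensity; it is a plain collapsed Gibbs sampler for a Dirichlet process mixture of one-dimensional Gaussians, assuming a known within-cluster standard deviation sigma shared by all clusters and a normal base N(mu0, tau0^2) for the cluster means. The function name dp_gibbs_1d and all default parameter values are hypothetical choices for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def dp_gibbs_1d(data, alpha=1.0, sigma=1.0, mu0=0.0, tau0=10.0, iters=50, seed=0):
    """Collapsed Gibbs sampler for a DP mixture of 1-D Gaussians (hypothetical helper).

    Assumptions: every cluster shares the known variance sigma**2, and cluster
    means are drawn from the normal base distribution N(mu0, tau0**2).
    Returns one cluster label per data point from the final sweep.
    """
    rng = random.Random(seed)
    sigma2, tau02 = sigma ** 2, tau0 ** 2
    z = [0] * len(data)           # start with every point in a single cluster
    counts = {0: len(data)}       # cluster label -> number of members
    sums = {0: float(sum(data))}  # cluster label -> sum of member values
    next_label = 1
    for _ in range(iters):
        for i, x in enumerate(data):
            # Take point i out of its current cluster; drop the cluster if empty.
            old = z[i]
            counts[old] -= 1
            sums[old] -= x
            if counts[old] == 0:
                del counts[old], sums[old]
            labels, weights = [], []
            for c, n in counts.items():
                # Posterior of the cluster mean given its other members,
                # then the predictive density of x under that cluster.
                prec = 1.0 / tau02 + n / sigma2
                post_mean = (mu0 / tau02 + sums[c] / sigma2) / prec
                labels.append(c)
                weights.append(n * normal_pdf(x, post_mean, sigma2 + 1.0 / prec))
            # Opening a new cluster: weight alpha times the prior predictive.
            labels.append(next_label)
            weights.append(alpha * normal_pdf(x, mu0, sigma2 + tau02))
            new = rng.choices(labels, weights=weights)[0]
            if new == next_label:
                next_label += 1
                counts[new], sums[new] = 0, 0.0
            z[i] = new
            counts[new] += 1
            sums[new] += x
    return z
```

Each point joins an existing cluster with weight proportional to the cluster's size times the predictive density of the point, or opens a new cluster with weight proportional to alpha, so the number of clusters is inferred from the data instead of being fixed in advance as with kmeans. A real package would add multivariate data, unknown covariances and, ideally, a hyperprior on alpha.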