Solved – Dirichlet process/Chinese restaurant process for clustering in R

clustering, dirichlet-process, r

I recently read a fascinating article describing methods for clustering data without assuming a fixed number of clusters.

The article even includes some sample code, in a mix of Ruby, Python, and R. However, the meat of the analysis is performed using scikit-learn's Dirichlet Process Gaussian Mixture Model to actually find clusters in some sample data taken from McDonald's menu.

Obviously, this is a great excuse to learn some more Python, but I'm lazy and would like to find a ready-made R package that can take a data frame and return clusters, in a manner similar to the kmeans function. A quick search on CRAN reveals the packages dpmixsim and profdpm. Any suggestions for the best place to start?

Best Answer

I looked at this more carefully, and the package bayesm has rDPGibbs which does "Density Estimation with Dirichlet Process Prior and Normal Base", a kind of Dirichlet clustering. DPpackage has DPdensity which looks similar. I haven't tried these packages myself, so I have no idea how well they work in practice. Details such as parameterization of the normal base and the possibility to set hyperpriors for the DP parameter may be significant.
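
To get a feel for why the DP parameter matters, here is a minimal sketch of the Chinese restaurant process prior that sits underneath these models. It is plain Python using only the standard library, not code from bayesm, DPpackage, or scikit-learn, and the names `crp_assignments`, `alpha`, and `table_sizes` are my own for illustration. It only simulates the prior over partitions; it does not fit anything to data.

```python
import random


def crp_assignments(n_points, alpha, seed=0):
    """Draw cluster labels for n_points items from a Chinese restaurant process.

    alpha is the concentration (DP) parameter and must be positive.
    """
    rng = random.Random(seed)
    labels = []        # cluster label of each item, in arrival order
    table_sizes = []   # table_sizes[k] = number of items already in cluster k
    for _ in range(n_points):
        # An existing cluster k is chosen with weight table_sizes[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)   # open a new cluster
        else:
            table_sizes[k] += 1     # join an existing cluster
        labels.append(k)
    return labels, table_sizes


if __name__ == "__main__":
    # Compare how many clusters the prior produces for different alpha values.
    for alpha in (0.1, 1.0, 10.0):
        _, sizes = crp_assignments(200, alpha, seed=1)
        print(f"alpha={alpha}: {len(sizes)} clusters")
```

Larger values of `alpha` tend to produce more clusters for the same number of points, which is why the ability to set the DP parameter, or put a hyperprior on it, can change the clustering you get back.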
