Solved – Best clustering algorithm for real estate data

clustering, hierarchical-clustering, model-based-clustering

I want to cluster real estate data to determine average price patterns in urban and rural regions. My data set contains the size, number of bedrooms, number of bathrooms, and coordinates of each property.

Which would be the best clustering algorithm for this problem?

I’m familiar with k-means, but I don’t think it is the best approach here because I don’t want to predetermine the number of clusters in the data.

Best Answer

I recommend trying model-based clustering, as implemented in the mclust R package. The approach, methods, and software are described in the paper "mclust Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation" by Chris Fraley, Adrian E. Raftery, T. Brendan Murphy and Luca Scrucca: http://www.stat.washington.edu/research/reports/2012/tr597.pdf.
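To illustrate the idea without R, here is a minimal sketch in Python that uses scikit-learn's `GaussianMixture` as a stand-in for mclust (it is not the mclust API). It fits Gaussian mixtures with 1 to 6 components and keeps the one with the lowest BIC, so the number of clusters is chosen from the data rather than fixed in advance. The data are synthetic and made up for the example, and the helper name `fit_best_gmm`, the column layout, and the upper bound of 6 components are all assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (all values are made up).
# Columns: size, bedrooms, bathrooms, latitude, longitude.
city = np.column_stack([
    rng.normal(70, 15, 200),      # smaller properties
    rng.integers(1, 4, 200),      # 1 to 3 bedrooms
    rng.integers(1, 3, 200),      # 1 to 2 bathrooms
    rng.normal(10.00, 0.02, 200),
    rng.normal(20.00, 0.02, 200),
])
rural = np.column_stack([
    rng.normal(200, 40, 200),     # larger properties
    rng.integers(2, 6, 200),      # 2 to 5 bedrooms
    rng.integers(1, 4, 200),      # 1 to 3 bathrooms
    rng.normal(10.50, 0.10, 200),
    rng.normal(20.50, 0.10, 200),
])

# Put all features on a comparable scale before fitting.
X = StandardScaler().fit_transform(np.vstack([city, rural]))


def fit_best_gmm(X, max_components=6, seed=0):
    """Hypothetical helper: fit GMMs with 1..max_components components
    and return the one with the lowest BIC, together with that BIC."""
    best_model, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        model = GaussianMixture(
            n_components=k, covariance_type="full", random_state=seed
        ).fit(X)
        bic = model.bic(X)  # in scikit-learn, lower BIC is better
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model, best_bic


model, bic = fit_best_gmm(X)
labels = model.predict(X)  # cluster index for each property
print("components chosen:", model.n_components)
```

With the cluster labels in hand, the average price per cluster could then be computed by grouping a separate price column by `labels`. Note that bedroom and bathroom counts are discrete, which a Gaussian mixture only approximates, so the selected number of components is worth sanity-checking against the data.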

For a comprehensive, general (not necessarily model-based) overview of approaches and methods for determining the optimal number of clusters, see this excellent answer on Stack Overflow: https://stackoverflow.com/a/15376462/2872891.
