Solved – How to re-cluster a new instance in centroid-based clustering

clustering, k-means, rapidminer

I have applied clustering algorithms such as k-means, k-medoids, and DBSCAN to my patient dataset. For each algorithm, RapidMiner generated a cluster model (centroid table, graphs, etc.) and a clustered set (showing which examples belong to which cluster). Now, when a new patient arrives, I want to assign him to a cluster based on the previously trained model. I am not sure how to do this. Is it something like the following (I may be wrong)?

  1. For each attribute of the new patient, subtract the corresponding value in the centroid table, sum these differences over all attributes, and take the average.

  2. Then assign him to the cluster whose average is smallest for that patient.

If this is the right way, then how do I re-cluster? That is, when a new patient arrives and the algorithm assigns him to a cluster, the centroid moves, so I would have to re-cluster on every record insertion. How should I handle this in my scenario?

Best Answer

The output from a clustering operator is a model that can be applied to new data using the "Apply Model" operator.

As long as the new data has the same attribute names as the data the model was built on, it will work. For the DBSCAN operator, you may need a special attribute with the id role for it to work properly.
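For a centroid-based model (k-means, k-medoids), applying the model to a new example amounts, conceptually, to picking the nearest centroid. The sketch below illustrates that idea outside RapidMiner; it is not RapidMiner's implementation. The function name `assign_cluster`, the attribute names, and the centroid values are all made up for illustration, and Euclidean distance is an assumption (use whatever measure the model was trained with). Note that it sums squared differences rather than raw differences, since signed differences can cancel each other out.

```python
import math

def assign_cluster(example, centroids):
    """Return the name of the centroid closest to `example`.

    `example` and each centroid are dicts keyed by attribute name, so the
    attribute names must match (a missing attribute raises KeyError).
    Euclidean distance is assumed here.
    """
    best_name, best_dist = None, math.inf
    for name, centroid in centroids.items():
        # Squared differences, so positive and negative gaps do not cancel.
        dist = math.sqrt(sum((example[a] - centroid[a]) ** 2 for a in centroid))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

# Hypothetical centroid table and new patient (illustrative values only).
centroids = {
    "cluster_0": {"age": 30.0, "bmi": 22.0},
    "cluster_1": {"age": 65.0, "bmi": 29.0},
}
new_patient = {"age": 58.0, "bmi": 27.5}

print(assign_cluster(new_patient, centroids))
```

If the attributes were normalized before clustering, the new example has to go through the same normalization before its distances are computed.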

If you need to rebuild the clusters with new data, append the new rows to the existing data, rebuild the model, and save it for later use.
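The append-and-rebuild step can be sketched outside RapidMiner as follows. This is a minimal plain k-means (Lloyd's algorithm) in pure Python, not what RapidMiner runs internally. The function names, the data, and the choice to start from the previous centroids are assumptions for illustration. Starting from the old centroids is only a convenience that tends to speed up convergence when little has changed.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def refit_kmeans(rows, init_centroids, max_iter=100):
    """Re-run k-means on `rows`, starting from `init_centroids`."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(max_iter):
        # Assignment step: group every row under its nearest centroid.
        groups = [[] for _ in centroids]
        for row in rows:
            groups[nearest(row, centroids)].append(row)
        # Update step: each centroid becomes the mean of its group
        # (an empty group keeps its previous centroid).
        new_centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the centroids stopped moving
        centroids = new_centroids
    return centroids

# Hypothetical data: existing rows, the previously saved centroids, new rows.
existing = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
old_centroids = [(1.0, 1.5), (8.5, 8.0)]
new_rows = [(2.0, 1.0), (9.0, 9.0)]

all_rows = existing + new_rows                       # append the new rows
centroids = refit_kmeans(all_rows, old_centroids)    # rebuild the model
print(centroids)
```

Rebuilding after every single insertion is rarely necessary. A common approach is to assign new examples with the existing model and rebuild periodically, once enough new rows have accumulated.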

Hope that helps.