I have a dataset consisting of five numeric features: A, B, C, D, and E. Instead of doing density-based clustering, what I want to do is cluster the data in a decision-tree-like manner.
The approach I mean is something like this:
The algorithm might first divide the data into X initial clusters based on feature C, i.e. the X clusters may have small, medium, large, and very large C values, etc. Next, under each of the X cluster nodes, the algorithm further divides the data into Y clusters based on feature A. The algorithm continues until all the features have been used.
The algorithm I described above works like a decision tree, but I need it for unsupervised clustering rather than supervised classification.
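To make the idea concrete, here is a minimal sketch of the scheme I have in mind. Everything here is illustrative: the `bin_split` helper and the use of quantile binning are my own assumptions, not an existing algorithm.

```python
import numpy as np
import pandas as pd

def bin_split(df, features, n_bins=3, path=()):
    """Recursively split the data on one feature per level.

    At each level, rows are divided into quantile bins of the current
    feature (e.g. small / medium / large values); each bin is then
    split further on the next feature. Leaves are returned as a dict
    mapping the split path to the row indices in that cluster.
    """
    if not features:
        return {path: df.index.tolist()}
    feat, rest = features[0], features[1:]
    bins = pd.qcut(df[feat], q=n_bins, labels=False, duplicates="drop")
    out = {}
    for b, sub in df.groupby(bins):
        out.update(bin_split(sub, rest, n_bins, path + ((feat, int(b)),)))
    return out

# Toy stand-in for the real dataset: split on C first, then on A
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(90, 2)), columns=["C", "A"])
clusters = bin_split(df, ["C", "A"], n_bins=3)
```

Of course this fixes the feature order and bin boundaries by hand; what I am looking for is an algorithm that chooses the splits in a data-driven way.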
My questions are the following:
- Do such algorithms already exist? What is the correct name for this kind of algorithm?
- Is there an R/Python package/library that implements this kind of algorithm?
Best Answer
You may want to consider the following approach: first run any clustering algorithm of your choice on the data, then train a decision tree classifier using the resulting cluster labels as the target.
This will allow you to try different clustering algorithms, but you will get a decision tree approximation for each of them.
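A minimal sketch of this two-step idea in Python, assuming scikit-learn (k-means is just one choice for step 1; any clustering algorithm works):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data with 5 numeric features A..E (stand-in for the real dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))

# Step 1: cluster with any algorithm, e.g. k-means with 4 clusters
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Step 2: fit a decision tree to predict the cluster labels, giving a
# rule-based (tree) approximation of the clustering
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)

# The tree's splits show which feature thresholds define each cluster
print(export_text(tree, feature_names=["A", "B", "C", "D", "E"]))
```

The `max_depth` cap trades fidelity to the original clustering for a shorter, more interpretable set of rules; a deeper tree approximates the clusters more closely.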