I am trying to cluster Facebook users based on their likes.
I have two problems. First, since Facebook has no "dislike", all I have are likes (1) for some items; for the remaining items the value is unknown, not necessarily zero (which would correspond to a dislike). If I use 0 for the unknowns, I think my clusters will be biased.
Any suggestion?
Second, suppose I assign 0 to unknown items and cluster the users with a hierarchical clustering method that uses a binary distance measure such as Jaccard, Tanimoto, etc.
How can I evaluate the clustering results? Within- and between-cluster SSE is not appropriate for binary data. If I use median centers, I'm afraid most of them will be zero, since my feature matrix is sparse. So what would be a good way to evaluate the clusters?
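For reference, here is a minimal Python sketch of the Jaccard distance mentioned above. It assumes each user is represented as a set of liked item IDs (the users and page names are hypothetical). It contrasts Jaccard with simple matching (the fraction of 0/1 positions that differ) to show why filling unknowns with 0 matters less under Jaccard: items that neither user liked never enter the computation.

```python
def jaccard_distance(likes_a: set[str], likes_b: set[str]) -> float:
    """1 - |A & B| / |A | B| over the sets of liked items.

    Items neither user liked are not in the union, so shared unknowns (0s)
    do not make two users look more similar.
    """
    union = likes_a | likes_b
    if not union:
        return 0.0  # assumption: two users with no likes are treated as identical
    return 1.0 - len(likes_a & likes_b) / len(union)


def simple_matching_distance(likes_a: set[str], likes_b: set[str], n_items: int) -> float:
    """Fraction of positions where the two 0/1 vectors differ.

    Shared 0s count as agreement, so with a large, sparse catalogue almost
    every pair of users looks nearly identical.
    """
    return len(likes_a ^ likes_b) / n_items


# Hypothetical users, for illustration only.
alice = {"page_1", "page_2", "page_3"}
bob = {"page_2", "page_3", "page_4"}
carol = {"page_9"}

print(jaccard_distance(alice, bob))                   # 0.5
print(jaccard_distance(alice, carol))                 # 1.0 (nothing in common)
print(simple_matching_distance(alice, carol, 1000))   # 0.004 (looks "close")
```

With a 1000-item catalogue, simple matching rates Alice and Carol as almost identical only because of the 996 items neither of them liked, while Jaccard correctly reports no overlap.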
Best Answer
Consider using a graph based approach.
Try to find a threshold to define when users are "somewhat similar". It can be quite low. Build a graph of these somewhat similar users.
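A minimal sketch of building such a graph, assuming Jaccard similarity on sets of liked items as the similarity measure (the answer does not name one) and a hypothetical threshold of 0.2. Users and page names are made up for illustration.

```python
from itertools import combinations


def jaccard_similarity(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|; only items liked by at least one user count."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_similarity_graph(likes: dict[str, set[str]], threshold: float) -> dict[str, set[str]]:
    """Undirected graph as {user: set_of_neighbours}: an edge joins two users
    whose Jaccard similarity is at least `threshold`."""
    graph: dict[str, set[str]] = {user: set() for user in likes}
    for u, v in combinations(likes, 2):  # every unordered pair of users
        if jaccard_similarity(likes[u], likes[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


# Hypothetical data: user -> set of liked pages.
likes = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p2", "p3", "p4"},
    "u3": {"p1", "p3", "p5"},
    "u4": {"p8", "p9"},
}
graph = build_similarity_graph(likes, threshold=0.2)
print(sorted(graph["u1"]))  # ['u2', 'u3']
print(sorted(graph["u4"]))  # [] (no user is similar enough)
```

This compares every pair of users, so it is quadratic in the number of users; it is meant to illustrate the idea rather than to scale.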
Then use a clique detection approach to find groups in this graph.
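One standard way to do the clique step is Bron–Kerbosch with pivoting, which enumerates all maximal cliques. That choice of algorithm is an assumption, not something the answer specifies. The sketch below takes a graph as {node: set_of_neighbours} (the example graph is hypothetical).

```python
def maximal_cliques(graph: dict[str, set[str]]) -> list[set[str]]:
    """Bron–Kerbosch with pivoting: return every maximal clique of an
    undirected graph given as {node: set_of_neighbours}."""
    cliques: list[set[str]] = []

    def expand(r: set[str], p: set[str], x: set[str]) -> None:
        # r: current clique, p: candidates that extend it, x: already-explored nodes
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the node with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda node: len(graph[node] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Hypothetical graph: triangle a-b-c, an extra edge c-d, and an isolated node e.
example = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}
cliques = maximal_cliques(example)
print(sorted(sorted(c) for c in cliques))  # [['a', 'b', 'c'], ['c', 'd'], ['e']]
```

Note that maximal cliques can overlap (node `c` appears in two of them), so a user may belong to several groups. The number of maximal cliques can also grow exponentially in the worst case. For real use, networkx's `find_cliques` enumerates maximal cliques without hand-rolling the recursion.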